2nd Call for Papers
The 1st Workshop on Counter Speech for Online Abuse:
A workshop for creating, investigating and improving tools for producing and evaluating counter speech.
Hate speech and abusive and toxic language are prevalent in online spaces. For example, a 2019 survey shows that in the UK 30-40% of people have experienced online abuse, and platforms like Facebook take down millions of harmful posts every year with the help of AI tools. While removal of such content can immediately reduce the quantity of harmful messages, it can bring about accusations of censorship and may not be effective at curbing hate in the long term. An alternative approach is to reply with counter speech, i.e. targeted responses aimed at refuting the hateful language using thoughtful and cogent reasons, and fact-bound arguments. This has been shown to be effective in influencing the behaviour of both the perpetrators of abuse and bystanders who witness the interactions, as well as providing support to victims.
The sheer amount of social media data shared online on a daily basis means that hate mitigation, using counter speech, requires reliable, efficient and scalable tools. Recently, efforts have been made to curate hate countering datasets and automate the production of counter speech. However, this research field is still in its infancy, and many questions remain open regarding the most effective approaches and methods to take, as well as how to evaluate them.
This first multidisciplinary workshop aims to bring together researchers from diverse backgrounds such as computer science and the social sciences, as well as policy makers and other stakeholders to attempt to understand how counter speech is currently used to tackle abuse by individuals, activists and organisations, how Natural Language Processing (NLP) and Generation (NLG) can be applied to produce counter narratives, and the implications of using large language models for this task. It will also address, but not be limited to, the questions of how to evaluate and measure the impacts of counter speech, the importance of expert knowledge from civil society in the development of counter speech datasets and taxonomies, and how to ensure fairness and mitigate the biases present in language models when generating counter speech.
Topics
We invite papers (long and short) on a wide range of topics, including but not limited to:
• Models and methods for generating counter speech;
• Dialogue agents employing counter speech to address hateful inputs, directed towards other people or the AI itself;
• Human and automatic evaluation methods of counter speech tools;
• Multidisciplinary studies including different perspectives on the topic such as from computer science, social science, NGOs and stakeholders;
• Development of datasets and taxonomy for counter speech;
• Potentials and limitations (e.g., fairness, biases) of using large language models for generating counter speech;
• Social impact and empirical studies of counter speech on social media, including investigating the effectiveness and consequences on users of employing counter speech to fight online hate;
• Proposals for future research on counter speech, and/or preliminary results of studies in this field
We accept three types of submissions:
* Regular research papers – long (8 pages) or short (4 pages);
* Non-archival submissions: like research papers, but will not be included in the proceedings;
* Research communications: 2-4 page abstracts summarising relevant research published elsewhere.
Submission link: https://softconf.com/n/cs4oa2023
Location: co-located with SIGdialxINLG, Prague, Czechia
Important dates
All deadlines are Anywhere on Earth (UTC-12)
* Submission deadline: Jun 26, 2023
* Notification of acceptance: Jul 17, 2023
* Camera-ready deadline: Aug 11, 2023
* Workshop date: September 11/12 2023
Format and Styling
Submissions should follow the ACL Author Guidelines (https://www.aclweb.org/adminwiki/index.php?title=ACL_Author_Guidelines) and policies for submission, review and citation, and be anonymised for double-blind reviewing. Please use the ACL 2023 style files; LaTeX style files and Microsoft Word templates are available at https://2023.aclweb.org/calls/style_and_formatting/.
Organising Committee:
* Yi-Ling Chung, The Alan Turing Institute
* Gavin Abercrombie, Heriot-Watt University
* Helena Bonaldi, Fondazione Bruno Kessler
* Marco Guerini, Fondazione Bruno Kessler
Contact
If you have any questions, please let us know at cs4oa(a)googlegroups.com
Website: https://sites.google.com/view/cs4oa
Twitter: @cs4oa_workshop (https://twitter.com/cs4oa_workshop)
Dear Sir/Ma'am,
I hope you are doing well and in good health. We are excited to announce a
call for book chapters for an upcoming book titled "*Empowering
Low-Resource Languages With NLP Solutions.*"
Link: https://www.igi-global.com/publish/call-for-papers/call-details/6596
The objective of this book is to provide an in-depth understanding of
Natural Language Processing (NLP) techniques and applications specifically
tailored for low-resource languages. We believe that your valuable insights
and research in this domain would greatly enrich the content of this book.
To ensure a comprehensive and high-quality book, all submitted chapters
will undergo a rigorous peer-review process. The accepted book will be *indexed
in Scopus and Web of Science*, thereby enhancing the visibility and impact
of your work.
The book aims to cover a wide range of topics related to NLP in
low-resource languages. Suggested topics include, but are not limited
to:
· Introduction to Low-Resource Languages in NLP
· Language Resource Acquisition for Low-Resource Languages
· Morphological Analysis and Morpho-Syntactic Processing
· Named Entity Recognition and Entity Linking for Low-Resource
Languages
· Part-of-Speech Tagging and Syntactic Parsing
· Machine Translation for Low-Resource Languages
· Sentiment Analysis and Opinion Mining for Low-Resource Languages
· Speech and Audio Processing for Low-Resource Languages
· Text Summarization and Information Retrieval for Low-Resource
Languages
· Multimodal NLP for Low-Resource Languages
· Code-switching and Language Identification for Low-Resource
Languages
· Evaluation and Benchmarking for NLP in Low-Resource Languages
· Applications of NLP in Low-Resource Language Settings
· Future Directions and Challenges in NLP
We encourage you to contribute a book chapter focusing on any of the
above-mentioned topics or related areas within the scope of NLP in
low-resource languages. The submission guidelines are as follows:
1. Please submit a chapter proposal (maximum 500 words) outlining the
objective, methodology, and expected outcomes of your proposed chapter by
July 3, 2023, to the submission portal:
https://www.igi-global.com/publish/call-for-papers/call-details/6596
2. Chapter proposals should include the title of the chapter, the
name(s) of the author(s) and their affiliations.
3. All submissions should be original and should not have been
previously published or be currently under review elsewhere.
4. The chapters should be written in English and adhere to the
formatting guidelines provided after the acceptance of the proposal.
*Important Dates:*
July 3, 2023: Proposal Submission Deadline
July 17, 2023: Notification of Acceptance
September 17, 2023: Full Chapter Submission
October 31, 2023: Review Results Returned
December 12, 2023: Final Acceptance Notification
December 26, 2023: Final Chapter Submission
Thank you for considering this invitation, and we look forward to receiving
your valuable contribution to this book. If you have any further questions
or require additional information, please do not hesitate to contact us.
Best regards,
Editorial Team
Dr. Partha Pakray
National Institute of Technology Silchar
Email: partha(a)cse.nits.ac.in
Dr. Pankaj Dadure
University of Petroleum and Energy Studies Dehradun
Email: pankajk.dadure(a)ddn.upes.ac.in
Prof. Sivaji Bandyopadhyay
Jadavpur University, Kolkata
Email: sivaji.cse.ju(a)gmail.com
-----------------Apologies for cross-posting-------------------
Second Call for Papers
RANLP 2023 Student Research Workshop
4-6 September 2023
Varna, Bulgaria
https://sites.google.com/view/ranlp-stud-2023/
The International Conference RANLP 2023 (http://ranlp.org/) would like
to invite students at all levels (undergraduate, Master's, and PhD
students) to present their ongoing or completed work at the Student
Research Workshop (https://sites.google.com/view/ranlp-stud-2023/).
SUBMISSIONS
We invite two types of student submissions:
Full Papers must describe original unpublished work of the student in
any topic area of the workshop. Full papers are limited to 8 pages for
content, with 2 additional pages for references.
Short Papers may describe either work in progress or a research
proposal. They may also be in the style of a position paper that surveys
and criticizes existing literature. Short papers must include clear
directions for future research. Submissions of this type are limited to
6 pages for content, with 2 additional pages for references.
All papers must be submitted in .pdf format through the START system
(https://softconf.com/ranlp23/ranlp20t23stud/). The papers should
follow the format of the main conference, described on the Submissions
page of the RANLP website (http://ranlp.org/).
All papers must have only student authors. Submissions with non-student
authors will not be considered for review. If the paper is accepted,
the authors may add their supervisor(s) in the Acknowledgments section.
The submissions must specify the student’s level (Bachelor, Master, or
PhD) and the type of submission (Full or Short).
Double submission: Authors may submit the same paper to several
conferences. In this case, they must notify the organizers by filling in
the corresponding information in the submission form, as well as
notifying the contact organizer by email.
TOPICS OF INTEREST
The aim of this workshop is to facilitate the exchange of knowledge
between young researchers by providing an excellent opportunity to
present and discuss their work and to receive mentorship and valuable
feedback from an international research community. The research to be
presented can come from any topic within Natural Language Processing
(NLP) and Computational Linguistics, including but not limited to the
following:
Computational Social Science and Social Media;
Computer-aided Language Learning;
Dialogue and Interactive Systems;
Discourse and Pragmatics;
Ethics and NLP;
Information Extraction;
Information Retrieval and Text Mining;
Intent Recognition and Detection;
Interpretability and Analysis of Models for NLP;
Language and Vision;
Language Generation;
Language Resources and Corpora;
Linguistic Theories;
Machine Translation and Computer-aided Translation Tools;
Multilingual NLP;
Multimodal Systems;
NLP Applications – Biomedical, Educational, Healthcare, Financial,
Legal, Semantic Web, etc.;
Opinion Mining and Sentiment Analysis;
Phonetics, Phonology, and Morphology;
Question Answering;
Semantics;
Stylistic Analysis;
Sublanguages and Controlled languages;
Syntax: Tagging, Chunking, and Parsing;
Temporal Processing;
Text Categorization;
Text Simplification and Readability Estimation;
Text Summarisation;
Text-to-Speech Synthesis and Speech Recognition;
Textual Entailment.
All accepted papers will be presented at the Student Workshop sessions
(oral or poster) during the main conference days: 4-6 September 2023.
The articles will be issued in a special Student Session proceedings and
uploaded to the ACL Anthology.
IMPORTANT DATES
Submission deadline: 3 July 2023
Acceptance notification: 4 August 2023
Camera-ready deadline: 20 August 2023
Workshop: 4 - 6 September 2023
All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth")
ORGANISERS
Momchil Hardalov (AWS AI Labs, Spain)
Zara Kancheva (Institute of Information and Communication Technologies,
Bulgarian Academy of Sciences, Bulgaria)
Boris Velichkov (Faculty of Mathematics and Informatics at Sofia
University “St. Kliment Ohridski”, Bulgaria)
Ivelina Nikolova-Koleva (Institute of Information and Communication
Technologies, Bulgarian Academy of Sciences, and Sirma AI, Bulgaria)
Milena Slavcheva (Institute of Information and Communication
Technologies, Bulgarian Academy of Sciences, Bulgaria)
Faculty of Science and Engineering Dean Research PhD Studentship
LASER - Large Language Models for Academic SEarch and Recommendation
Deadline: June 19, 2023
University of Wolverhampton, UK
Applications are invited for doctoral study in Computer Science, Information Retrieval and Natural Language Processing on the topic of Large Language Models for Academic Search and Recommendation.
Project Description
Scientific publications are an important vehicle for understanding the world around us; they contain scientific evidence that informs researchers and decision-makers, with a high impact on society. However, the large and rapidly growing number of publications, in particular on preprint servers, causes an information overload for everybody struggling to keep up with developments in their field. This makes finding relevant information of high quality a challenging task, which requires advanced scholarly search and recommendation solutions. Recent developments in Large Language Models (LLMs) are having a huge impact on Artificial Intelligence (AI) and related fields. LLMs are a type of AI trained on huge amounts of text, with ChatGPT/GPT-4 and Bard as popular examples. LLMs combined with conversational AI provide exciting new possibilities for interactive search and recommendation, but they also suffer from severe flaws. While there are efforts to combine LLMs with, e.g., neural search, the endeavour of utilising LLMs to tackle information overload in academia has only just started and more research is needed.
This PhD studentship will explore how LLMs can be used to improve academic search and recommendation and what their benefits and limitations are. This may include integrating LLMs into search and recommendation services or utilising search to keep LLMs from "hallucinating". A further part of this project is to estimate the quality of publications.
The PhD project provides exciting opportunities for the successful candidate to work with and critically reflect on innovative technologies at the forefront of AI that will shape our digital future. As a further incentive, the PhD candidate will be able to participate in an EU Horizon Europe Staff Exchange project, providing the opportunity to go on fully funded secondments to collaborate with an international network of researchers and industry partners.
For further information regarding the project or an informal discussion please contact Director of Studies, Dr Ingo Frommholz <i.frommholz(a)wlv.ac.uk>.
To apply for the above PhD Research Studentship, applicants must hold a first-class degree or distinction at Master's and/or Bachelor's level.
Applications should include one identified project, a full CV (including 2 referee names and contact details), transcripts and a letter of application outlining the motivation for applying (maximum of 2 pages). Applicants from outside the UK must provide evidence of meeting the English language requirement as stated at https://www.wlv.ac.uk/research/research-degrees/
Applications must be submitted to FSEPGR(a)wlv.ac.uk by 10:00am BST on 19 June 2023.
A shortlist of candidates will be prepared from the pool of applicants, in line with the Faculty of Science and Engineering Post Graduate Research (PGR) studentship selection criteria. Shortlisted candidates will be invited to attend an interview with a panel of academic staff in the week commencing 26 June 2023.
Following this process, all successful candidates will be notified to enrol in July 2023 on a PhD degree programme. The studentship award will include tuition fees at home level for the first three years of full-time study including any write-up period fees and research support fees.
For further information on fees, see https://www.wlv.ac.uk/apply/funding-costs-fees-and-support/fees-and-costs/r…
Informal enquiries are welcome and should be directed to the individual Director of Studies mentioned above.
Further information: https://www.wlv.ac.uk/schools-and-institutes/faculty-of-science-and-enginee… (look for the LASER project)
--
Ingo Frommholz, PhD, FBCS, FHEA
Reader (~Associate Professor) in Data Science
Deputy Head Digital Innovations and Solutions Centre (DISC)
University of Wolverhampton, UK
Adjunct Professor, Bern University of Applied Sciences, Switzerland
Web: http://www.frommholz.org/ | Email: ifrommholz(a)acm.org
Twitter: @iFromm | Mastodon: @ingo@idf.social
PGP/GPG fingerprint: B74E A422 C7B2 A5BB 2BC2 523B 2790 216E F8F8 D166
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x2790216EF8F8D166
Dear colleagues,
We invite submissions of papers and talk proposals to
LongEval 2023 Workshop on Longitudinal Evaluation of Model Performance.
https://clef-longeval.github.io/
CLEF 2023 Conference and Labs of the Evaluation Forum (https://clef2023.clef-initiative.eu/index.php)
18-21 September 2023, Thessaloniki - Greece
Topics of interest include (but are not limited to):
• Evaluations of the temporal persistence of information retrieval (IR) systems and text classifiers for various tasks
• Challenges posed by the dynamic nature of language
• Time-aware longitudinal models
• Post-evaluation stage LongEval shared-task submissions.
Deadlines:
Papers: June 5th (to be included in the proceedings)
Submission format: https://drive.google.com/drive/folders/1r2lNOteMNoQrhQGUat6VHPUnwFAgR8Nz
Length: a maximum of 8 pages (not including references and appendices).
Submission link: https://easychair.org/my/conference?conf=clef2023
Talk proposals: July 10th (not included in the proceedings)
Submission format: a maximum of 2 pages
Submission link: https://forms.gle/fU46Fb5zJufxF5NV8
Organisers: Rabab Alkhalifa, Iman Bilal, Hsuvas Borkakoty, Jose Camacho-Collados, Romain Deveaud, Alaa El-Ebshihy, Luis Espinosa-Anke, Gabriela Gonzalez-Saez, Petra Galusakova, Lorraine Goeuriot, Elena Kochkina, Maria Liakata, Daniel Loureiro, Harish Tayyar Madabushi, Philippe Mulhem, Florina Piroi, Martin Popel, Christophe Servan, Arkaitz Zubiaga.
Feel free to reach out with any questions!
Best regards,
Elena Kochkina
On behalf of LongEval organisers
The 1st Workshop on Computational Terminology in NLP and Translation
Studies (ConTeNTs)
Varna, 7th-8th September, 2023
In conjunction with RANLP 2023 - International Conference "Recent
Advances in Natural Language Processing"
Third call for papers
Computational Terminology and new technologies applied to translation
studies have attracted the interest of researchers with very different
multidisciplinary backgrounds and motivations. Those fields cover a
range of areas in Natural Language Processing (NLP) such as information
retrieval, terminology extraction, question-answering systems, ontology
building, machine translation, computer-aided translation, automatic or
semi-automatic abstracting, text generation, etc.
Terminological identification, extraction and coinage of new terms are
essential for knowledge mining from texts, in both high- and
low-resource languages. Rapid evolution and new developments in
specialised domains require efficient and systematic automatic term
management. New terms need to be coined and translated to ensure the
equitable development of domains in all languages.
During the last decade, deep learning and neural methods have become the
state of the art for most NLP applications. Those applications were
shown to outperform previous methods on various tasks, including
automatic term extraction, language mining, assessment of quality in
machine translation, accessibility of terminology, etc. On the one hand,
NLP and computational linguistics try to improve the work of translators
and interpreters by developing Computer-Assisted Translation (CAT)
tools, Translation Memories (TMs), terminological databases and
terminology extraction tools, etc. On the other hand, the NLP field
still needs the efforts and knowledge of translators, interpreters and
linguists to provide better services and tools based on the real
necessities of those language professionals.
The aim of this workshop is to promote new insights into the ongoing and
forthcoming developments in computational terminology by bringing
together NLP experts, as well as terminologists and translators. By
uniting researchers with such diverse profiles, we hope to bridge some
of the gaps between these disciplines and inspire a dialogue between
various parties, thus paving the way to more artificial intelligence
applications based on mutual collaboration between language and
technology.
Topics of Interest
The ConTeNTs workshop invites the submission of papers reporting on
original and unpublished research on topics related to Computational
Terminology in NLP and Translation Studies, including but not limited
to:
* Automatic term extraction: monolingual and multilingual extraction
of terms from parallel and comparable corpora, including single and
multiword expressions;
* Extraction and acquisition of semantic relations between terms;
* Extraction and generation of domain specific definitions and
disambiguation of terms;
* Representation of terms, management of term variation and the
discovery of synonym terms or term clusters and their relation to NLP
applications;
* Extraction of terminological context, through the use of comparable
and parallel corpora;
* Accessibility of terminology in certain domains, relevant to
non-experts or to laypersons, and its relevance to NLP applications such
as chatbots, automatic email generation or spoken language interfaces;
* The impact of terminology on MT (applying terminology constraints,
evaluation of MT in domain-specific settings, etc.);
* The creation of domain ontologies, thesauri and terminological
resources in specialised domains;
* The use of new technologies in translation studies and research and
the use of terminological resources in specialised translation;
* Identification of key problems in terminology and new technologies
used in translation studies;
* Evaluation of terminological resources in various NLP applications
and the impact these resources have on the performance of the
automatic systems;
* Emerging language technologies: how the increased reliance on
real-time language technologies would change the structure of language;
* Corpus based studies applied to translation and interpreting: the
use of parallel and comparable corpora for translating phraseological
units;
* Phraseology and multiword expressions in cross-linguistic studies;
* Translation and interpreting tools, such as translation memories,
machine translation and alignment tools;
* User requirements for interpreting and translation tools.
SUBMISSION GUIDELINES
Submissions must consist of full-text papers and should not exceed 7
pages excluding references; they should be a minimum of 5 pages long.
The accepted papers will be published as ConTeNTs workshop e-proceedings
with ISBN, will be assigned a DOI and will also be available at the time
of the conference. The papers should be in English.
Authors of accepted papers will receive guidelines regarding how to
produce camera-ready versions of their papers for inclusion in the
proceedings.
Each submission will be reviewed by at least two programme committee
members. Accepted papers will be presented orally as part of the
programme of the workshop.
Submissions
Link to START system: https://softconf.com/ranlp23/ConTeNTS
Website of the workshop: https://contents2023.kulak.kuleuven.be/
Should you require any assistance with the submission, please do not
hesitate to contact us at amalhaddad(a)ugr.es and
ayla.rigoutsterryn(a)kuleuven.be.
Important Dates
Deadline for paper submission: 10 July 2023
Acceptance notification: 5 August 2023
Final camera-ready version: 25 August 2023
Workshop camera-ready proceedings ready: 31 August 2023
ConTeNTs workshop: 7/8 September 2023
Workshop Chairs & Organising Committee
Ayla Rigouts Terryn, Katholieke Universiteit Leuven, Belgium
Amal Haddad Haddad, Universidad de Granada, Spain
Ruslan Mitkov, University of Wolverhampton, United Kingdom
Programme Committee
* Sophia Ananiadou (University of Manchester)
* Maria Andreeva Todorova (Bulgarian Academy of Sciences)
* Silvia Bernardini (University of Bologna)
* Melania Cabezas García (Universidad de Granada)
* Rute Costa (Universidade Nova de Lisboa)
* Esther Castillo Pérez (Universidad de Granada)
* Patrick Drouin (Université de Montréal)
* Pamela Faber (Universidad de Granada)
* Mercedes García de Quesada (Universidad de Granada)
* Dagmar Gromann (Centre for Translation Studies - University of
Vienna)
* Tran Thi Hong Hanh (L3i Laboratory, University of La Rochelle)
* Rejwanul Haque (National College of Ireland)
* Amir Hazem (Nantes University)
* Kyo Kageura (University of Tokyo)
* Barbara Karsch (BIK Terminology - USA)
* Dorothy Kenny (Dublin City University)
* Miloš Jakubíček (Sketch Engine)
* Hendrik Kockaert (KU Leuven)
* Philipp Koehn (Johns Hopkins University)
* Maria Kunilovskaya (Saarland University)
* Marie-Claude L'Homme (Université de Montréal)
* Hélène Ledouble (Université de Toulon)
* Pilar León-Araúz (Universidad de Granada)
* Rodolfo Maslias (former Head of TermCoord, European Parliament)
* Silvia Montero Martínez (Universidad de Granada)
* Emmanuel Morin (LS2N-TALN)
* Rogelio Nazar (Pontificia Universidad Católica de Valparaíso)
* Sandrine Peraldi (University College Dublin)
* Silvia Piccini (Italian National Research Council)
* Thierry Poibeau (CNRS)
* Senja Pollak (Jožef Stefan Institute)
* Maria Pozzi Pardo (El Colegio de México)
* Tharindu Ranasinghe (Aston University)
* Arianne Reimerink (Universidad de Granada)
* Andres Repar (Jožef Stefan Institute)
* Christophe Roche (Université Savoie Mont-Blanc)
* Antonio San Martín Pizarro (Université du Québec à Trois-Rivières)
* Beatriz Sánchez Cárdenas (Universidad de Granada)
* Vilelmini Sosoni (Ionian University)
* Irena Spasic (Cardiff University)
* Elena Isabelle Tamba (Romanian Academy, Iași Branch)
* Rita Temmerman (Vrije Universiteit Brussel)
* Jorge Vivaldi Palatresi (Universitat Pompeu Fabra)
PhD in ML/NLP – Efficient, fair, robust and knowledge-informed
self-supervised learning for speech processing
Starting date: November 1st, 2022 (flexible)
Application deadline: September 5th, 2022
Interviews (tentative): September 19th, 2022
Salary: ~2000€ gross/month (social security included)
Mission: research oriented (teaching possible but not mandatory)
*Keywords:* speech processing, natural language processing,
self-supervised learning, knowledge-informed learning, robustness, fairness
*CONTEXT*
The ANR project E-SSL (Efficient Self-Supervised Learning for Inclusive
and Innovative Speech Technologies) will start on November 1st 2022.
Self-supervised learning (SSL) has recently emerged as one of the most
promising artificial intelligence (AI) methods, as it is now
feasible to take advantage of the colossal amounts of existing unlabeled
data to significantly improve the performance of various speech
processing tasks.
*PROJECT OBJECTIVES*
Recent SSL models for speech such as HuBERT or wav2vec 2.0 have shown an
impressive impact on downstream task performance. This is mainly due to
their ability to benefit from a large amount of data at the cost of a
tremendous carbon footprint rather than improving the efficiency of the
learning. Another question related to SSL models is their unpredictable
results once applied to realistic scenarios, which reveal their lack of
robustness. Furthermore, as for any pre-trained models applied in
society, it is important to be able to measure the bias of such models
since they can amplify social unfairness.
The goals of this PhD position are threefold:
- to design new evaluation metrics for SSL of speech models;
- to develop knowledge-driven SSL algorithms;
- to propose methods for learning robust and unbiased representations.
SSL models are evaluated with downstream task-dependent metrics, e.g.
word error rate for speech recognition. This couples the evaluation of
the universality of SSL representations to a potentially biased and
costly fine-tuning that also hides the efficiency information related to
the pre-training cost. In practice, we will seek to measure the training
efficiency as the ratio between the amount of data, computation and
memory needed to observe a certain gain in terms of performance on a
metric of interest, i.e. downstream-dependent or not. The first step will
be to document standard markers that can be used to assess these values
robustly at training time. Potential candidates
are, for instance, floating point operations for computational
intensity, number of neural parameters coupled with precision for
storage, online measurement of memory consumption for training and
cumulative input sequence length for data.
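
As a rough illustration of the efficiency ratio described above, the following Python sketch divides each resource marker by the gain observed on a metric of interest. It is only a minimal sketch under stated assumptions: the class name `TrainingCost`, its field names, the function `efficiency_ratios` and the choice of a simple "resource per unit of gain" ratio are all hypothetical and are not taken from the E-SSL project.

```python
from dataclasses import dataclass


@dataclass
class TrainingCost:
    """Resource markers logged during pre-training (hypothetical names)."""
    flops: float              # total floating point operations
    param_count: int          # number of neural parameters
    bytes_per_param: int      # numerical precision, e.g. 4 for float32
    peak_memory_bytes: float  # memory consumption measured online
    input_frames: int         # cumulative input sequence length

    @property
    def storage_bytes(self) -> float:
        # Storage marker: parameter count coupled with precision.
        return self.param_count * self.bytes_per_param


def efficiency_ratios(cost: TrainingCost, baseline_score: float,
                      score: float, lower_is_better: bool = True) -> dict:
    """Return the resources spent per unit of gain on the chosen metric."""
    # For error-style metrics (e.g. word error rate) a gain is a decrease.
    gain = baseline_score - score if lower_is_better else score - baseline_score
    if gain <= 0:
        raise ValueError("no gain over the baseline, ratio is undefined")
    return {
        "flops_per_gain": cost.flops / gain,
        "storage_bytes_per_gain": cost.storage_bytes / gain,
        "memory_bytes_per_gain": cost.peak_memory_bytes / gain,
        "frames_per_gain": cost.input_frames / gain,
    }
```

A lower value means that less of the given resource was needed for the same gain, so two pre-training recipes can be compared marker by marker.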
Most state-of-the-art SSL models for speech rely on masked prediction,
e.g. HuBERT and WavLM, or contrastive losses, e.g. wav2vec 2.0. Such
prevalence in the literature is mostly linked to the size, amount of
data and computational resources injected by the company producing these
models. In fact, vanilla masking approaches and contrastive losses may
be identified as uninformed solutions as they do not benefit from
in-domain expertise. For instance, it has been demonstrated that blindly
masking frames in the input signal, as in HuBERT and WavLM, results in much
worse downstream performance than applying unsupervised phonetic
boundaries [Yue2021] to generate informed masks. Recently, some studies
have demonstrated the superiority of an informed multitask learning
strategy, carefully selecting self-supervised pretext tasks with respect
to a set of downstream tasks, over the vanilla wav2vec 2.0 contrastive
learning loss [Zaiem2022]. In this PhD project, our objectives are: 1.
to continue to develop knowledge-driven SSL algorithms reaching higher
efficiency ratios and results at the convergence, data consumption and
downstream performance levels; and 2. to scale these novel approaches to a
point enabling the comparison with current state-of-the-art systems,
therefore motivating a paradigm change in SSL for the wider speech
community.
Despite remarkable performance on academic benchmarks, SSL-powered
technologies, e.g. speech and speaker recognition, speech synthesis and
many others, may exhibit highly unpredictable results once applied to
realistic scenarios. This can translate into a global accuracy drop due
to a lack of robustness to adversarial acoustic conditions, or biased
and discriminatory behaviors with respect to different pools of end
users. Documenting and facilitating the control of such aspects prior to
the deployment of SSL models in real life is necessary for the
industrial market. To evaluate such aspects, within the project, we will
create novel robustness regularization and debiasing techniques along two
axes: 1. debiasing and regularizing speech representations at the SSL
level; 2. debiasing and regularizing downstream-adapted models (e.g.
using a pre-trained model).
To ensure the creation of fair and robust SSL pre-trained models, we
propose to act both at the optimization and data levels following some
of our previous work on adversarial protected attribute disentanglement
and the NLP literature on data sampling and augmentation [Noé2021].
Here, we wish to extend this technique to more complex SSL architectures
and more realistic conditions by increasing the disentanglement
complexity (the sex attribute studied in [Noé2021] is particularly
discriminatory). Then, to benefit from the expert knowledge induced
by the scope of the task of interest, we will build on the recently
introduced task-dependent counterfactual equal odds criteria
[Sari2021] to minimize the downstream performance gap observed
between individuals with different protected attributes and to
maximize the overall accuracy. Following this multi-objective
optimization scheme, we will then inject further identified constraints
as inspired by previous NLP work [Zhao2017]. Intuitively, constraints
are injected so that the predictions are calibrated towards a desired,
i.e. unbiased, distribution.
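For readers unfamiliar with equal odds style criteria, the following minimal sketch measures how far a binary classifier is from equalized odds, i.e. equal true positive and false positive rates across groups. It is a generic illustration and not the task-dependent counterfactual criterion of [Sari2021]; the function names and the binary setting are assumptions.

```python
def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def equalized_odds_gap(y_true, y_pred, groups):
    """Largest TPR or FPR difference between any two protected groups."""
    by_group = {}
    for t, p, g in zip(y_true, y_pred, groups):
        labels, preds = by_group.setdefault(g, ([], []))
        labels.append(t)
        preds.append(p)
    tprs, fprs = zip(*(rates(t, p) for t, p in by_group.values()))
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))
```

A gap of 0 means both rates match across groups; a training objective could penalize this gap alongside the usual accuracy term.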
*SKILLS*
* Master 2 in Natural Language Processing, Speech Processing, computer
  science or data science.
* Good command of Python programming and deep learning frameworks.
* Previous experience in Self-Supervised Learning, acoustic modeling or
  ASR would be a plus.
* Very good communication skills in English.
* Good command of French would be a plus but is not mandatory.
*SCIENTIFIC ENVIRONMENT*
The thesis will be conducted within the GETALP team of the LIG
laboratory (https://lig-getalp.imag.fr/)
and the LIA laboratory (https://lia.univ-avignon.fr/). The GETALP team
and the LIA have a strong expertise and track record in Natural Language
Processing and speech processing. The recruited person will be welcomed
within the teams which offer a stimulating, multinational and pleasant
working environment.
The means to carry out the PhD will be provided both in terms of missions
in France and abroad and in terms of equipment. The candidate will have
access to the GPU clusters of both the LIG and LIA. Furthermore,
access to the national supercomputer Jean-Zay will make it possible to
run large-scale experiments.
The PhD position will be co-supervised by Mickael Rouvier (LIA, Avignon)
and Benjamin Lecouteux and François Portet (Université Grenoble Alpes).
Joint meetings are planned on a regular basis and the student is
expected to spend time in both places. Moreover, the PhD student will
collaborate with several team members involved in the project, in
particular the two other PhD candidates who will be recruited and the
partners from LIA, LIG and Dauphine Université PSL, Paris. Furthermore,
the project will involve one of the founders of SpeechBrain, Titouan
Parcollet, with whom the candidate will interact closely.
*INSTRUCTIONS FOR APPLYING*
Applications must contain a CV, a letter/message of motivation and Master's
transcripts; candidates should be ready to provide letter(s) of
recommendation. Applications should be addressed to Mickael Rouvier
(mickael.rouvier(a)univ-avignon.fr), Benjamin Lecouteux
(benjamin.lecouteux(a)univ-grenoble-alpes.fr) and François Portet
(francois.Portet(a)imag.fr). We
celebrate diversity and are committed to creating an inclusive
environment for all employees.
*REFERENCES:*
[Noé2021] Noé, P.-G., Mohammadamini, M., Matrouf, D., Parcollet, T.,
Nautsch, A. & Bonastre, J.-F. Adversarial Disentanglement of Speaker
Representation for Attribute-Driven Privacy Preservation in Proc.
Interspeech 2021 (2021), 1902–1906.
[Sari2021] Sarı, L., Hasegawa-Johnson, M. & Yoo, C. D. Counterfactually
Fair Automatic Speech Recognition. IEEE/ACM Transactions on Audio,
Speech, and Language Processing 29, 3515–3525 (2021).
[Yue2021] Yue, X. & Li, H. Phonetically Motivated Self-Supervised Speech
Representation Learning in Proc. Interspeech 2021 (2021), 746–750.
[Zaiem2022] Zaiem, S., Parcollet, T. & Essid, S. Pretext Tasks Selection
for Multitask Self-Supervised Speech Representation in AAAI, The 2nd
Workshop on Self-supervised Learning for Audio and Speech Processing,
2023 (2022).
[Zhao2017] Zhao, J., Wang, T., Yatskar, M., Ordonez, V. & Chang, K.-W.
Men Also Like Shopping: Reducing Gender Bias Amplification using
Corpus-level Constraints in Proceedings of the 2017 Conference on
Empirical Methods in Natural Language Processing (2017), 2979–2989.
--
François PORTET
Professeur - Univ Grenoble Alpes
Laboratoire d'Informatique de Grenoble - Équipe GETALP
Bâtiment IMAG - Office 333
700 avenue Centrale
Domaine Universitaire - 38401 St Martin d'Hères
FRANCE
Phone: +33 (0)4 57 42 15 44
Email: francois.portet@imag.fr
www: http://membres-liglab.imag.fr/portet/
Call for Papers: 3rd Workshop on Computational Linguistics for the Political and Social Sciences (CPSS 2023): https://sites.google.com/view/cpss2023konvens/home-page
* Workshop description *
This workshop aims at bringing together researchers and ideas from computational linguistics/NLP and the text-as-data community from political and social science to foster collaboration and catalyze further interdisciplinary research efforts between these communities.
* Potential topics *
- Modeling political communication with NLP (e.g. topic classification, position measurement)
- Mining policy debates from heterogeneous textual sources
- Modeling complex social constructs (e.g. populism, polarization, identity) with NLP methods
- Political and social bias in language models
- Methodological insights in interdisciplinary collaboration: workflows, challenges, best practices
- Application of NLP methods to understand and support democratic decision making
- Resources and tools for Political/Social Science research
- … and more
* Important dates *
- Submission deadline: June 14, 2023
- Notification of acceptance: July 10, 2023
- Camera-ready deadline: July 20, 2023
- Workshop: September 22, 2023
The workshop is co-located with KONVENS 2023 in Ingolstadt (https://www.thi.de/konvens-2023).
* Submissions *
We solicit two types of submissions:
- archival papers describing original and unpublished work (long papers: max. 8 pages, references/appendix excluded; short papers: max. 4 pages, references/appendix excluded). Accepted papers will be published in the ACL Anthology. For the submission format, refer to the KONVENS template.
- non-archival papers (1-page abstracts, references excluded) describing already published research or ongoing work
The two formats will meet the need of researchers from different communities, allowing the exchange of ideas in a "get to know each other" environment which we hope will foster future collaborations.
For more information, please refer to the workshop website: https://sites.google.com/view/cpss2023konvens/home-page
If you have any questions, please feel free to contact the workshop organizers.
* Organizers *
Gabriella Lapesa (U-Stuttgart)
Christopher Klamm (U-Mannheim)
Theresa Gessler (European University Viadrina)
Valentin Gold (U-Göttingen)
Simone Ponzetto (U-Mannheim)
**** We apologize for the multiple copies of this email. In case you are
already registered to the next webinar, you do not need to register
again. ****
Dear colleague,
We are happy to announce the next webinar in the Language Technology
webinar series organized by the HiTZ research center (Basque Center for
Language Technology, http://hitz.eus). This will be the final webinar of
this academic year. You can check the videos of previous webinars and
the schedule for upcoming webinars here: http://www.hitz.eus/webinars
Next webinar:
* *Speaker*: Pascale Fung (The Hong Kong University of Science and
Technology)
* *Title*: Safer Generative ConvAI
* *Date*: Jun 1, 2023, 15:00 CET
* *Summary*: Generative models for Conversational AI are less than a
decade old, but they hold great promise for human-machine
interactions. Machine responses based on generative models can seem
quite fluent and human-like, empathetic and funny, knowledgeable and
professional. However, behind the confident voice of generative
ConvAI systems, they can also be hallucinating misinformation,
giving biased and harmful views, and are still not "safe" enough for
many real life applications. The expressive power of generative
ConvAI models and their undesirable behavior are two sides of the
same coin. How can we harness the fluency, diversity and engagingness
of generative ConvAI models while mitigating the downsides? In this
talk, I will present some of our team’s recent work in making
generative ConvAI safer via mitigating hallucinations,
misinformation, and toxicity.
* *Bio*: Pascale Fung is a Chair Professor at the Department of
Electronic & Computer Engineering at The Hong Kong University of
Science & Technology (HKUST), and a visiting professor at the
Central Academy of Fine Arts in Beijing. She is an elected Fellow of
the Association for the Advancement of Artificial Intelligence
(AAAI) for her "significant contributions to the field of
conversational AI and to the development of ethical AI principles
and algorithms", and an elected Fellow of the Association for
Computational Linguistics (ACL) for her “significant contributions
towards statistical NLP, comparable corpora, and building
intelligent systems that can understand and empathize with humans”.
She is a Fellow of the Institute of Electrical and Electronics
Engineers (IEEE) for her “contributions to human-machine
interactions” and an elected Fellow of the International Speech
Communication Association for “fundamental contributions to the
interdisciplinary area of spoken language human-machine
interactions”. She is the Director of HKUST Centre for AI Research
(CAiRE). She was the founding chair of the Women Faculty Association
at HKUST. She is an expert on the Global Future Council, a think
tank for the World Economic Forum. She represents HKUST on
Partnership on AI to Benefit People and Society. She is on the Board
of Governors of the IEEE Signal Processing Society. She is a member
of the IEEE Working Group to develop an IEEE standard - Recommended
Practice for Organizational Governance of Artificial Intelligence.
Her research team has won several best and outstanding paper awards
at ACL, ACL and NeurIPS workshops.
Check past and upcoming webinars at the following URL:
http://www.hitz.eus/webinars
If you are interested in participating, please complete this
registration form:
http://www.hitz.eus/webinar_izenematea
If you cannot attend this seminar, but you want to be informed of the
following HiTZ webinars, please complete this registration form instead:
http://www.hitz.eus/webinar_info
Best wishes,
HiTZ Zentroa