18th WORKSHOP ON BUILDING AND USING COMPARABLE CORPORA
WITH SHARED TASK ON MULTILINGUAL TERMINOLOGY EXTRACTION
FROM COMPARABLE CORPORA
Co-located with COLING 2025 (Abu Dhabi)
Paper submission deadline: 30 November, 2024
Workshop website: https://comparable.lisn.upsaclay.fr/bucc2025/
COLING website: https://coling2025.org/
Keynote speaker: Preslav Nakov, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi
**************************************************************
* Motivation
In the language engineering and linguistics communities, research in
comparable corpora has been motivated by two main reasons. In language
engineering, on the one hand, it is chiefly motivated by the need to
use comparable corpora as training data for statistical NLP
applications such as statistical and neural machine translation or
cross-lingual retrieval. In linguistics, on the other hand, comparable
corpora are of interest because they enable cross-language discoveries
and comparisons. It is generally accepted in both communities that
comparable corpora consist of documents that are comparable in content
and form in various degrees and dimensions across several
languages. Parallel corpora are at one end of this spectrum, and
unrelated corpora are at the other.
In recent years, the use of comparable corpora for pre-training Large
Language Models (LLMs) has led to their impressive multilingual and
cross-lingual abilities, which are relevant to a range of applications,
including Information Retrieval, Machine Translation, Cross-lingual text
classification, etc. The linguistic definitions and observations related
to comparable corpora can improve methods for mining such corpora or
for improving the cross-lingual transfer of LLMs. Therefore, it is of great interest
to bring together builders and users of such corpora.
* Shared Task
This year we will run a shared task aimed at detecting translations of
terms via comparable corpora. Please see the website for details: https://comparable.limsi.fr/bucc2025/bucc2025-task.html
* Topics
We solicit contributions on all topics related to comparable (and parallel) corpora, including but not limited to the following:
Building Comparable Corpora:
- Automatic and semi-automatic methods
- Methods to mine parallel and non-parallel corpora from the web
- Tools and criteria to evaluate the comparability of corpora
- Parallel vs non-parallel corpora, monolingual corpora
- Rare and minority languages, across language families
- Multi-media/multi-modal comparable corpora
Applications of comparable corpora:
- Human translation
- Language learning
- Cross-language information retrieval & document categorization
- Bilingual and multilingual projections
- (Unsupervised) Machine translation
- Writing assistance
- Machine learning techniques using comparable corpora
Mining from Comparable Corpora:
- Cross-language distributional semantics, word embeddings and
pre-trained multilingual transformer models
- Extraction of parallel segments or paraphrases from comparable corpora
- Methods to derive parallel from non-parallel corpora (e.g. to provide
for low-resource languages in neural machine translation)
- Extraction of bilingual and multilingual translations of single words,
multi-word expressions, proper names, named entities, sentences, and
paraphrases from comparable corpora, etc.
- Induction of morphological, grammatical, and translation rules from
comparable corpora
- Induction of multilingual word classes from comparable corpora
Comparable Corpora in the Humanities:
- Comparing linguistic phenomena across languages in contrastive
linguistics
- Analyzing properties of translated language in translation studies
- Studying language change over time in diachronic linguistics
- Assigning texts to authors via authors' corpora in forensic
linguistics
- Comparing rhetorical features in discourse analysis
- Studying cultural differences in sociolinguistics
- Analyzing language universals in typological research
* Workshop Organizers
- Serge Sharoff (University of Leeds)
- Ayla Rigouts Terryn (Université de Montréal (UdeM), Mila)
- Pierre Zweigenbaum (Université Paris-Saclay, CNRS, LISN, Orsay)
- Reinhard Rapp (University of Mainz, Germany)
* Program Committee
- Ebrahim Ansari (Institute for Advanced Studies in Basic Sciences,
Iran)
- Eleftherios Avramidis (DFKI, Germany)
- Gabriel Bernier-Colborne (National Research Council, Canada)
- Thierry Etchegoyhen (Vicomtech, Spain)
- Alex Fraser (University of Munich, Germany)
- Natalia Grabar (University of Lille, France)
- Amal Haddad Haddad (Universidad de Granada, Spain)
- Amir Hazem (University of Tokyo, Japan)
- Kyo Kageura (University of Tokyo, Japan)
- Natalie Kübler (Université Paris Cité, France)
- Philippe Langlais (Université de Montréal, Canada)
- Yves Lepage (Waseda University, Japan)
- Shervin Malmasi (Amazon, USA)
- Michael Mohler (Language Computer Corporation, USA)
- Emmanuel Morin (Nantes Université, France)
- Dragos Stefan Munteanu (RWS, USA)
- Ted Pedersen (University of Minnesota, Duluth, USA)
- Nasredine Semmar (CEA LIST, Paris, France)
- Silvia Severini (Leonardo Labs, Italy)
- Pranaydeep Singh (University of Gent, Belgium)
- Richard Sproat (Google, USA)
- Marko Tadić (University of Zagreb, Croatia)
- François Yvon (Sorbonne Université, France)
We are recruiting PhD researchers for the UKRI/RAi UK Keystone project
AdSoLve on Addressing Sociotechnical Limitations of LLMs:
https://adsolve.github.io/
Up to four funded positions are available in a joint collaboration between
Queen Mary University of London (QMUL) and the Imperial College London CDT
in healthcare AI - a great opportunity to work with leading academics in
NLP, AI, healthcare, and responsible AI. AdSoLve offers collaborations
across 4 universities, a large consortium and a network of over 21
non-academic partners. QMUL has one of the UK's leading NLP research
groups, with 8 core faculty and a group of c.40 researchers.
APPLICATION DEADLINE 28th July 2024
Interviews 5th & 6th September 2024
For details see:
https://www.findaphd.com/phds/programme/phd-opportunities-in-addressing-soc…
https://adsolve.github.io/assets/other/phd_advert_QMUL.pdf
--
Matthew Purver - http://www.eecs.qmul.ac.uk/~mpurver/
Computational Linguistics Lab - http://compling.eecs.qmul.ac.uk/
Cognitive Science Research Group - http://cogsci.eecs.qmul.ac.uk/
School of Electronic Engineering and Computer Science
Queen Mary University of London, London E1 4NS, UK
My working days for QMUL are Tuesday-Thursday; responses to mail on
other days may be delayed.
Dear Corpora-list,
We are advertising a post-doctoral position in ML/XAI: 18 months at IMT
Mines Alès (south of France), or IMT Business School, Evry (near Paris)
Subject: Evaluation of the impact of XAI techniques on Human-Machine
collaboration
Context: ENFIELD project, Horizon-funded European AI Network of
Excellence on adaptive, sustainable, human-centered and trustworthy AI.
Objectives:
Evaluate the impact of XAI methods on Human-Machine collaboration
through the study of:
- Performance of the human operator in performing a task, in different
contexts: alone, with the help of a predictive model for which decisions
will be explained/not explained, with the help of an XAI technique
- Types of human-machine collaboration (e.g. delegation, substitution,
mediation)
- Potential biases induced by XAI techniques
A focus will be made on specific contexts of study (e.g., image
classification or NLP tasks, XAI techniques based on local
interpretability using attribution methods).
You will contribute to:
- Defining the study contexts (e.g. games, image classification) and test
protocols to be considered.
- Selecting and implementing predictive models and XAI techniques.
- Setting up the tools needed to carry out the experiments covered by the
study protocols, e.g. development of simple games, decision interfaces.
- Implementing the above-mentioned protocols on cohorts of human operators.
- Evaluating and promoting the results obtained.
Deadline for applications: 20/09/2024
Desired start date: 01/11/2024
Application and additional info:
https://institutminestelecom.recruitee.com/o/post-doctorant-post-doctorante…
Contacts:
Sébastien Harispe, Associate Professor
sebastien.harispe(a)mines-ales.fr
Nicolas Soulié, Associate Professor
nicolas.soulie(a)imt-bs.eu
Best regards,
--
Andon Tchechmedjiev, PhD. Associate Professor of Artificial Intelligence
and Computer Engineering at EuroMov Digital Health in Motion, IMT Mines
Alès. Taxonomy and Semantics of Movement (SemTaxM) co-lead, Learning and
Complexity group member. Research expertise: Deep Learning, Knowledge
Engineering, Computational Linguistics and Semantics, Biomedical
Informatics, Neuroengineering and Human Movement Processing
Postdoctoral Researcher – Defining Authentic Inclusive Communication
Insight SFI Research Centre for Data Analytics
Data Science Institute Ref. No. 010548
JOB ADVERTISEMENT
Applications are invited from suitably qualified candidates for a
full-time, fixed term position as a Postdoctoral Researcher with Data
Science Institute <https://www.universityofgalway.ie/dsi/> at the
University of Galway, Ireland.
This position is funded by Science Foundation Ireland and is available
from 1st October 2024 to contract end date of 30th September 2025.
Salary: Postdoctoral salary scale €44,346 – €56,764 per annum,
(subject to the project’s funding limitations), and pro rata for shorter
and/or part-time contracts.
Closing date for receipt of applications is 17:00 (Irish Time) on 5th
August 2024. It will not be possible to consider applications received after
the closing date.
ELIGIBILITY REQUIREMENTS
Essential Requirements:
- PhD in Natural Language Processing (NLP) or Linguistics
- Published at top conferences in the NLP field or in high impact factor journals
- Excellent understanding of experimental design and scientific
methodologies
- Strong command of oral and written English
- Good programming skills
Desirable Requirements:
- Strong knowledge of NLP equality, diversity, and inclusion
- Experience engaging in research collaborations with industry
- Experience in writing grant proposals
- Experience of working in national and/or EU research projects
To apply: Jobs – University of Galway.
<https://www.universityofgalway.ie/about-us/jobs/> Applications must be
submitted
How to apply guide
<https://www.universityofgalway.ie/human-resources/recruitment-and-selection…>
- For informal enquiries, please contact Bharathi Raja Chakravarthi
<bharathi.raja(a)universityofgalway.ie> and cc Dr Meghann L. Drury-Grogan
<Meghann.Drury-Grogan(a)atu.ie>
- University’s Strategic Plan
<https://www.universityofgalway.ie/strategy2025/>
- Working in Research at University of Galway
<https://www.universityofgalway.ie/our-research/>
- Moving to Ireland (Euraxess) <https://www.euraxess.ie/>
- Applicant Information
<https://www.universityofgalway.ie/human-resources/recruitment-and-selection…>
- We reserve the right to re-advertise or extend the closing date for
this post
- University of Galway is an equal opportunities employer
- All positions are recruited in line with Open, Transparent, Merit
(OTM) and Competency based recruitment
with regards,
Dr. Bharathi Raja Chakravarthi,
Assistant Professor / Lecturer-above-the-bar
School of Computer Science, University of Galway, Ireland
Insight SFI Research Centre for Data Analytics, Data Science Institute,
University of Galway, Ireland
E-mail: bharathiraja.akr(a)gmail.com , bharathi.raja(a)universityofgalway.ie
<bharathiraja.asokachakravarthi(a)universityofgalway.ie>
Google Scholar: https://scholar.google.com/citations?user=irCl028AAAAJ&hl=en
Website:
https://www.universityofgalway.ie/our-research/people/computer-science/bhar…
Second Call for Papers
NLP for Positive Impact Workshop
Miami, USA
November 15 or 16, 2024
(co-located with EMNLP 2024 <https://2024.emnlp.org/>)
https://sites.google.com/view/nlp4positiveimpact
*Submission*
Direct submission via ARR:
<https://openreview.net/group?id=EMNLP/2024/Workshop/NLP4PI_Direct_Submission>
Deadline: August 15, 2024
For papers submitted to the June (or earlier) ARR cycle, the commitment
deadline to the workshop is August 20, 2024. Commit to the workshop via this link:
<https://openreview.net/group?id=EMNLP/2024/Workshop/NLP4PI_ARR_Commitment>
Notification of Acceptance: September 20, 2024
Camera-Ready Papers Due: October 3, 2024
Workshop Date: either November 15 or 16
All deadlines are 11:59 PM (Anywhere on Earth
<https://www.timeanddate.com/time/zones/aoe>)
*Submission Information*
We are using the EMNLP Submission Guidelines
<https://2024.emnlp.org/calls/main_conference_papers/#paper-submission-detai…>
for the workshop. Authors are invited to submit a full paper of up to 8
pages of content with unlimited pages for references. We also invite short
papers of up to 4 pages of content, with unlimited pages for
references. Final camera-ready versions of accepted papers will be given an
additional page of content to address reviewer comments.
Summary
The widespread and indispensable use of language-oriented AI systems
presents new opportunities to have a positive social impact. NLP
technologies are starting to mature to the point where they could have an
even broader impact, supporting the UN sustainability goals
<https://sdgs.un.org/goals> by helping to address big problems such as
poverty, hunger, healthcare, education, inequality, COVID-19 and climate
change.
Our workshop aims to promote innovative NLP research that will positively
impact society, focusing on responsible methods and new applications. We
will encourage submissions from areas including (but not limited to):
- Work that grounds the impact of NLP: Beyond developing a
better-performing NLP model, can we go a step further to connect the
model to actual social impact? Example directions include case studies
of real-world deployments, or improving the deployment and maintenance of
NLP models in practice.
- In addition to commonly recognized NLP for social good areas such as NLP
for healthcare, mental well-being, and many others, we also call for work
on neglected areas such as NLP for poverty, hunger, energy, and climate change,
among others.
- We also highly value work that builds on interdisciplinary expertise,
and encourage submissions of case studies or worked examples that seek to
expand the social impact of NLP through collaboration with other fields
(e.g., philanthropy, social science, political science, economics, HCI).
Special theme: This year, we would like to encourage submissions providing
solutions or concepts to address digital violence. Digital violence
encompasses various forms of violence that utilize digital tools and media,
such as cell phones, apps, internet applications, and emails, and occurs
within digital spaces like online portals and social platforms. We aim to
explore how modern NLP and AI technologies can contribute to enhancing
safety in digital environments. At the workshop, you will have an
opportunity to connect and share your results with NGO representatives from
this field!
Submission types:
We would welcome various types of work on this topic (and others), such as:
- automatic identification of various social needs, their corresponding
sizes and demographics of people affected;
- position papers to propose promising new tasks or directions that the
field should pursue;
- literature review of a subfield;
- philosophical discussions of how positive impact can be achieved
with NLP methods;
- approaches to interdisciplinary collaboration;
- user study designs, user surveys;
- ethical considerations, and other related topics.
Note that we want submissions to our workshop to have some distinctive
features of social good implications, beyond a general paper on NLP. We
will require each submission to discuss the ethical and societal
implications of their work, and encourage a discussion of what "positive
impact" means in the work.
Organizers
Zhijing Jin (Max Planck Institute & ETH Zurich)
Daryna Dementieva (Technical University of Munich)
Giorgio Piatti (ETH Zürich)
Steven Wilson (Oakland University)
Oana Ignat (Santa Clara University)
Jieyu Zhao (University of Maryland, College Park)
Joel Tetreault (Dataminr, Inc.)
Rada Mihalcea (University of Michigan)
Contact Email
- nlp4pi.workshop(a)gmail.com
First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security (NLPAICS 2024)
Lancaster, UK, 29-30 July 2024
Call for Participation
We are pleased to share the NLPAICS 2024 conference programme, which you can view at https://nlpaics.com/programme-2/.
To register, please visit https://nlpaics.com/registration/.
We very much hope to welcome you to NLPAICS 2024 at Lancaster!
The conference
Recent advances in Natural Language Processing (NLP), Deep Learning and Large Language Models (LLMs) have resulted in improved performance of applications. In particular, there has been a growing interest in employing AI methods in different Cyber Security applications.
In today's digital world, Cyber Security has emerged as a heightened priority for both individual users and organisations. As the volume of online information grows exponentially, traditional security approaches often struggle to identify and prevent evolving security threats. The inadequacy of conventional security frameworks highlights the need for innovative solutions that can effectively navigate the complex digital landscape for ensuring robust security. NLP and AI in Cyber Security have vast potential to significantly enhance threat detection and mitigation by fostering the development of advanced security systems for autonomous identification, assessment, and response to security threats in real-time. Recognising this challenge and the capabilities of NLP and AI approaches to fortify Cyber Security systems, the First International Conference on Natural Language Processing (NLP) and Artificial Intelligence (AI) for Cyber Security (NLPAICS’2024) serves as a gathering place for researchers in NLP and AI methods for Cyber Security. We invite contributions that present the latest NLP and AI solutions for mitigating risks in processing digital information.
Venue
The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security (NLPAICS’2024) will take place at Lancaster University and is organised by the Lancaster University UCREL NLP research group.
Keynote speakers
We are delighted to announce the NLPAICS’2024 keynote speakers:
- Iva Gumnishka (Humans in the Loop)
- Sevil Şen (Hacettepe University)
- Paolo Rosso (Universitat Politècnica de València)
- Jacques Klein (University of Luxembourg)
Sponsors
We are proud to announce the conference sponsors:
CodeAgent – Collaborative Agents for Software Engineering
Further information and contact details
The conference website is https://nlpaics.com/ and will be updated on a regular basis. The conference updates will also be available on social media (X - https://x.com/nlpaics, LinkedIn - https://linkedin.com/company/nlpaics/ )
Regards
Tharindu Ranasinghe
Dear all,
the QE shared task 2024 is ON!
You can now submit and test your quality estimation system(s) on a set of different languages and tasks: to predict translation quality at sentence level, to detect error spans, or even to correct translations!
For information on how to access the test data and the submission platforms, visit the shared task's webpage:
https://www2.statmt.org/wmt24/qe-task.html
Deadline to participate is July 31 (AoE).
Looking forward to receiving your predictions!
--
Best wishes,
on behalf of the organisers.
Dear all,
we are happy to invite you to participate in the Shared Task on Quality Estimation at WMT'24.
The details of the task can be found at: https://www2.statmt.org/wmt24/qe-task.html
New this year:
* We introduce a new language pair (zero-shot): English-Spanish
* Continuing from the previous edition, we will also analyse the robustness of submitted QE systems to a set of different phenomena which will span from hallucinations and biases to localized errors, which can significantly impact real-world applications.
* We also introduce a new task, seeking not only to detect but also to correct errors: Quality-aware Automatic Post-Editing! We invite participants to submit systems capable of automatically generating QE predictions for machine-translated text and the corresponding output corrections.
2024 QE Tasks:
Task 1 -- Sentence-level quality estimation
This task follows the same format as last year but with fresh test sets and a new language pair: English-Spanish. We will test the following language pairs:
* English to German (MQM)
* English to Spanish (MQM)
* English to Hindi (MQM & DA)
* English to Gujarati (DA)
* English to Telugu (DA)
* English to Tamil (DA)
More details: https://www2.statmt.org/wmt24/qe-subtask1.html
Task 2 -- Fine-grained error span detection
Sequence labelling task: predict the error spans in each translation and the associated error severity: Major or Minor.
We will test the following language pairs:
* English to German (MQM)
* English to Spanish (MQM)
* English to Hindi (MQM)
More details: https://www2.statmt.org/wmt24/qe-subtask2.html
Task 3 -- Quality-aware Automatic Post-editing
We expect submissions of post-edits correcting detected error spans of the original translation. Although the task is focused on quality-informed APE, we also allow participants to submit APE output without QE predictions to understand the impact of their QE system. Submissions without QE predictions will also be considered official.
We will test the following language pairs:
* English to Hindi
* English to Tamil
More details: https://www2.statmt.org/wmt24/qe-subtask3.html
Important dates:
1. Test sets will be released on July 15th.
2. Participants can submit their systems by July 23rd on CodaLab.
3. System paper submissions are due by 20th August [aligned with WMT deadlines].
Note: Like last year, we aligned with the General MT and Metrics shared tasks to facilitate cross-submission on the common language pairs: English-German, English-Spanish, and English-Hindi (MQM).
We look forward to your submissions. Feel free to contact us if you have any questions!
Best wishes,
on behalf of the organisers.
Dear all,
LIACS currently has vacancies for two assistant professor positions, which might be of interest to some people on this list.
Here’s the beginning of the vacancy:
"The Faculty of Science, Leiden Institute of Advanced Computer Science (LIACS), is seeking candidates for two Assistant Professors (0.8-1.0 FTE), one in generative AI and another in Human-centered AI. We seek to appoint an expert in the research area of Generative AI with focus on software systems and engineering (code generation, bug detection and repair, refactoring, and optimization but also at a larger scale such as architecture reconstruction and impact analysis for changes), prompt engineering (for content creation and data analysis), or diffusion models (for transforming the creation of high-fidelity data, such as images and simulations). Additionally, we seek to appoint an expert in Human-centered AI with focus on the designing of AI systems that prioritize human needs, usability, and collaboration, and/or on the involvement of humans in the training and refining processes (interactive machine learning)."
Here’s the full vacancy: https://www.universiteitleiden.nl/en/vacancies/2024/q3/150312-assistant-pro…
Best,
dr. Gijs Wijnholds
Assistant Professor in Natural Language Processing
Text Mining and Retrieval Group <https://tmr.liacs.nl/>
Leiden Institute of Advanced Computer Science
https://gijswijnholds.github.io
Friday, November 8 - Saturday, November 9
Brown Computer Science Department, Providence, RI
https://cs.brown.edu/people/in-memorium/eugene_charniak/
Brown University invites you to attend an academic memorial event to
commemorate the research and legacy of Eugene Charniak. Eugene, an ACL
Lifetime Achievement Award winner and ACL fellow, passed away in June
2023. His colleagues and students have organized a two-day workshop of
invited presentations of cutting-edge research with an emphasis on the
themes which defined Eugene's career: the legacy of classic statistical
NLP/ML, the sometimes-surprising effectiveness of simple baselines,
clever tricks for dealing with data sparsity such as self-training or
distant supervision, and unsupervised learning.
A full program will be posted later this summer. Mark Johnson will give
a keynote presentation, along with research talks by Regina Barzilay,
Michael Collins, Jason Eisner, Lillian Lee, Ani Nenkova, Ellie Pavlick,
Brian Roark, Chris Tanner and Byron Wallace. There will also be
opportunities to remember Eugene in a social setting, and a panel
discussion of the workshop's research themes.
The event will take place at the Brown Computer Science Department in
Providence, RI; attendees are responsible for finding their own
accommodations. Instructions for travel to Providence are available
here: https://cs.brown.edu/about/directions/. The program will begin at
9am on Friday the 8th, and conclude at 1:30pm on Saturday the 9th. All
members of the ACL community are welcome, whether you knew Eugene well
or not. Please mark your calendars now!
To stay in the loop about the event, please fill out this form:
https://docs.google.com/forms/d/e/1FAIpQLSe_7LZBSjP3Ur2XCTtsDtwnL_Jbxgh5Wfi…
If you have questions about the event, contact the organizers, Micha
Elsner (melsner0(a)gmail.com) and David McClosky
(david.mcclosky(a)gmail.com).