We are excited to announce the 2nd edition of the Open Language Data Initiative shared task at WMT25, co-located with EMNLP 2025.
**TASK DESCRIPTION**
The primary goal of this shared task is to expand OLDI’s open datasets to more languages. We are soliciting contributions to the following:
- The MT evaluation dataset FLORES+.
- The MT Seed dataset.
- Other high-quality, massively parallel and open-source datasets.
Contributions may consist of either the addition of entirely new languages, varieties or dialects to the above datasets, or substantial improvements to existing datasets. To describe and publicise their contributions, task participants will be asked to submit a 4-6 page paper to be presented at the WMT 2025 conference.
**IMPORTANT DATES**
All dates follow WMT/EMNLP.
- Paper and data submission deadline: 14 August
- Notification of acceptance: 13 September
**MORE INFORMATION**
- Shared task website: https://www2.statmt.org/wmt25/open-data.html
- OLDI website: https://oldi.org/
Dear colleagues,
We are pleased to announce the first call for papers of the
*1st Workshop on Multilingual Data Quality Signals at COLM 2025*
Important information:
🗓️ CfP Deadline: June 23, Workshop: October 10
📍 Montréal, Canada
🌐 https://wmdqs.org
Scope
Recent research has shown that large language models (LLMs) not only need large quantities of data, but also need data of sufficient quality. Ensuring data quality is even more important in a multilingual setting, where the amount of acceptable training data in many languages is limited. Indeed, for many languages even the fundamental step of language identification remains a challenge, leading to unreliable language labels and thus noisy datasets for underserved languages.
In response to these challenges, we will be holding the first Workshop on Multilingual Data Quality Signals (WMDQS) in tandem with COLM. We invite the submission of long and short research papers related to data quality in multilingual data.
Even though most previous work on data quality has been targeted at LLM development, we believe that research in this area can also benefit other research communities in areas such as web search, web archiving, corpus linguistics, digital humanities, political sciences and beyond. We therefore encourage submissions from a wide range of disciplines.
WMDQS will also include a shared task on language identification for web text. We invite participants to submit novel systems which address current problems with language identification for web text. We will provide a training set of annotated documents sourced from Common Crawl to aid development.
Topics
We welcome submissions of (1) original research papers, (2) review/opinion papers, (3) online systems on the topics listed below, and (4) extended abstracts. We especially welcome work-in-progress projects and all novel ideas covering research in multilinguality, underserved/low-resource languages, under-represented linguistic communities and all types of work covering data quality signals. Suggested areas include:
- Data pipelines for data annotation and data filtering
- Undesirable content detection in a multilingual setting
- Multilingual or language independent content ranking
- Human annotation platforms and systems
- Multilingual tokenization mechanisms
- Small language models and embeddings
- Linguistic studies in underserved languages
- Corpus creation and curation methods, especially for underserved languages
- Machine translation
- Digital humanities
- Historical and constructed languages
Shared task
The lack of training data, especially high-quality data, is the root cause of poor language model performance for many languages. One obstacle to improving the quantity and quality of available text data is language identification (LangID or LID). LangID remains far from solved for many languages. Several of the commonly used LangID models were introduced in 2017 (e.g. fastText and CLD3). The aim of this shared task is to encourage innovation in open-source language identification and improve accuracy on a broad range of languages.
All accepted authors will be invited to contribute a larger paper, which will be submitted to a high-impact NLP venue.
Important dates for the Workshop:
Workshop paper submission deadline: June 23, 2025
Workshop paper acceptance notification: July 24, 2025
Workshop: October 10, 2025
Important dates for the Shared Task:
1st Deadline to contribute annotations: July 7, 2025
1st Annotations released (train split): July 14, 2025
Abstract Deadline: July 21, 2025
Decision Notification: July 24, 2025
Camera Ready Deadline: September 21, 2025
(All deadlines are 23:59 AoE.)
Organizers:
For any questions, please send an email to wmdqs-pcs(a)googlegroups.com
Program Chairs:
Pedro Ortiz Suarez (Common Crawl Foundation)
Sarah Luger (MLCommons)
Laurie Burchell (Common Crawl Foundation)
Kenton Murray (Johns Hopkins University)
Catherine Arnett (EleutherAI)
Organizing Committee:
Thom Vaughan (Common Crawl Foundation)
Sara Hincapié (Factored)
Rafael Mosquera (MLCommons)
KlarText Workshop on German Text Simplification & Readability Assessment
Co-located with KONVENS 2025 | Hildesheim, Germany | 10 September 2025
Website: https://klar-text.github.io/
============================================================
Please be reminded that the KlarText workshop paper submission deadline is in three weeks. The event aims to unite researchers, practitioners, and industry experts to discuss state-of-the-art methods in German text simplification and readability assessment. Our focus is to raise awareness about the diverse simplification goals and language forms in German, while attracting researchers who are addressing the challenges associated with this field.
Topics of interest include (but are not limited to):
- German Text Simplification
- Readability Assessment
- Resources & Approaches for Leichte Sprache
- The Role of Large Language Models (LLMs)
- Resources & Benchmarks
- Evaluation & Human-Centered Assessment
- Applications & Real-World Impact
- Cross-Linguistic & Multilingual Perspectives
Important Dates
- Submission deadline: June 30, 2025
- Notification of acceptance: August 1, 2025
- Camera-ready version due: August 15, 2025
- Workshop date: September 10, 2025
Submissions are managed via OpenReview (https://openreview.net/group?id=GSCL.org/KONVENS/2025/Workshop/KlarText).
Organizing Committee
- Salar Mohtaj, DFKI
- Stefan Hillmann, Technische Universität Berlin
- Sebastian Möller, Technische Universität Berlin
- Georg Groh, Technische Universität München
- Hadi Asghari, Technische Universität Berlin
- Miriam Anschütz, Technische Universität München
Contact
For questions or inquiries, please contact:
Salar Mohtaj – salar.mohtaj(a)dfki.de
*** First Call for Workshop & Tutorial Proposals
The 31st Annual ACM Conference on Intelligent User Interfaces (IUI 2026)
March 23-26, 2026, 5* Coral Beach Hotel & Resort, Paphos, Cyprus
https://iui.hosting.acm.org/2026/
We are pleased to invite proposals for workshops and tutorials to be held in conjunction with
the 31st International ACM Conference on Intelligent User Interfaces (ACM IUI 2026), Paphos,
Cyprus.
Workshops aim to provide a venue for presenting research on emerging or specialized topics
of interest and to offer an informal forum for discussing research questions and challenges.
Potential workshop topics should be related to the general theme of the conference
(“Where HCI meets AI”).
Tutorials aim to provide fundamental knowledge and experience on topics related to intelligent
user interfaces and the intersection between Human-Computer Interaction (HCI) and Artificial
Intelligence (AI).
We welcome proposals for a wide range of *full-day* or *half-day* workshops and tutorial
formats and activities, including but not limited to:
• Mini Conferences: Workshops that focus on a specific topic and may have their own paper
submission and review processes.
• Interactive Formats: Workshops that encourage active participation and hands-on
experiences through break-out sessions or group work to explore specific topics. They may
have their own paper submission and review process or target a report summarizing the
discussions and outcomes.
• Emerging Work Sessions: Workshops that foster discussion around emerging ideas.
Organizers may raise specific topics and invite position papers, late-breaking results, or
extended abstracts.
• Project-Centric Formats: Workshops tied closely to a specific existing large-scale funded
project (e.g., NSF, EU) with the goal of engaging a broader community.
• Interactive Competitions: Formats that invite individuals and teams to participate in
challenges or hackathons on selected topics relevant to IUI.
• Tutorials: Sessions that provide structured instruction on topics aligned with the conference
theme, such as HCI methods, AI techniques, methodological frameworks, or tools for building
intelligent user interfaces.
Review and Oversight by Workshop and Tutorial Chairs
Proposals will be reviewed and evaluated by the Workshop and Tutorial Chairs. Workshops may
be cancelled, shortened, merged, or restructured if there are insufficient
submissions.
Workshop and Tutorial summaries will be included in the ACM Digital Library for ACM IUI 2026.
We will also publish joint workshop proceedings for accepted workshop submissions (through
CEUR or a similar venue).
Responsibilities of Workshop and Tutorial Organizers
• Coordinate the Call for Papers, including solicitation, submission handling, and peer review
process.
• Create and maintain a dedicated website with Workshop or Tutorial information. The IUI 2026
website will link to this page.
• Prepare and communicate a Call for Participation, targeting both IUI and broader relevant
communities (e.g., via mailing lists, social media, newsgroups, or offline events).
• Facilitate the planned activities, including paper presentations, discussions, and/or
interactive elements.
• Submit a workshop or tutorial summary for inclusion in the ACM Digital Library.
• Collect camera-ready papers and author agreements from workshop participants for the joint
workshop proceedings (CEUR or similar).
Note that for the joint proceedings (CEUR or similar), submissions should be peer-reviewed
and will need to meet publishers’ guidelines. CEUR, for example, requires a 5-page minimum
per contribution. Some of the workshop and tutorial formats listed above may not meet these
requirements, in which case we may not be able to include them.
IUI 2026 is an in-person event, and we expect workshop organizers to attend, allowing the
workshop to be conducted on-site. One author per paper is expected to attend in person to
present the work.
Proposal Format
Workshop or tutorial proposals should be a maximum of four pages long (single-column
format). Prepare your submission using the latest templates: Word Submission Template
(https://authors.acm.org/binaries/content/assets/publications/taps/acm_submi…),
or the LaTeX Template
(https://authors.acm.org/proceedings/production-information/preparing-your-a…).
For LaTeX, please use “\documentclass[manuscript,review]{acmart}”.
The proposals should be organized as follows:
• Name and title: A one-word acronym and a full title. Please indicate “(Workshop)” or
“(Tutorial)” after the title, as appropriate.
• Abstract: A brief summary of the workshop or tutorial.
• Description of workshop or tutorial topic: Should discuss the relevance of the proposed
topic to IUI and its interest for the IUI 2026 audience. Include a concise discussion of why this
workshop or tutorial is particularly relevant for the intended audience and how it will
complement and enhance topics covered at the main conference.
• Previous history: List of previous workshops or tutorials on this topic, including the
conferences that hosted them and the number of participants. If available, report on past
editions of the workshop (including URLs), along with a brief statement of the workshop series
(e.g., covering topics, number of paper submissions, and participants), as well as post-
workshop publications over the years and acceptance statistics. If this is the first edition of the
workshop, describe how it differs from others on similar topics (e.g., by including conference
names and years).
• Organizer(s): Names, affiliations, emails, and web pages of the organizer(s). Provide a brief
description of the background of the organizer(s). Strong proposals normally include organizers
who bring differing perspectives on the topic and are actively connected to the communities of
potential participants. Please indicate the primary contact person and the organizers who will
attend the workshop. Also, please provide a list of other workshops or tutorials organized by
workshop organizers in the past.
• Workshop program committee: Names and affiliation of the members of the (tentative)
workshop program committee that will evaluate the workshop submissions.
• Participants: Include a statement of how many participants you expect and how you plan to
invite participants for the workshop or tutorial. If possible, include the names of at least 10
people who have expressed interest in participating in the workshop or tutorial.
• Workshop or Tutorial activities: A brief description of the format regarding the mix of
events or activities, such as paper presentations, invited talks, panels, demonstrations,
teaching activities, hands-on practical exercises, and general discussion. Please also list here
any materials you will make available to tutorial participants, such as slides, access to hardware
or software, and handouts.
• Planned outcomes of the workshop or tutorial: What are you hoping to achieve by the end
of the workshop or tutorial? Please list here any planned publications or other outcomes
expected.
• Length: Full-day or half-day.
Submission Platform
• All materials must be submitted electronically to PCS 2.0
http://new.precisionconference.com/~sigchi by the proposal submission deadline.
• In PCS 2.0, first click "Submissions" at the top of the page. Then, from the dropdown menus for
society, conference, and track, select "SIGCHI", "IUI 2026", and then "IUI 2026 Workshops" or
"IUI 2026 Tutorials", as appropriate, and press "Go".
We encourage both researchers and industry practitioners to submit workshop proposals. To
support diverse perspectives in the workshops, we strongly recommend including organizers
from varied institutions and backgrounds.
Furthermore, we welcome workshops with an innovative structure that can attract diverse types
of contributions and foster valuable interactions.
Prospective organizers are encouraged to contact the Workshop and Tutorial Chairs in advance
(workshops2026(a)iui.acm.org) to discuss ideas, receive feedback, or seek assistance in
preparing engaging proposals. Especially for workshop proposals featuring innovative
interactive formats, we are happy to help further develop and implement the ideas.
Important Dates (AoE)
• Workshop Proposals: August 22, 2025
• Decision notification: September 19, 2025
• Tutorial Proposals: October 17, 2025
• Tutorial Decision Notification: November 21, 2025
• Camera-ready Summaries: February 6, 2026
Workshop and Tutorial Chairs
Karthik Dinakar, Pienso, USA
Werner Geyer, IBM Research, USA
Patricia Kahr, University of Zurich, Switzerland
Antonela Tommasel, CONICET, Argentina
The Data Mining and Machine Learning research group at the University of Vienna is seeking graduates or advanced MSc students in Computer Science, Computational Linguistics, Statistics, or related fields who are interested in pursuing a PhD in Explainability for Machine Learning and Natural Language Processing. The successful candidate will join the group as a pre-doctoral researcher. The position is funded for three years and will be supervised by Prof. Benjamin Roth.
Application deadline: 24 June 2025
Research topics may include:
- Personalized explanations of large language models
- Explanations for complex AI agents
- Training data-based explanations
- Usability aspects of explanations
- Evaluation methods for explainable AI
More information: https://jobs.univie.ac.at/job/University-assistant-predoctoral/1212525201/
--
Univ.-Prof. Dr. Benjamin Roth
Digitale Textwissenschaften
Universität Wien
Kolingasse 14
Room 5.17
1090 Wien
email: benjamin.roth(a)univie.ac.at
tel: +43 14277 79513
virtual coffee (Tuesday 2pm CEST): https://www.benjaminroth.net/virtual_coffee
web: https://dm.cs.univie.ac.at/team/person/112089/
Call for Abstracts – Computational Psycholinguistics Meeting 2025
We are pleased to announce that abstract submission for the first Computational Psycholinguistics Meeting 2025 is now open!
The meeting will take place on December 18–19, 2025, in Utrecht, the Netherlands. It aims to connect researchers using (neuro-)symbolic, Bayesian, deep-learning, connectionist, and mechanistic models (e.g., ACT-R) in studying human language production, perception, and processing.
Keynote Speakers: Stefan Frank (Radboud University), Vera Demberg (Saarland University)
For detailed guidelines, templates, and additional information, visit our website: https://cpl2025.sites.uu.nl/
Abstracts must be submitted in PDF format via OpenReview by June 29, 2025 at:
https://openreview.net/group?id=UU.nl/Utrecht_University/2025/CPL
We look forward to your contributions!
Organizers: Jakub Dotlačil, Lena Jäger, Bruno Nicenboim, Ece Takmaz
10th Symposium on Corpus Approaches to Lexicogrammar (LxGr2025)
LxGr2025 will be held online on Friday 11 and Saturday 12 July 2025.
Symposium programme and registration (free): https://ehu.ac.uk/lxgr
If you have any problems registering, or have questions, please contact lxgr(a)edgehill.ac.uk.
2nd CfP: The 5th Workshop on Computational Linguistics for the
Political and Social Sciences (CPSS-2025)
https://cpss-sig.github.io/CPSS-2025
CPSS-2025 will be held in September 2025, co-located with KONVENS
<https://konvens-2025.hs-hannover.de> in Hildesheim, Germany.
The workshop will provide a forum for the presentation and discussion of
innovative research on all aspects of using CL/NLP techniques for the
political and social sciences, including:
* Modeling political communication with NLP (e.g. topic
classification, position measurement)
* Mining policy debates from heterogeneous textual sources
* Modeling complex social constructs (e.g. populism, polarization,
identity) with NLP methods
* Political and social bias in language models
* Methodological insights in interdisciplinary collaboration:
workflows, challenges, best practices
* NLP to understand and support democratic decision-making
* Resources and tools for Political/Social Science research
* and many more...
CPSS-2025 will be held in person.
Special Theme
The special theme of CPSS-2025 is
*Validation and best practices for using NLP in political and social
science research*.
In addition to CPSS's general topics, we specifically invite submissions
on this year's special theme, focussing on validation and best practices
for applying NLP techniques for research in the political and social
sciences. We are especially interested in papers addressing issues
related to:
* Data quality in human and synthetic data
* Data leakage and contamination, especially in LLMs
* New ways to collect data such as dataset donation
* Validation of results beyond the train-dev-test paradigm of NLP and
data science.
* Any other topics related to the special theme.
*Important Dates*
All submission deadlines are 11:59 p.m. UTC-12:00 “anywhere on Earth.”
Workshop papers due June 13, 2025
Notification of acceptance Aug 1, 2025
Camera-ready papers due Aug 10, 2025
Workshop date Sep 2025
*Submissions*
We solicit two types of submissions:
*archival papers* describing original and unpublished work (long papers:
max. 8 pages, references/appendix excluded; short papers: max. 4 pages,
references/appendix excluded). Accepted papers will be published in the
ACL Anthology. For the submission format, refer to the KONVENS guidelines.
*non-archival papers* (1-page abstracts, references excluded) describing
ongoing work, PhD projects, or already published research.
For more details, please refer to the CPSS-2025 website:
https://cpss-sig.github.io/CPSS-2025
*CPSS 2025 organising committee*
Dennis Assenmacher (GESIS), Christopher Klamm (U-Mannheim), Gabriella
Lapesa (GESIS/U-Düsseldorf),
Simone Ponzetto (U-Mannheim), Ines Rehbein (U-Mannheim), Indira Sen
(U-Mannheim)
--
Ines Rehbein
Data and Web Science Group
University of Mannheim, Germany
New distance-learning route: Applications for MSc AI for Translation and Interpreting Studies now open (academic year 2025-26 entry)
The Centre for Translation Studies (CTS) at the University of Surrey is pleased to announce the launch of the AI for Translation and Interpreting Studies MSc. This MSc programme has an in-person route and, most importantly, a new distance-learning route, both available in full-time and part-time modes of study. Classes are delivered synchronously for all routes and modes of study.
This unique and innovative course draws on the research and pedagogy CTS is well known for, namely the responsible integration of professional practice with AI tools in multilingual mediation, translation and interpreting technologies, translation as intercultural mediation, corpus-based translation, audiovisual translation and automatic translation. As such, the course is ideally suited for students who wish to work at the interface of (traditional) Languages degrees, Natural Language Processing, Machine Learning and Machine Translation. This year we encourage applications from students interested in how Natural Language Processing and Large Language Models benefit translators and interpreters.
The new configuration offers a balance of hands-on training and critical skills when using language technologies across the academic year. As an MSc student, you will take three compulsory taught modules in semester 1 and select three optional modules in semester 2 (90 credits). You will then complete your degree with a long dissertation (90 credits), allowing you to tackle a project in greater depth. For full programme details, a special fee offer for the online/distance-learning route and the overall structure, please visit AI for Translation and Interpreting Studies MSc masters course | University of Surrey<https://www.surrey.ac.uk/postgraduate/ai-translation-and-interpreting-studi…>
Our MSc course comes in the wake of further diversification of our postgraduate programmes. It constitutes a change informed by constant dialogue with our students that resulted in a revamp of our portfolio. See our staff-student partnership project "Translation in the Era of General Artificial Intelligence"<https://www.surrey.ac.uk/news/human-and-machine-harmony-centre-translation-…>. This change further aligns with recent developments in translation and translation technology projects that are carried out with the help of AI or AI-powered tools.
If you feel that an MSc is not for you, you can check our other postgraduate courses on topics related to translation and interpreting at: https://www.surrey.ac.uk/centre-translation-studies/study/postgraduate-cour…
Watch our video "More than an MA": https://www.youtube.com/watch?v=R2oVf3X2LEg
---
Prof Constantin Orăsan
Professor of Language and Translation Technologies
Centre for Translation Studies<https://www.surrey.ac.uk/centre-translation-studies> | School of Literature and Languages<https://www.surrey.ac.uk/school-literature-languages>
Personal page: https://www.surrey.ac.uk/people/constantin-orasan
Office: 06LC03, Phone: +44 (0) 1483 68 4115
Library and Learning Centre, University of Surrey, Guildford, Surrey, GU2 7XH, UK
======
The First Workshop on Optimal Reliance and Accountability in Interactions
with Generative Language Models (*ORIGen*) will be held in conjunction with
the Second Conference on Language Modeling (COLM) at the Palais des Congrès
in Montreal, Quebec, Canada, on October 10, 2025!
With the rapid integration of generative AI, exemplified by large language
models (LLMs), into personal, educational, business, and even governmental
workflows, such systems are increasingly being treated as “collaborators”
with humans. In such scenarios, underreliance on or avoidance of AI
assistance may negate the potential speed, efficiency, or scalability
advantages of a human-LLM team; at the same time, there is a risk that
non-experts in the subject matter may overrely on LLMs and trust their
outputs uncritically, with consequences ranging from the inconvenient to the
catastrophic. Therefore, establishing optimal levels of reliance within an
interactive framework is a critical open challenge as language models and
related AI technology rapidly advance.
* What factors influence overreliance on LLMs?
* How can the consequences of overreliance be predicted and guarded against?
* What verifiable methods can be used to apportion accountability for the
outcomes of human-LLM interactions?
* What methods can be used to imbue such interactions with appropriate
levels of “friction” to ensure that humans think through the decisions they
make with LLMs in the loop?
The ORIGen workshop provides a new venue to address these questions and
more through a multidisciplinary lens. We seek to bring together broad
perspectives from AI, NLP, HCI, cognitive science, psychology, and
education to highlight the importance of mediating human-LLM interactions to
mitigate overreliance and promote accountability in collaborative human-AI
decision-making.
Submissions are due June 20, 2025. Please see our call for papers [1] for
more!
[1] https://origen-workshop.github.io/submissions/
======
Best regards,
Nikhil Krishnaswamy
Assistant Professor of Computer Science
*Colorado State University*