In this newsletter:
LDC data and commercial technology development
New publications:
Mixer 7 English Speech<https://catalog.ldc.upenn.edu/LDC2025S08>
AIDA Scenario 1 Evaluation Topic Source Data, Annotation and Assessment<https://catalog.ldc.upenn.edu/LDC2025T13>
LORELEI Hindi Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2025T12>
________________________________
LDC data and commercial technology development
For-profit organizations are reminded that an LDC membership is a prerequisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations on the use of certain corpora. Visit the Licensing<https://www.ldc.upenn.edu/data-management/using/licensing> page for further information.
________________________________
New publications:
Mixer 7 English Speech<https://catalog.ldc.upenn.edu/LDC2025S08> was developed by LDC and contains 12,321 hours of audio recordings of interviews, transcript readings, and conversational telephone speech involving 222 distinct English speakers. This material was collected by LDC in 2010-2011 as part of the Mixer project, and the recordings were used in the 2012 NIST SRE test set.
Recruited speakers were connected through a robot operator to carry on casual conversations on a pre-set topic lasting up to 10 minutes. Participants also visited LDC's Human Subjects Collection Lab equipped with a 14-microphone array where they participated in interviews and transcript readings, and conducted telephone calls under varying conditions. Selected speaker metadata was also collected.
2025 members can access this corpus through their LDC accounts. This corpus is a Members-Only release and is not available for non-member licensing. Contact ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu> for information about membership.
*
AIDA Scenario 1 Evaluation Topic Source Data, Annotation and Assessment<https://catalog.ldc.upenn.edu/LDC2025T13> was developed by LDC and comprises English, Russian, and Ukrainian web documents (text, video, image), annotations, and assessments used in the AIDA Phase 1 pilot and final evaluations. The Phase 1 scenario focused on political relations between Russia and Ukraine in the 2010s. The material in this corpus covers the following events: Suspicious Deaths and Murders in Ukraine (January-April 2015); Odessa Tragedy (May 2, 2014); and Siege of Sloviansk and Battle of Kramatorsk (April-July 2014).
The corpus contains 10,522 documents, annotations for 386 of those documents, and assessment results covering 77,965 responses in 1,525 of those documents. Annotations were performed in three steps: (1) within-document labels for scenario-related entities, relations, and events; (2) coreference annotation across documents by linking information elements to a knowledge base; and (3) indications of any relationship between labeled events/relations and hypotheses about the scenario. In the assessment phase, LDC annotators reviewed and judged system response files to provide evaluation organizers with a means for scoring submissions. Assessment tasks included zero-hop assessment, class-based assessment, graph assessment, and hypothesis assessment.
The DARPA AIDA (Active Interpretation of Disparate Alternatives) program aimed to develop a multi-hypothesis semantic engine to generate explicit alternative interpretations of events, situations, and trends from a variety of unstructured sources. LDC supported AIDA by collecting, creating, and annotating multimodal linguistic resources in multiple languages.
2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
LORELEI Hindi Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2025T12> contains over 26 million words of Hindi monolingual text, 363,00 words of which were translated into English, 1.07 million words of found Hindi-English parallel text, and 118,000 Hindi words translated from English data. Approximately 103,000 words were annotated for simple named entities and over 25,000 words were annotated for full entity (including nominals and pronouns), entity linking, and situation frames (identifying entities, needs and issues). Data was collected from discussion forums, news, reference material, social networks, and weblogs.
The LORELEI (Low Resource Languages for Emergent Incidents) program was concerned with building human language technology for low resource languages in the context of emergent situations. Representative languages were selected to provide broad typological coverage.
The knowledge base for entity linking annotation is available separately as LORELEI Entity Detection and Linking Knowledge Base (LDC2020T10)<https://catalog.ldc.upenn.edu/LDC2020T10>.
2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
===================================================================
Call for Program Committee (Self-) Nominations
European Conference on Information Retrieval (ECIR) 2026
Deadline: September 28, 2025
Application form: https://forms.gle/LsvXHWEa859vi8LL8
===================================================================
Are you a Ph.D. student or at an early stage of your career and looking to gain experience in reviewing scientific papers? Do you know someone who would be an excellent reviewer? The European Conference on Information Retrieval (ECIR) invites you to nominate yourself or others to join the Program Committee (PC) for the short paper track.
To apply, please fill out this Google Form: https://forms.gle/LsvXHWEa859vi8LL8, which includes questions about your past experience and current research progress. Applications will be reviewed by the conference PC chairs, and accepted applicants will be notified.
As a PC member, you will have the opportunity to read and evaluate submissions and contribute to the selection of high-quality research papers for presentation at the conference. You will also have the chance to interact with leading researchers in the field of information retrieval and build your network of contacts.
The deadline for applications is September 28, 2025. If you have any questions, please do not hesitate to contact us.
We look forward to your applications.
ECIR Short Paper PC Chairs
Mohammad, Sean, and Christine
Dear colleagues,
We are pleased to invite contributions to a Special Collection of the Journal of Open Humanities Data (JOHD) dedicated to language data reuse. The special collection is titled “Language datasets reuse: opportunities, challenges, and best practices”.
This Special Collection will highlight how existing deposited mono- and multilingual language datasets (in any modality) have been reused in research across the humanities. Contributions may also describe cases where dataset reuse led to the creation of a new dataset. We welcome papers that present both successful and less successful experiences of language data reuse, with reflection on encountered issues and lessons learned. Position papers on how data creators can maximize the future reuse potential of language datasets are also encouraged.
We invite the following type of submission:
* Discussion papers (3,000–5,000 words): Longer narratives illustrating the reuse of one or more existing deposited language datasets (preferably created by researchers other than the authors) or showcasing reuse-focused dataset design approaches. Papers should follow the discussion paper template for this Special Collection<https://docs.google.com/document/d/1LnlJPZX-lNQxtvAiI_9T3Bkio3V1bFKD/edit?u…>.
Key information:
* Submission deadline: 30 January 2026
* Submission link: https://account.openhumanitiesdata.metajnl.com/index.php/up-j-johd/submissi…
* Publication fees: Waivers are available upon request. For papers describing reuse of datasets hosted in a CLARIN centre, publication fees may be covered by CLARIN.
Guest Editorial Team
* Darja Fišer (Executive Director of CLARIN; University of Ljubljana; Institute of Contemporary History, Slovenia)
* Francesca Frontini (ILC-CNR, Pisa; CLARIN-IT, Italy)
Coordinating Editor
* Paola Marongiu (ILC-CNR, Pisa)
We look forward to your contributions and to advancing the conversation on the reuse of language data in the humanities.
With best regards,
The Special Collection Editorial Team
---
Elisa Gorgaini
Communication Officer - CLARIN ERIC
Utrecht University | Drift 10, 3512 BS Utrecht, The Netherlands
e.gorgaini(a)uu.nl<mailto:e.gorgaini@uu.nl> | elisa(a)clarin.eu<mailto:elisa@clarin.eu>
www.clarin.eu<https://www.clarin.eu>
The TurkuNLP research group at the University of Turku, Finland is seeking two fully-funded PhD Researchers to join our team!
These PhD positions are part of HAIF (Human-Centric Artificial Intelligence for Sustainable Future), a prestigious doctoral training project funded by the Marie Skłodowska-Curie Actions under the European Union’s Horizon Europe research and innovation programme.
Our research is at the forefront of NLP, covering key areas such as:
- Large language models
- Large corpora
- Multilingual methods
- Health applications of NLP
See the website https://haif.utu.fi/ for details.
Reach out to shaoxiong.ji(a)utu.fi if you’re interested in multilingual LLMs and health applications.
Application deadline: 30 September at 15:00 (Europe/Helsinki time).
Dear all,
This is the last call, with an extended deadline, to invite everyone interested to attend the 2nd UniDive Training School on Linguistic Diversity in NLP.
Dates: 20-24 January, 2026 (Tuesday-Saturday)
Location: Yerevan State University (YSU), Yerevan, Armenia
Coordinating Project: UniDive (Universality, Diversity and Idiosyncrasy in Language Technology)
Website: https://unidive.lisn.upsaclay.fr/doku.php?id=meetings:other-events:2nd_unid…
Cost: Participants selected on the basis of their applications will be reimbursed; details below.
We are happy to announce the 2nd edition of UniDive Training School on Universality, Diversity and Idiosyncrasy in Language Technology. It is dedicated mainly (but not exclusively) to young researchers and investigators. Researchers working on low-resourced languages, dialects and varieties are particularly welcome. See below for the application details.
TRAINING SCHOOL ACTIVITIES
Courses
Linguistic Typology for NLP researchers: Methods and Resources in the 21st century <https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://unidive…> - by Harald Hammarström (Uppsala University, Sweden) and Luigi Talamo (Saarland University, Germany)
Large Language Models for Low-Resourced Languages: Hands-On Approaches to Cultural and Genre Diversity in NLP <https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://unidive…> - by Maria Carp (Romanian Academy, Bucharest, Romania) and Nina Hosseini-Kivanani (RTL Luxembourg and University of Luxembourg)
Diversity quantification in natural language processing: The why, what, where and how <https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://unidive…> - by Louis Estève <https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://www.lis…> (Université Paris-Saclay, CNRS, France), Marie-Catherine de Marneffe (Université Catholique Louvain, Belgium), Nurit Melnik (The Open University, Israel), Agata Savary <https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://perso.l…> (Université Paris-Saclay, CNRS, France) and Olha Kanishcheva (Heidelberg University, Germany, SET University, Ukraine)
Poster sessions
Brainstorming hackathon on open issues submitted by the trainees
CALL FOR APPLICATIONS
Each applicant should submit a project related to language diversity and the topics of the training school (e.g., research in fields such as linguistic variation, diversity, typology, and computational linguistics, including specific topics like diversity quantification across textual genres, corpora and languages). The length of the application should be 2 pages (excluding references). The application should contain:
The title
Applicant’s name and affiliation (including the country of the affiliation)
A list of 3-4 key-words
Description of the project related to the topics of the training school
Explanation of how participation in the training school will be useful for the project
Short statement of the project phase (planned, started, ongoing)
The projects are to be submitted via the OpenReview portal <https://www.google.com/url?q=https://openreview.net/group?id%3DUniDive/2025…>
IMPORTANT DATES
Trainee's application deadline: 19 September, 2025 (AoE)
Acceptance notification: 24 October, 2025
Invitations for visas (if needed): 27 October, 2025
Official confirmations of travel grants: early November
Training school: 20-24 January, 2026
TRAINEE’S SELECTION CRITERIA
We can fund about 40 trainees to come to Yerevan (in addition to local trainees). If more candidates apply, the selection criteria will include:
Trainee’s country: trainees only from COST countries and Near-Neighbour Countries can be funded. See here and here
Age: Young Researchers and Investigators, i.e. under the age of 40, are promoted
Gender and geographical balance (notably between Inclusiveness Target Countries and other COST countries)
Relevance and quality of the project submitted by the trainee
Status of the languages, dialects, or varieties on which the trainee intends to work (low-resourced languages, dialects, or genres are promoted)
If you are not selected on the basis of these criteria and you can find other financial sources to cover your travel, accommodation and meals, you are also welcome to participate.
The authors of the selected projects will present them in a poster session during the Training School.
PROGRAM CHAIRS
Victoria Bobicev
Anna Danielyan
Santiago Herrera
Esther Ploeger
Wessel Poelman
Ranka Stanković
Abigail Walsh
For any questions, please contact the organisers at adanielyan82(a)gmail.com <mailto:adanielyan82@gmail.com> and s.herrera(a)parisnanterre.fr <mailto:s.herrera@parisnanterre.fr>
Looking forward to seeing you in Yerevan,
Program Chairs
*** First Combo Call for Workshop Papers ***
The 33rd IEEE International Conference on Software Analysis, Evolution
and Reengineering (SANER 2026)
17 March, 2026, 5* St. Raphael Resort and Marina, Limassol, Cyprus
https://conf.researchr.org/track/saner-2026
SANER 2026 will feature the following workshops. Please visit the workshops' websites
and/or contact their organisers for more details.
SQA4AI – Software Quality Assurance for Artificial Intelligence
https://sqa4ai-ws.github.io
Greenvolve – The Green Software Evolution Workshop
https://greenvolve.github.io
Fairness 2026 – 2nd International Workshop on Fairness in Software Systems
https://fairnessworkshop.github.io
F-TRANSFER – Facilitating Continuous Education and Training Through AI in SE
https://www.cs.ubbcluj.ro/~avescan/f-transfer-2026/
IWBOSE 2026 – Ninth International Workshop on Blockchain Oriented Software
Engineering
https://www.agile-group.org/iwbose2026/
VST 2026 – 9th Workshop on Validation, Analysis and Evolution of Software Tests
https://vstworkshop.github.io/vst2026/
MSR4P&S 2026 – 4th International Workshop on Mining Software Repositories
Applications for Privacy and Security
https://msr4ps.github.io
SUBMISSION LINK
https://easychair.org/my/conference?conf=saner2026
IMPORTANT DATES
• Abstract Submission: 12 December, 2025
• Paper Submission: 18 December, 2025
• Notification: 14 January, 2026
• Camera-Ready: 20 January, 2026
All dates are 23:59h AoE (anywhere on Earth).
ORGANISATION
General Chair
• Georgia Kapitsaki, University of Cyprus, Cyprus
Local Organizing Chair
• George Angelos Papadopoulos, University of Cyprus, Cyprus
Workshops and Tutorials Co-Chairs
• Marcelo De Almeida Maia, Federal University of Uberlandia, Brazil
• Juri Di Rocco, University of L'Aquila, Italy
The Swedish Excellence Centre for Computational Social Science (SweCSS) is seeking a Postdoctoral Fellow.
As a postdoctoral fellow, your main task will be to conduct independent, innovative research in computational social science. SweCSS researchers combine large-scale data, computational methods, and theory-driven approaches to better understand society. The center aims to position Sweden at the forefront of international CSS research and to serve as an educational hub and global research node.
More details about the position and how to apply: https://liu.se/en/work-at-liu/vacancies/27295
Read more about SweCSS: https://liu.se/en/research/swecss
For any questions about the position or the centre, feel free to contact me.
Best regards
Marco Kuhlmann
Professor
Department of Computer and Information Science
Linköping University
Dear all,
We have an open PhD position in Argumentative AI and NLP at the University of St.Gallen (Switzerland), starting 1 November (or by agreement).
The position focuses on developing methods to evaluate reasoning in LLMs, with a particular focus on argumentation, human–AI interaction, and evaluation frameworks. The successful candidate will design evaluation approaches, build interactive tools, and collaborate with experts from NLP, philosophy, and AI ethics.
More details and how to apply: https://jobs.unisg.ch/offene-stellen/research-assistant-doctoral-candidate-…
Best regards,
Christina Niklaus
Assistant Professor
Institute of Computer Science
University of St.Gallen
Hi all,
Reminder: we are currently looking for an excellent colleague to fill a faculty position in Language Technology. This is a joint appointment between the Department of Language Science and Technology at Saarland University and the German Research Center for Artificial Intelligence (DFKI). It is a tenure-track position (W2 -> W3 on the German pay scale) that is suitable for young researchers a couple of years after their PhD.
You will join one of the most active research sites in NLP and Informatics in Europe and, over time, grow into leading the Language Technology group at DFKI. Through DFKI, you will have access to an extensive global network of industry and other partners. This dual role offers a unique platform for high-impact research and meaningful societal engagement at a scale rarely achievable elsewhere.
The full job ad is available at https://tinyurl.com/saar-lt and the application deadline is September 26.
Best,
Alexander Koller.
Postdoc opportunity at the IT University of Copenhagen in an exciting project on improving citizens’ experience in emergency triage
https://candidate.hr-manager.net/ApplicationInit.aspx?cid=119&ProjectId=181…
Application deadline: 24 September 2025
We are looking to hire a postdoc in data science/natural language processing to contribute to an innovative research project aimed at streamlining citizens’ encounters with emergency medical services using AI technology. The project stands out by using public data for public good with access to one of the largest real-world healthcare dialogue datasets in Europe and by exploring novel approaches to human-AI alignment, where both citizens and AI systems adapt to each other through dynamic interaction.
As part of an interdisciplinary team, you will be a part of the NLPnorth natural language processing group at the IT University of Copenhagen, and work closely with human-computer interaction experts at the University of Copenhagen and the Emergency Medical Services of the Capital Region of Denmark. Your primary role in the project will involve:
* Research on exploiting large language models to facilitate effective communication with citizens in medical emergency scenarios, leading to academic publications in top-tier venues.
* Research on countering and mitigating LLM uncertainty and overconfidence to achieve human-AI alignment in emergency care.
* Deploying and evaluating automatic speech recognition models to transcribe emergency hotline calls in Danish with high accuracy.
* Implementing RAG-based LLM solutions for reliable and factual interaction with citizens in emergency situations.
The postdoc will be a part of the NLPnorth research group at the IT University of Copenhagen (https://nlpnorth.github.io/).
--
Christian Hardmeier
Associate Professor, IT University of Copenhagen
https://christianhardmeier.rax.ch/