Hello,
We are excited to invite you to submit a late-breaking paper (up to two
pages) to IRAI 2026 - the First Workshop on Information Retrieval for
Accountability and Integrity, a half-day pilot workshop dedicated to
exploring how IR and NLP can help evaluate forward-looking statements,
verify commitments, and restore trust across public and private domains.
- What IRAI aims to do
Information systems shape public discourse, decisions, and trust, yet we
lack systematic ways to evaluate the accuracy of forward-looking
statements (e.g., campaign promises, corporate forecasts). Media
coverage is selective, standards are uneven, and the signal is buried in
noise. The result: accountability gaps and eroded confidence.
IRAI brings IR and NLP communities together to assess the fulfilment and
reliability of claims and commitments. It complements ECIR’s mission by
tackling a pressing, real-world challenge with societal impact.
- Aligned with the IR and NLP community
IRAI 2026 will be held in Delft on April 2, 2026, as part of the European
Conference on Information Retrieval (ECIR), where it highlights
concrete applications for social good.
- Important Information (for late-breaking papers)
* When: Apr 2, 2026
* Where: Delft, Netherlands
* Submission Deadline: Feb 25, 2026
* Notification Due: Mar 3, 2026
* Final Version Due: Mar 10, 2026
- More info and Registration:
https://nlpfin.github.io/sites/ECIR2026.html
--
IRAI organizers
1st Workshop on Creating Interoperable Corpora of Historical Newspapers (PressMint)
Final Call for Papers
Date: May 16, 2026, a half-day workshop
Location: Palma de Mallorca, Spain
Website: https://www.clarin.eu/PressMint-LREC2026
Submission Deadline: 1 March 2026
Submission link: https://softconf.com/lrec2026/PressMint/
Advertisement/Tagline
Unlock the pan-European history! Join the PressMint workshop to build & analyze multilingual, interoperable historical newspaper corpora!
Workshop description
Historical newspapers are of interest to historians and historical linguists, as well as to social and political scientists, ethnologists, anthropologists, media and communication scholars, and researchers in cultural studies. In all of these fields, contemporary digital resources, tools and methods (e.g. “distant reading”) are still underutilised. At the same time, corpora of historical newspapers already exist for a number of languages and countries, largely because the material is out of copyright, and the page images, often with OCR, are available through the national libraries. In recent years these data have attracted growing interest from researchers, since they preserve the historical, cultural, political and societal past. However, these corpora are not interoperable, which precludes methods for their comparison as well as any translingual and transnational research. This is an especially important consideration because statehood and nationhood were highly dynamic in Europe in the period to be covered by the project corpora. An initial joint attempt towards the creation of a corpus of historical newspapers from the beginning of the 20th century onwards is the CLARIN flagship project PressMint <https://www.clarin.eu/pressmint>. The project currently features data from 20 partners and aims to develop a standard for interoperable newspaper resources covering diachronic timespans. The final goal is to provide structured, high-quality multilingual data in a common format, with the same type of linguistic annotation, covering (at least partially) the same time period.
Objective
The PressMint workshop aims to gather experts interested in creating, processing and analyzing interoperable corpora of historical data in general, with a particular focus on newspapers. Another very important objective is to consider the perspective of the communities who use historical data: their purposes, requirements and feedback.
We encourage interested colleagues to present their work at any level: national or pan-European, monolingual or multilingual, task-specific or multidisciplinary. We view this workshop as a venue to exchange research ideas and start collaboration on this topic.
The workshop will feature one invited speaker: Maud Ehrmann, EPFL, CH
We invite unpublished original work focusing on (but not limited to) the following topics:
* compilation, annotation, visualisation and utilisation of historical newspaper corpora of the period relevant to PressMint (ideally around the start of the 20th century but not constrained by this period)
* harmonisation of the existing multilingual historical newspaper corpora that contain either synchronic or diachronic data, or both
* linking or comparing historical newspaper corpora with other datasets, including sources of structured knowledge, such as formal ontologies and LOD datasets
* enrichment of historical newspaper corpora (with e.g. sentiment annotation, etc.)
* machine translation of historical newspaper corpora
* employment of LLMs as stand alone tools or as parts of NLP architectures for historical data processing, maintenance and knowledge deployment.
* various scenarios of usage of historical data
Submission & Publication
We accept submissions of long papers (6 to 8 pages), short papers (4 pages) and demo papers (4 pages), to be presented as long or short oral presentations or as posters at the workshop. To support double-blind reviewing, all submissions must be fully anonymized and should be formatted according to the stylesheet available on the LREC 2026 website <https://lrec2026.info/authors-kit/>. The workshop papers will be published in online proceedings.
At the time of submission, authors are also offered the opportunity to share related language resources with the community. All repository entries are linked to the LRE Map [https://lremap.elra.info/], which provides metadata for the resources.
Please note that the LREC style guide should be followed. The formatting guidelines can be found here: https://lrec2026.info/authors-kit/.
Important Dates
* Paper submission deadline: 1 March 2026
* Notification of acceptance: 15 March 2026
* Camera-ready papers: 30 March 2026
* Workshop date: 16 May 2026
Organizing Committee
* Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences, PL
* Tanja Wissik, Austrian Academy of Sciences, AT
* Petya Osenova, Sofia University “St. Kl. Ohridski” & IICT-BAS, BG
The workshop is supported by the CLARIN research infrastructure and the PressMint Project.
To contact the organisers, please email maciej.ogrodniczuk(a)gmail.com
Dear Colleagues,
I am writing to ask whether you would be willing to distribute the following workshop announcement via your mailing list. Thank you.
Kind regards,
Maja (Stegenwallner-Schuetz)
-------
4TH WORKSHOP ON EYE MOVEMENTS AND THE ASSESSMENT OF READING COMPREHENSION
Dates: June 18-20, 2026
Location: University of Koblenz, Germany
Webpage: https://uni-ko.de/UTq3s
## Workshop theme:
Effective and widely available reading assessments are fundamental for educational and clinical settings, as they are instrumental for early diagnosis of reading difficulties, enabling timely and targeted intervention. In this workshop, we explore how eye-tracking combined with machine learning technologies can enhance reading assessments. Our goal is to bring together researchers from various relevant fields, including educational science, cognitive psychology, psycholinguistics, eye-tracking-based reading research, and machine learning. The workshop will provide a platform for exchanging ideas for the next generation of reading assessments aided by eye-tracking and machine learning technologies, as well as inspiring cross-disciplinary research collaborations.
We invite submissions on any topic related to the workshop's theme, including:
* Methods and practices of reading assessment in education (including large scale assessments)
* Reading instruction and development
* Reading impairments and learning difficulties
* Machine reading comprehension
* AI, NLP and ML modeling of human reading
* Predictive modelling of language proficiency and reading processes underlying comprehension
* HCI, human factors, and interactive tutoring systems
* Eye tracking technologies
* Cognitive models of eye movements in reading
* Psycholinguistic analyses of reading
* Text readability and simplification
Since the workshop aims to bring together researchers from different communities and addresses a nascent research area, we also welcome contributions that involve eye movements in reading (or alternative methodologies such as self-paced reading and mouse tracking) without directly addressing the assessment of reading comprehension. Likewise, we invite contributions on reading assessments that do not involve eye-tracking.
## Important dates:
* Abstract submission deadline: April 1, 2026
* Acceptance notifications: April 10, 2026
* Application deadline for travel stipends: April 23, 2026
* Registration period ends: May 15, 2026
* Workshop dates: June 18-20, 2026
## Workshop format:
The first two days will feature a structured program, including talks, poster sessions, and group discussions. On the third day, the focus shifts to a more relaxed format, providing participants with the opportunity to network and plan joint activities in an informal setting while enjoying a hike or a boat tour on the rivers.
## Invited speakers (confirmed):
Ido Roll
Faculty of Education in Science and Technology & Faculty of Data and Decision Sciences
Technion – Israel Institute of Technology, Israel
Sascha Schroeder
Institute of Psychology
University of Goettingen, Germany
## Submissions:
- We invite submissions of short abstracts of up to 350 words.
- To submit your abstract, please fill in the abstract submission form
at https://forms.gle/Nh7bREP59eyeYQGEA
- Submissions will be reviewed by the workshop organizers primarily with
an eye to relevance.
- We expect to accept 25–35 submissions.
## Venue:
The workshop will be held at the family-owned Diehls Hotel, which was established in 1919 and is beautifully located directly on the banks of the Rhine River. We have booked two interconnected conference rooms, Ehrenbreitenstein I and II, which can accommodate up to 90 participants. Refreshments throughout the workshop will be provided by the hotel.
A contingent of hotel rooms has been reserved for workshop participants. The hotel also features a highly recommended restaurant, where we will enjoy our social dinner on Thursday evening.
In addition to the conference facilities, attendees will appreciate the hotel’s charming riverside setting. It is in close proximity to historic sites of the city.
Koblenz is well connected to Frankfurt by train. From Koblenz main station (Koblenz Hbf), the venue can be reached by bus (16-25 minutes, depending on the route), or from the old city by a scenic ride across the Rhine in the famous Koblenz Cable Car.
Contact information for booking and venue description with photos: https://diehls-hotel.de/en/rooms-suites/
## Funding
The workshop is funded by the MultiplEYE COST Action and the University of Koblenz. The workshop will provide financial support to cover travel expenses for a limited number of participants. Authors will be invited to apply for travel funding upon abstract acceptance. Funding may be partial, and priority will be given to junior researchers.
## Workshop organizers
Maja Stegenwallner-Schuetz
Dept. of Special Education
University of Koblenz, Germany
stegenwa(a)uni-koblenz.de
Lena Jaeger
Dept. of Computational Linguistics
University of Zurich, Switzerland
jaeger(a)cl.uzh.ch
Yevgeni Berzak
Faculty of Data and Decision Sciences
Technion, Israel
berzak(a)technion.ac.il
Titus von der Malsburg
Inst. of Linguistics
University of Stuttgart, Germany
titus.von-der-malsburg(a)ling.uni-stuttgart.de
--
JProf. Dr. Maja Stegenwallner-Schütz
Juniorprofessorin/Assistant professor
Förderpädagogik/Sprache unter besonderer Berücksichtigung inklusiver Bildungsprozesse/
Special Education/Language Development in Inclusive Learning Settings
Universität Koblenz
University of Koblenz
Institut für Förderpädagogik/Department of Special Education
Postfach 20 16 02 | D-56016 Koblenz (Postanschrift)
Universitätsstraße 1 | D-56070 Koblenz (Besucheranschrift)
Tel.: +49 261 287 1916
E-Mail: stegenwa(a)uni-koblenz.de
Website: https://www.uni-koblenz.de/de/bildungswissenschaften/institut-fuer-foerderp…
Datenschutz: https://www.uni-koblenz.de/de/datenschutz
Sixth Workshop on NLP for Indigenous Languages of the Americas
AmericasNLP 2026 will be co-located with ACL 2026 <https://2026.aclweb.org/> in San
Diego, California, USA!
Call for Papers
The goal of AmericasNLP is to
encourage and increase the visibility of work on the Indigenous languages
of the Americas. It aims to encourage research on NLP, computational
linguistics, corpus linguistics and speech for Indigenous languages, to
connect researchers and professionals from underrepresented communities and
native speakers of endangered languages with the ACL community, and, more
generally, to promote machine learning approaches suitable for low-resource
languages. We invite the submission of:
- Long papers (8 pages) and short papers (4 pages) on substantial,
original, and unpublished research
- Non-archival extended abstracts (2 pages), technical reports (8
pages), and work which has been presented at other venues (in the format of
the original publication).
Submissions do not need to describe work on native languages directly, as
long as it is clear why those can benefit from the described approaches.
Areas of interest include but are not limited to:
- Creation of datasets for NLP applications
- Incorporation of external knowledge into neural systems
- Linguistic typology and the use of typological features for NLP
- Transfer learning, meta-learning, and active learning
- Weakly supervised, semi-supervised, and unsupervised learning
- Machine translation of low-resource languages
- Applications of, and innovation with, LLMs for Indigenous languages of
the Americas
- Morphology and phonology of low-resource languages
- NLP applications for Indigenous languages of the Americas
- Ethical considerations for research on languages spoken by Indigenous
communities
- Language activism, revitalization, and sovereignty, in the context of
NLP models and research
Submissions will be accepted until April 15th, 2026 via softconf: submission
portal <https://softconf.com/acl2026/americasnlp>
*Note:* Limitation section and ethics statement are not mandatory, but
strongly encouraged. If they are part of your submission, they do *not* count
towards the page limit.
Shared Task
To motivate the NLP community to increase
research efforts on Indigenous and endangered languages, AmericasNLP 2026
will feature a new shared task about image captioning of culturally
relevant images. The results of the shared task will be presented during
the in-person workshop in San Diego. More information can be found here
<https://turing.iimas.unam.mx/americasnlp/2026_st.html>.
Important Dates
- Submission Deadline: April 15th *(After the ACL acceptance
notification)*
- Notification of Acceptance: May 10th
- Camera-Ready Papers Due: May 22nd
- Workshop: July 3 or 4
All deadlines are 11:59pm anywhere on Earth (AoE).
Organizing Committee
- *Manuel Mager*, Johannes Gutenberg University of Mainz,
jmagerho(a)uni-mainz.de
- *Arturo Oncevay*, Independent, arturo.oncevay(a)gmail.com
- *Abteen Ebrahimi*, University of Colorado Boulder,
abteen.ebrahimi(a)colorado.edu
- *Minh Duc Bui*, Johannes Gutenberg University of Mainz,
minhducbui(a)uni-mainz.de
- *Shruti Rijhwani*, Google DeepMind, shrutirijhwani(a)google.com
- *Luis Chiruzzo*, Universidad de la República, Uruguay,
luischir(a)fing.edu.uy
- *Robert Pugh*, University of Indiana, pughrob(a)iu.edu
- *Rolando Coto-Solano*, Dartmouth College,
rolando.a.coto.solano(a)dartmouth.edu
- *John E. Ortega*, Northeastern University, j.ortega(a)northeastern.edu
- *Katharina von der Wense*, University of Colorado Boulder and Johannes
Gutenberg University of Mainz, katharina.kann(a)colorado.edu
Contact: americas.nlp.workshop(a)gmail.com
Website: https://turing.iimas.unam.mx/americasnlp/
Please note that the deadline for submissions has now been extended to
Friday 27 February 2026.
CALL FOR PAPERS
The Second Workshop on Holocaust Testimonies as Language Resources
(HTRes-2026), pre-conference workshop W53 at LREC2026
Date: 11 May 2026 (afternoon)
Location: Palma de Mallorca, Spain
Workshop web page: https://www.clarin.eu/HTRes2026
Submission Deadline: 27 February 2026
Submission link: https://softconf.com/lrec2026/HTRes2026/
Holocaust testimonies serve as a bridge between survivors and history’s
darkest chapters, providing a connection to the profound experiences of
the past. Testimonies stand as the primary source of information that
describes the Holocaust, offering first-hand accounts and personal
narratives of those who experienced it. The majority of testimonies are
captured in an oral format, as survivors vividly explain and share their
personal experiences and observations from that time period.
Transforming Holocaust testimonies into a machine-processable digital
format can be a difficult task owing to the unstructured nature of the
text. The creation of accessible, comprehensive, and well-annotated
Holocaust testimony collections is of paramount importance to our
society. These collections empower researchers and historians to
validate the accuracy of socially and historically significant
information, enabling them to share critical insights and trends derived
from these data.
The primary objective of this workshop is to explore how various
theories, techniques, and tools from corpus linguistics, natural
language processing, and digital humanities can contribute to the
examination, analysis, dissemination, and preservation of Holocaust
testimonies and other Holocaust-related documents.
The workshop is supported by CLARIN and EHRI.
Please find full details of the call for papers at the workshop web page
at https://www.clarin.eu/HTRes2026. The main conference website is at
https://lrec2026.info/ .
IMPORTANT DATES
Final date for paper submission: extended to 27 February 2026
Notification of Acceptance: 11 March 2026
Camera-ready version submission: 30 March 2026
Workshop date: 11 May 2026
To contact the organisers, please email holocausttlr(a)gmail.com
From Martin Wynne on behalf of the organizing committee.
--
Senior Researcher in Corpus Linguistics
Faculty of Linguistics, Philology and Phonetics, University of Oxford
National Co-ordinator, CLARIN-UK
martin.wynne(a)ling-phil.ox.ac.uk
https://orcid.org/0000-0002-4155-0530
Final Call for Papers
LANLP: Bridging Ibero and Latin American NLP communities
16 May 2026, Palma de Mallorca, Spain
https://sites.google.com/view/lanlp2026/home
Co-located Networking Symposium @ LREC 2026
https://lrec2026.info/
Description and Goals
We organise a Networking Symposium on Latin American NLP (LANLP), focusing on natural language processing for the diverse languages of the Iberian Peninsula and Latin America. This region includes major world languages (e.g. Spanish, with ~558M speakers, and Portuguese, with ~267M) as well as regional and indigenous languages. For example, Latin America alone hosts tens of millions of speakers of languages such as Quechua (~10M), Guaraní (>6M), Nahuatl (~2M) and Aymara (~2M), among many others. Such languages are highly under-resourced: over 88% of the world’s languages remain largely unsupported by language technologies. This networking event addresses that gap by promoting collaboration on ethically and culturally sensitive resource creation, evaluation, and novel methods for low-resource multilingual NLP in Iberian and Latin American languages and varieties. Our goal is to bring together communities (SEPLN <http://www.sepln.org/>, CLARIAH-ES <https://www.clariah.es/>, PROPOR <https://propor2024.citius.gal/>, AmericasNLP <https://turing.iimas.unam.mx/americasnlp/index.html>, and SomosNLP <https://somosnlp.org/>) to share cutting-edge research, language resources, and best practices.
LANLP focuses on community-driven resource development and evaluation for Iberian languages, and diverse Latin American languages (including indigenous and minority languages). We aim to bridge regional communities: for instance, past forums like OpenCor note that “Latin American and Iberian communities... did not have an established event” to share initiatives, corpora and tools. LANLP fills this gap, fostering new contacts between Iberian and Latin American NLP research groups. The goals are to (1) highlight challenges in processing these languages, (2) share novel datasets and models, and (3) catalyze future collaborations and shared tasks. We emphasize both academic rigor and community inclusivity, encouraging contributions from established researchers and grassroots language advocates alike.
Topics of Interest
We invite submissions on topics including (but not limited to):
* Language resource creation: Corpora, lexicons, and annotations for Iberian and Latin American languages (text, speech, multimodal).
* LLMs opportunities and challenges: Small Language Models, synthetic data, mitigating biases, linguistic inequalities, data scarcity, language domination.
* Multilingual transfer & modeling: Cross-lingual and multilingual representations, transfer learning, and embedding methods that bridge Spanish, Portuguese, varieties and minority languages.
* Machine translation & generation: MT, summarization, and language generation for Spanish, Portuguese, and low-resource languages (e.g., Quechua, Aymara, Nahuatl).
* Speech and audio processing: ASR, TTS, and spoken language resources for under-resourced languages and regional dialects (e.g. indigenous languages, Brazilian Portuguese, Latin American Spanish).
* Dialectal and code-switching NLP: Identification and handling of dialectal variation and code-switching (e.g. Spanish–Portuguese code-mixing, Spanish–indigenous language contact).
* Morphology and syntax: Analysis and tagging for morphologically rich or under-documented languages (e.g. Basque, Mapudungun, Bribri) using universal dependencies or other frameworks.
* Domain-specific NLP: Social media, sentiment, hate-speech detection, and other tasks in Iberian and Latin American language contexts (e.g. Latin American social media analysis).
* Digital humanities & cultural heritage: NLP for historical texts, literature, and cultural content in Spanish, Portuguese, and regional languages.
* Community-driven methods: Crowdsourcing, citizen science, and participatory approaches for data collection and annotation in these languages.
* Evaluation and benchmarks: Development of evaluation metrics and benchmarks tailored to low-resource Iberian/Latin languages.
* Ethical and social issues: Fairness, bias, and indigenous language rights in NLP; collaboration with native speaker communities; data governance and sustainability of resources.
Important dates
* February 27, 2026: Paper submission deadline (extended from February 18)
* March 20, 2026: Notification of acceptance
* March 30, 2026: Camera-ready deadline
* May 16, 2026: Networking Symposium Date
Submission Instructions
We invite non-anonymous submissions in English, Spanish or Portuguese on the topics of interest, with between 4 and 8 pages of content. In line with the policy of the main LREC conference, the page limit of 8 pages does not include acknowledgements, references, potential Ethics Statements or discussion of Limitations. All submissions must follow the LREC stylesheet (https://lrec2026.info/authors-kit/).
Any submissions which are over-length, poorly formatted or make excessive use of appendices to circumvent page limits are liable to desk-rejection.
At the time of submission, authors are offered the opportunity to share related language resources with the community. All repository entries are linked to the LRE Map (https://lremap.elra.info/), which provides metadata for the resource.
Organizing Committee
* Luis Chiruzzo Inco (AmericasNLP, luischir(a)fing.edu.uy)
* Pablo Gamallo (PROPOR, CiTIUS, pablo.gamallo(a)usc.gal)
* María Grandury (SomosNLP, EPFL, mariagrandury(a)gmail.com)
* Rafael Muñoz Guillena (SEPLN, CENID, UA, rafael(a)dlsi.ua.es)
* German Rigau Claramunt (CLARIAH-ES, HiTZ Center, EHU, german.rigau(a)ehu.eus)
Hello,
The first annual Oxford Test of English Learner Corpora (OTELC) Research Competition, hosted by Oxford University Press, is now open.
This competition offers master’s students in linguistics, corpus linguistics, or language assessment the opportunity to design a research project using authentic English‑language test‑taker responses from the Oxford Test of English Learner Corpora. Selected entrants will receive full access to the OTELC for the duration of the competition. The winning submission will be awarded a 13‑inch iPad Air and the opportunity to have their work published on the Oxford English Assessment Research webpage.
Eligibility requirements
Applicants must:
- Be enrolled in a master’s programme
- Be taking a course in linguistics, corpus linguistics, or language assessment
- Have at least one semester remaining in their programme
How to apply
Applicants should submit a research proposal using the official application form. Proposals must clearly outline research aims, research questions, and how the OTELC will be used to address them.
Application deadline: Sunday, 31 May 2026
Further information:
https://elt.oup.com/feature/global/learner-corpora/
Best wishes,
Colin Finnerty
Head of Assessment Research
Oxford English Assessment Research
Oxford University Press
3rd International Workshop on Natural Scientific Language Processing (NSLP 2026):
Final Call for Papers
12 May 2026 – Co-located with LREC 2026
Palma, Mallorca (Spain)
NSLP 2026 features two shared tasks:
* ClimateCheck 2026: Scientific Fact-Checking of Social Media Claims
* SOMD 2026: Software Mention Detection & Coreference Resolution
NSLP 2026 – important dates:
* Submission deadline: 20 February 2026
* Notifications: 13 March 2026
* Camera-ready: 30 March 2026
NSLP 2026 website (including the shared tasks):
* https://nfdi4ds.github.io/nslp2026
Scientific research has witnessed a steep growth rate over the last decades. The number of scholarly publications is growing exponentially, and doubles every 15-17 years. Consequently, both general and specialised repositories, databases, knowledge graphs, and digital libraries have been developed to publish and manage scientific artifacts. Examples include the Open Research Knowledge Graph (ORKG), the Semantic Scholar Academic Graph (S2AG), PubMed Central and also the ACL Anthology. These resources enable the collection, reuse, tracking, and expansion of scientific findings, and facilitate downstream applications such as scientific search engines.
However, in order to develop robust systems that deal with scholarly text, various challenges need to be addressed. The current status quo of scientific communication consists mostly of scholarly articles published as unstructured PDF documents. These are not machine-readable in the sense that relevant scientific information cannot be extracted from them easily, which makes extracting and utilising this information as part of the scientific process a laborious and time-consuming task. Developing methods for converting unstructured information into structured formats is one of the major challenges in the field of Natural Scientific Language Processing (NSLP). This goal encompasses related challenges such as detecting, disambiguating, and linking mentions of scientific artifacts (e.g., software tools or specific datasets or language resources), and tracking state-of-the-art models and their evaluation scores (including new versions of existing models). Extracting and managing heterogeneous scientific knowledge effectively remains a challenging ongoing research area. Existing efforts are often fragmented, addressing separate issues with distinct datasets and conceptual approaches.
NSLP 2026 addresses current topics and issues in Natural Scientific Language Processing. It is proposed and organised with the support of NFDI for Data Science and Artificial Intelligence (NFDI4DS), a long-term project with approx. 20 partners who work towards building a German national research data infrastructure for DS and AI. The workshop aims to further bring together the international community of researchers who work on NSLP and related topics (including research knowledge graphs), to discuss current issues and possible solutions. NSLP 2026 includes two keynote speakers and presentations of accepted papers (oral and poster presentations), as well as two shared tasks.
Topics of interest include, but are not limited to
* Scientific LLMs – LLMs for NSLP
* Language resources (LRs) and Language technologies (LTs) for NSLP beyond LLMs
* Research Knowledge Graphs (RKGs), Scientific Knowledge Graphs (SKGs) and other forms of structured representation of research-related knowledge
* Information extraction from scholarly articles
* Extraction of research information from texts
* Detection and disambiguation of mentions of datasets, tasks, software or other methods
* Classification of scholarly articles (collections, single documents, parts of documents)
* Information extraction for RKGs
* Summarisation of scholarly articles
* Scholarly IR and scientific search engines
* Question answering over scientific knowledge
* Metadata and cataloging
* Cross-lingual and multilingual natural scientific language processing
* Adaptation of NLP methods for NSLP purposes
Important Dates
* Paper submission deadline: 20 February 2026 (not to be extended)
* Notification of acceptance: 13 March 2026
* Camera-ready submission: 30 March 2026
* Workshop: 12 May 2026
Submission Guidelines
The NSLP 2026 workshop invites submissions of: regular long papers; short papers; position papers. We especially encourage submissions from junior researchers and students from diverse backgrounds.
* Note that we will not accept work that is under review or has already been published in or accepted for publication in a journal, another conference, or another workshop.
* The workshop invites anonymous submissions of regular long papers (up to 8 pages without references and appendix); short papers as well as position papers (up to 4 pages without references and appendix) presenting, for example, negative results, in-progress projects, or demos.
* Authors are permitted to include an optional appendix of up to 2 pages. However, reviewers will not be mandated to review the appendix and all papers must be self-contained.
* Reviewing will be performed double-blind, i.e., submissions must be anonymous. Reviewers will not actively try to identify the authors.
* Submissions must be in PDF, formatted in the LREC 2026 style.
* The proceedings of this workshop will be published in the ACL Anthology (full Open Access) as part of the LREC 2026 proceedings.
* At least one author per contribution must register for the workshop for presentation.
* All submissions are done via START: https://softconf.com/lrec2026/NSLP2026/
When submitting a paper through START, the authors will be asked to provide essential information about resources (in a broad sense, i.e., also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of your research. Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.) to enable their reuse and replicability of experiments (including evaluation ones).
Keynote Speakers
* Iryna Gurevych, TU Darmstadt, Germany
* Yufang Hou, ITU Austria, Austria
Shared Tasks
1. ClimateCheck 2026: Scientific Fact-Checking of Social Media Claims
The rise of climate discourse on social media offers new channels for public engagement but also amplifies mis- and disinformation. As online platforms increasingly shape public understanding of science, tools that ground claims in trustworthy, peer-reviewed evidence are necessary. The new iteration of ClimateCheck builds on the results and insights from the 2025 iteration (run at SDP 2025/ACL 2025), offering the following subtasks:
Subtask 1: Abstract retrieval and claim verification: given a claim and a corpus of publications, retrieve the top 10 most relevant abstracts and classify each claim-abstract pair as supports, refutes, or not enough information.
Subtask 2: Disinformation narrative classification: given a claim, predict which climate disinformation narrative it expresses, according to a predefined taxonomy.
New training data will be released for both subtasks, with Subtask 1 receiving three times as much data as in the last iteration. The new iteration will focus on sustainability, emphasising the need to build climate-friendly NLP systems with minimal environmental impact.
Shared task co-organisers: Raia Abu Ahmad, Aida Usmanova, Max Upravitelev, Georg Rehm
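The retrieval half of Subtask 1 can be illustrated with a minimal sketch. This is not the organisers' baseline: the function names, the abstract ids, and the choice of TF-IDF cosine similarity are assumptions made purely for illustration, and the claim-abstract classification step is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_abstracts(claim, abstracts, k=10):
    """Return the ids of the k abstracts most similar to the claim.

    abstracts: dict mapping a (hypothetical) abstract id to its text.
    Scoring is plain TF-IDF cosine similarity, chosen only for illustration.
    """
    docs = {doc_id: Counter(tokenize(text)) for doc_id, text in abstracts.items()}
    n_docs = len(docs)
    # Document frequency: in how many abstracts each term occurs.
    df = Counter(term for counts in docs.values() for term in counts)

    def weights(counts):
        # Smoothed IDF keeps weights positive, even for unseen query terms.
        return {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
                for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = (math.sqrt(sum(w * w for w in a.values()))
                * math.sqrt(sum(w * w for w in b.values())))
        return dot / norm if norm else 0.0

    query = weights(Counter(tokenize(claim)))
    scored = [(cosine(query, weights(counts)), doc_id)
              for doc_id, counts in docs.items()]
    # Highest score first; ties are broken by id so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]
```

Each retrieved abstract would then be paired with the claim and passed to a separate classifier that outputs supports, refutes, or not enough information.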
2. SOMD 2026: Software Mention Detection & Coreference Resolution
Understanding software mentions is crucial for reproducibility and for interpreting experimental results. Citations of software are often informal and lack persistent identifiers, making it hard to infer and disambiguate knowledge about software efficiently. This task will build on SOMD 2025 (run at SDP 2025, co-located with ACL 2025) and focus on entity disambiguation as an under-investigated problem in this context. More precisely, we address the task of coreference resolution of software mentions across multiple documents, i.e. given a set of software mentions extracted from multiple scientific publications, cluster these mentions so that all software mentions in a particular cluster refer to the same real-world software. We define three subtasks with varying challenges:
Subtask 1: Software coreference resolution over gold standard mentions. Addresses the task based on high-quality (gold standard) mentions of software that are expert-annotated in multiple publications.
Subtask 2: Software coreference resolution over predicted mentions. Addresses the task on software mentions that are automatically extracted using a baseline model, i.e. reflecting a typical information extraction scenario, where upstream pipelines (such as entity and metadata extraction) are imperfect.
Subtask 3: Software coreference resolution at scale. Addresses the task using predicted mentions of software and metadata at a larger scale. This challenges models to scale effectively, maintain accuracy, and distinguish among an increasingly dense field of similar or overlapping software mentions.
Shared task co-organisers: Sharmila Upadhyaya, Stefan Dietze, Frank Krüger, Wolfgang Otto
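The clustering objective shared by the three subtasks can be illustrated with a deliberately naive sketch. It is not the task baseline: the input format (pairs of document id and surface form), the function names, and the string-normalisation heuristic are all assumptions for illustration only.

```python
import re
from collections import defaultdict


def normalize(mention):
    """Reduce a surface form to a crude key: lowercase, no versions, no punctuation."""
    text = mention.lower()
    # Drop version-like tokens such as "25", "3.6.1" or "v0.24".
    text = re.sub(r"\bv?\d+(\.\d+)*\b", " ", text)
    # Replace any remaining punctuation with spaces.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    key = " ".join(text.split())
    # Fall back to the lowercased mention if nothing survives normalisation.
    return key or mention.lower()


def cluster_mentions(mentions):
    """Group (doc_id, surface_form) pairs whose normalised keys are identical.

    Returns a list of clusters; each cluster is a list of the original pairs.
    """
    clusters = defaultdict(list)
    for doc_id, surface in mentions:
        clusters[normalize(surface)].append((doc_id, surface))
    return list(clusters.values())


# Hypothetical mentions drawn from three imaginary publications.
example = [
    ("p1", "SPSS 25"),
    ("p2", "spss"),
    ("p3", "scikit-learn"),
    ("p1", "Scikit-Learn v0.24"),
    ("p2", "R 3.6.1"),
]
clusters = cluster_mentions(example)
```

Exact matching on a normalised string cannot link abbreviations to full names or separate different tools that share a name, which is where the mention context and metadata described in the subtasks become relevant.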
Organisers
* Georg Rehm (Deutsches Forschungszentrum für Künstliche Intelligenz & Humboldt-Universität zu Berlin, Germany) – main contact: georg.rehm@dfki.de
* Stefan Dietze (GESIS Leibniz Institut für Sozialwissenschaften, Cologne & Heinrich-Heine-University Düsseldorf, Germany)
* Danilo Dessí (University of Sharjah, UAE)
* Diana Maynard (University of Sheffield, UK)
* Sonja Schimmler (Technical University of Berlin & Fraunhofer FOKUS, Germany)
Programme Committee
* Marcel Ackermann, Lernzentrum Informatik (LZI), DBLP, Germany
* Raia Abu Ahmad, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany
* Tilahun Abedissa Taffa, University of Hamburg, Germany
* Ekaterina Borisova, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany
* Davide Buscaldi, LIPN, CNRS, University Paris 13, France
* Leyla Jael Castro, ZB MED Information Centre for Life Sciences, Germany
* Mathieu d’Aquin, Université de Lorraine, France
* Jennifer D’Souza, TIB Leibniz Information Centre for Science and Technology, Germany
* Catherine Faron, Université Côte d’Azur, France
* Dayne Freitag, SRI International, USA
* Paul Groth, University of Amsterdam, The Netherlands
* Leonhard Hennig, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany
* Inma Hernandez, University of Seville, Spain
* Robert Jäschke, Humboldt University of Berlin, Germany
* Petr Knoth, Open University, UK
* Frank Krüger, Wismar University of Applied Sciences, Germany
* Julia Lane, NYU Wagner Graduate School of Public Service, USA
* Andrea Mannocci, CNR-ISTI, Italy
* Natalia Manola, OpenAIRE, Greece
* Mirko Marras, University of Cagliari, Italy
* Philipp Mayr-Schlegel, GESIS Leibniz-Institute for the Social Sciences, Germany
* Pedro Ortiz Suarez, Common Crawl Foundation, USA
* Wolfgang Otto, GESIS Leibniz-Institute for the Social Sciences, Germany
* Haris Papageorgiou, R.C. Athena, Greece
* Silvio Peroni, University of Bologna, Italy
* Simone Ponzetto, Univ. of Mannheim, Germany
* Diego Reforgiato Recupero, University of Cagliari, Italy
* Harald Sack, FIZ Karlsruhe, Germany
* Angelo Salatino, The Open University, UK
* Philipp Schaer, TH Köln (University of Applied Sciences), Germany
* Atsuhiro Takasu, University of Tokyo, Japan
* Stefani Tsaneva, WU Wien, Austria
* Ricardo Usbeck, Leuphana University, Germany
* Thanasis Vergoulis, R.C. Athena, Greece
*** 2026 NARNiHS Research Incubator
*** North American Research Network in Historical Sociolinguistics
*** 8th edition
*** 07-09 May 2026 -- entirely online!
==> Abstract Submission Deadline
==> 23 March 2026, 11:59 PM (U.S. Eastern Time)
The 2026 NARNiHS Research Incubator is an entirely online event (**with free registration**). This event offers an opportunity for scholars in historical sociolinguistics from all over the world to participate in discussions of cutting-edge research without the limitations imposed by international travel. We encourage our fellow historical sociolinguists and scholars from related fields in our global scholarly community to join us online for our Research Incubator this spring.
Abstract submission deadline: 23 March 2026, 11:59 PM (U.S. Eastern Time)
Abstract submission online: https://easyabs.linguistlist.org/submit/2026_Incubator/
The North American Research Network in Historical Sociolinguistics (NARNiHS) is accepting abstracts for its 2026 NARNiHS Research Incubator. The 8th edition of this inclusive NARNiHS event seeks to provide a collaborative environment where presenters bring work that is in progress, exploratory, proof-of-concept, or at the prototyping stage. The Incubator's audience actively participates in workshopping these new ideas, brainstorming along with the presenters to forge scholarly paths and develop research solutions. We see the NARNiHS Research Incubator as a place for testing and pushing boundaries; developing new theories, methods, models, and tools in historical sociolinguistics; seeking feedback from peers; and engaging in productive assessment of fledgling ideas and nascent projects.
NARNiHS welcomes papers in all areas of historical sociolinguistics, which is understood as the application/development of sociolinguistic theories, methods, and models for the study of historical language variation and change over time, or more broadly, the study of the interaction of language and society in historical periods and from historical perspectives. Thus, a wide range of linguistic areas, subdisciplines, and methodologies easily find their place within the field, and we encourage submission of abstracts that reflect this broad scope.
Successful abstracts for this research incubator environment will demonstrate thorough grounding in historical sociolinguistics, scientific rigor in the formulation of research questions, and promise for rich discussion of ideas. Abstracts should be explicit about which theoretical frameworks, methodological protocols, and analytical strategies are being applied or critiqued. Data sources and examples should be sufficiently (if briefly) presented, so as to allow reviewers a full understanding of the scope and claims of the research. Please note that the connection of your research to the field of historical sociolinguistics should be explicitly outlined in your abstract. Abstracts should not exceed one page (not including examples and references, see below). Failure to adhere to these criteria will likely result in rejection.
We are soliciting abstracts for 25-minute presentations. Presenters will have the entire 25 minutes for their presentations, with discussion happening in the "incubation session" at the end of each panel. Presentations will be grouped into thematic panels of three presentations, each panel followed by an hour-long discussion with the audience led by specialists, creating a brainstorming/workshopping environment that encourages maximum exchange of ideas. Discussion will encompass specific feedback on the individual papers as well as consideration of overarching questions of theory, methods, and models emerging from the papers. To facilitate such discussion, authors will be required to submit a draft of their presentation materials for distribution to the panel discussants and to the other presenters a few days prior to the start of the conference.
Abstracts will be accepted until Monday, 23 March 2026 -- late abstracts will not be considered.
*** Abstract Content Requirements:
1) Abstracts should be explicit about which theoretical frameworks, methodological protocols, and analytical strategies are being applied or critiqued.
2) Data sources and examples should be sufficiently (if briefly) presented, so as to allow reviewers a full understanding of the scope and claims of the research.
3) The connection of your research to the field of historical sociolinguistics should be explicitly outlined.
*** Abstract Format Guidelines:
1) Abstracts must be submitted in PDF format.
2) Abstracts must fit on one standard 8.5×11 inch or A4 page, with margins no smaller than 1 inch / 2.5 cm and a font style and size no smaller than Times New Roman 12-point. All additional content (visualizations, trees, tables, figures, captions, examples, and references) must fit on a single (1) additional page. No exceptions to these requirements are allowed; abstracts exceeding these limits will be rejected without review.
3) Anonymize your abstract. We realize that sometimes complete anonymity is not attainable, but there is a difference between the nature of the research creating an inability to anonymize and careless non-anonymizing (in citations, references, file names, etc.). Be sure to anonymize your PDF file (you may do so in Adobe Acrobat Reader by clicking on "File", then "Properties", removing your name if it appears in the "Author" line of the "Description" tab, and re-saving the file before submission). Do not use your name when saving your PDF (e.g. Smith_Abstract.pdf); file names will not be automatically anonymized by the EasyAbs system. Rather, use non-identifying information in your file name (e.g. HistSoc4Lyfe.pdf). Your name should only appear in the online form accompanying your abstract submission. Papers that are not sufficiently anonymized wherever possible will be rejected without review.
*** General Conference Requirements:
1) Abstracts must be submitted electronically, using the following link: https://easyabs.linguistlist.org/submit/2026_Incubator/
2) Papers must be delivered as projected in the abstract or represent bona fide developments of the same research.
3) Authors are expected to virtually attend the conference and present their own papers.
4) Presentations will be delivered via Zoom. Technical details and instructions regarding the platform will be sent to authors in due time.
Please contact us at NARNiHistSoc(a)gmail.com with any questions.
In this newsletter:
LDC membership discounts expire March 2
Spring 2026 data scholarship recipient
New publications:
2022 NIST Language Recognition Evaluation Test and Development Sets<https://catalog.ldc.upenn.edu/LDC2026S03>
KAIROS Schema Learning Background Source Data<https://catalog.ldc.upenn.edu/LDC2026T02>
LORELEI Russian Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2026T01>
________________________________
LDC membership discounts expire March 2
Time is running out to save on 2026 membership fees. Renew your LDC membership, rejoin the Consortium, or become a new member by March 2 to receive a 10% discount. For more information on membership benefits and options, visit Join LDC<https://www.ldc.upenn.edu/members/join-ldc>.
Spring 2026 data scholarship recipient
Congratulations to the recipient of LDC's Spring 2026 data scholarship:
Doma Akshitha Reddy: Chaitanya Bharathi Institute of Technology (India): Bachelor of Engineering, Information Technology. Doma is awarded copies of TIMIT Acoustic-Phonetic Continuous Speech Corpus and The CMU Kids Corpus for their work in child speech.
Since 2010, LDC has awarded scholarships to successful student applicants twice each year. To date more than 242 corpora have been distributed to 162 students across 38 countries. We proudly celebrate their achievements and the contributions their research has made to the broader community.
The next round of applications will be accepted in September 2026. For information about the program, visit the Data Scholarships page<https://www.ldc.upenn.edu/language-resources/data/data-scholarships>.
________________________________
New publications:
2022 NIST Language Recognition Evaluation Test and Development Sets<https://catalog.ldc.upenn.edu/LDC2026S03> was developed by LDC and NIST<https://www.nist.gov/> and contains the test and development data, metadata, answer keys, and documentation for the 2022 NIST Language Recognition Evaluation (LRE22). The source data comprises 222 hours of conversational telephone speech (CTS) and broadcast narrowband speech (BNBS) in 14 languages: Afrikaans, Tunisian Arabic, Algerian Arabic, Libyan Arabic, South African English, Indian-accented South African English, North African French, Ndebele, Oromo, Tigrinya, Tsonga, Venda, Xhosa, and Zulu.
For the CTS collections, a small number of native speakers made single calls to multiple individuals in their social network. Calls lasted 8-15 minutes; speakers were free to discuss any topic. The BNBS data was collected from streaming radio programming, focused on broadcasts that included narrowband speech (e.g., call-ins to a talk show). Portions of the CTS callee call sides and portions of each broadcast recording were manually audited by native speakers to verify language and quality.
LRE22<https://www.nist.gov/publications/2022-nist-language-recognition-evaluation> emphasized language recognition for African languages, including low resource languages, and expanded the range of test segment durations. Further information about the 2022 evaluation can be found in the 2022 NIST Language Recognition Evaluation Plan<https://lre.nist.gov/uassets/3>.
2026 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
KAIROS Schema Learning Background Source Data<https://catalog.ldc.upenn.edu/LDC2026T02> was developed by LDC and includes 14,000 English and Spanish documents representing text, audio, video, image, and multimedia resources collected during the DARPA KAIROS program as supplemental background source data for the KAIROS Schema Learning Corpus (SLC). The purpose of the supplemental collection was to increase the amount of English and Spanish data with multimedia components for schema learning and to add domains not well represented in existing Spanish data. The supplemental data in this release includes material from the business and logistics domains, instructional documents and multimedia news.
The complete set of SLC background source data (including the data in this publication) totaled 16.2 million English, Russian, and Spanish documents and more than 125,000 audio, video, image, or multimedia resources. A large portion of that data was drawn from pre-existing LDC datasets.
The SLC and KAIROS Schema Learning Complex Event Annotation (LDC2025T07)<https://catalog.ldc.upenn.edu/LDC2025T07>, containing English and Spanish text, audio, video, and image material labeled for 93 real-world complex events, constitute the data used by KAIROS system developers for schema learning.
KAIROS systems utilized formal event representations in the form of schema libraries that specified the steps, preconditions, and constraints for an open set of complex events; schemas were then used in combination with event extraction to characterize and make predictions about real-world events in a large multilingual, multimedia corpus.
2026 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
LORELEI Russian Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2026T01> contains over 1.26 billion words of Russian monolingual text, 360,000 words of which were translated into English, 3 million words of found Russian-English parallel text, and 87,000 Russian words translated from English data. Approximately 83,000 words were annotated for simple named entities; around 26,000 words were annotated for full entity (including nominals and pronouns), entity linking, and situation frames (identifying entities, needs, and issues); and nearly 9,000 words were covered by noun phrase chunking annotation. Data was collected from discussion forums, news, reference, social network, and weblog sources.
The LORELEI (Low Resource Languages for Emergent Incidents) program was concerned with building human language technology for low resource languages in the context of emergent situations. Representative languages were selected to provide broad typological coverage.
The knowledge base for entity linking annotation is available separately as LORELEI Entity Detection and Linking Knowledge Base (LDC2020T10)<https://catalog.ldc.upenn.edu/LDC2020T10>.
2026 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc@ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104