In this newsletter:
Renew your LDC membership today
New publications:
CALLHOME Japanese Second Edition<https://catalog.ldc.upenn.edu/LDC2026S02>
CALLHOME Japanese Lexicon Second Edition<https://catalog.ldc.upenn.edu/LDC2026L01>
MATERIAL Swahili-English Language Pack<https://catalog.ldc.upenn.edu/LDC2026S01>
________________________________
Renew your LDC membership today
The importance of curated resources for language-related education, research, and technology development drives LDC's mission to create them, to accept data contributions from researchers across the globe, and to broadly share such resources through the LDC Catalog. LDC members enjoy no-cost access to new corpora released annually, as well as the ability to license legacy data sets from among our 1000 holdings at reduced fees. Ensure that your data needs continue to be met by renewing your LDC membership or by joining the Consortium today.
Now through March 2, 2026, any organization that joins the Consortium or renews its membership will receive a 10% discount on the 2026 membership fee. Membership remains the most economical way to access current and past LDC releases. Consult Join LDC<https://www.ldc.upenn.edu/members/join-ldc> for more details on membership options and benefits.
________________________________
New publications:
CALLHOME Japanese Second Edition<https://catalog.ldc.upenn.edu/LDC2026S02> was developed by LDC and contains 49 hours of speech from 120 telephone conversations between native Japanese speakers. This publication is a re-release of the original CALLHOME Japanese collection, combining CALLHOME Japanese Speech (LDC96S37)<https://catalog.ldc.upenn.edu/LDC96S37> and CALLHOME Japanese Transcripts (LDC96T18)<https://catalog.ldc.upenn.edu/LDC96T18> with additional transcription and updated directory structure, file formats, and documentation.
This corpus contains the 120 calls from CALLHOME Japanese Speech which represented training and development data and a subset of evaluation data. Participants spoke on topics of their choice in a single telephone call lasting up to 30 minutes. Calls were manually audited for language, recording quality, channel characteristics, dialect, and region. For this second edition, all audio was converted from SPHERE files to FLAC format, and the original training/development/test partitioning was removed.
This release also features revised transcripts conforming to updated LDC transcription guidelines that addressed normalization of annotation formats, standardization of speaker-produced and background noises, application of foreign-language marking, whitespace cleanup, and corrections and consistency fixes.
The CALLHOME series consists of telephone conversations and transcripts developed by LDC and Rutgers, The State University of New Jersey, in support of research in speaker identification, language identification, and related technologies. Languages in the series include American English, Egyptian Arabic, German, Japanese, Mandarin Chinese, and Spanish.
2026 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
CALLHOME Japanese Lexicon Second Edition<https://catalog.ldc.upenn.edu/LDC2026L01> was developed by LDC and contains 80,688 Japanese words with morphological, phonological, and stress information. This second edition updates file formats, directory structure, and documentation. The first edition is available as CALLHOME Japanese Lexicon (LDC96L17)<https://catalog.ldc.upenn.edu/LDC96L17>. The words in the lexicon were derived from 80 transcripts representing telephone conversations between native Japanese speakers contained in CALLHOME Japanese Second Edition (LDC2026S02)<https://catalog.ldc.upenn.edu/LDC2026S02>.
The lexicon contains seven tab-separated information fields: (1) headword: orthographic form in kanji or katakana or hiragana (if only written in hiragana); (2) hiragana: orthographic form in hiragana; (3) romanization: orthographic form in romaji; (4) pron: pronunciation of the headword; (5) morph: morphological analysis of the headword; (6) train freq: frequency of the headword in the transcripts; and (7) gloss: glosses of the headword. This release also includes a pronunciation dictionary derived from the lexicon in CMUdict<https://stdlib.io/docs/api/latest/@stdlib/datasets/cmudict> format and the grapheme-to-phoneme (G2P) tools used to automatically generate pronunciations for the original lexicon.
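As an illustration of the seven-field layout described above, here is a minimal Python sketch for reading such a file. It is not part of the LDC release: the names LexiconEntry and read_lexicon are hypothetical, the sample line is invented rather than taken from the corpus, and the sketch assumes one entry per line, no header row, and an integer value in the train freq field.

```python
from typing import Iterable, Iterator, NamedTuple


class LexiconEntry(NamedTuple):
    """One lexicon line; field order follows the description above."""
    headword: str       # kanji, katakana, or hiragana form
    hiragana: str       # orthographic form in hiragana
    romanization: str   # orthographic form in romaji
    pron: str           # pronunciation of the headword
    morph: str          # morphological analysis
    train_freq: int     # frequency in the transcripts (assumed integer)
    gloss: str          # glosses of the headword


def read_lexicon(lines: Iterable[str]) -> Iterator[LexiconEntry]:
    """Yield entries from tab-separated lines, skipping malformed ones."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 7:
            continue  # blank line or unexpected number of fields
        headword, hiragana, romanization, pron, morph, freq, gloss = fields
        yield LexiconEntry(headword, hiragana, romanization, pron, morph,
                           int(freq), gloss)


# Invented sample line for illustration only (not taken from the corpus).
sample = ["猫\tねこ\tneko\tn e k o\tnoun\t3\tcat\n", "\n"]
entries = list(read_lexicon(sample))
print(entries[0].romanization, entries[0].train_freq)  # prints: neko 3
```

In practice the function could be given an open file handle, for example open(path, encoding="utf-8"), where the path and the encoding are assumptions to be checked against the corpus documentation.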
2026 members can access this corpus through their LDC accounts provided they have submitted a completed copy of the special license agreement. Non-members may license this data for a fee.
*
MATERIAL Swahili-English Language Pack<https://catalog.ldc.upenn.edu/LDC2026S01> was developed by Appen<http://www.appen.com/> for the IARPA MATERIAL<https://www.iarpa.gov/index.php/research-programs/material> program and contains 112 hours of Swahili conversational telephone speech, transcripts, English translations, annotations, and queries. Calls were made using different telephones (e.g., mobile, landline) from a variety of environments. Transcripts cover approximately 30% of the speech files, 3% of which were translated into English. This release also includes domain annotations, English queries, and their relevance annotations.
The MATERIAL program focused on underserved languages, with the ultimate goal of building cross-language information retrieval systems that find speech and text content using English search queries.
2026 members can access this corpus through their LDC accounts provided they have submitted a completed copy of the special license agreement. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
========== First Call for Papers: SwissText 2026 ==========
Paper submission due: 23:59 AOE March 17, 2026
Conference date: June 10, 2026 in Zurich, Switzerland
Conference website: https://www.swisstext.org/
================================================
Dear colleagues,
We are pleased to announce the Call for Papers for SwissText 2026, the 11th edition of the Swiss Text Analytics Conference.
SwissText 2026 will take place on June 10, 2026, at the University of Zurich (Campus Oerlikon) in Zurich, Switzerland. SwissText is an established international forum for researchers and practitioners working on natural language processing, computational linguistics, and text analytics, with a strong tradition of fostering exchange between academia and industry.
We invite submissions of substantial, original, and unpublished work to the following tracks:
* Applied Track (non-archival), with a strong focus on industry and applied research.
* Scientific Track (archival), with technical research papers from the international scientific community, including corpus- and benchmark-related research papers with a focus on Swiss languages from the scientific community and industry.
* Corpus Track (archival), with Swiss-related NLP datasets.
* Demonstration Track (non-archival), with NLP systems presented live at the SwissText conference.
The special theme of SwissText 2026 is Reproducible NLP; we therefore particularly encourage submissions on reproducible NLP research and fully open NLP.
We plan to publish the proceedings of SwissText 2026 in the ACL Anthology.
Important dates
* Submission deadline: March 17, 2026
* Notification of acceptance: April 21, 2026
* Camera-ready deadline: May 5, 2026
* Conference date: June 10, 2026
All deadlines are at 11:59 PM UTC-12:00 (AoE, “anywhere on Earth”).
Detailed submission guidelines and formatting instructions can be found on the conference website: https://www.swisstext.org/call-for-papers/
General Chair: Prof. Dr. Rico Sennrich, University of Zurich
Organizing Committee: Jannis Vamvas, Yingqiang Gao, Tilia Ellendorff, Michelle Wastl, Gerold Schneider, University of Zurich
For questions, please contact info(a)swisstext.org<mailto:info@swisstext.org> or the organizing committee members.
Best regards,
Dr. Yingqiang Gao (he/him)
Department of Computational Linguistics
Andreasstrasse 15, Office AND 2-20
University of Zurich, CH-8050 Zurich
The University of Hildesheim in Germany has an open part-time (50%) faculty position, to be filled for 3 years: Pre- or Postdoc (Translation Studies, Applied Linguistics, Computational Linguistics). Knowledge of both German and English is required. The announcement in German provides more details:
https://bewerbung.uni-hildesheim.de/jobposting/393fc6472ac442461bd082e36807…
--
Prof. Dr. Ekaterina Lapshinova-Koltunski
Mehrsprachige technische Fachkommunikation
Geschäftsführende Direktorin
Institut für Übersetzungswissenschaft und Fachkommunikation
Fachbereich 3: Sprach- und Informationswissenschaften
Stiftung Universität Hildesheim
Lübecker Straße 3
31141 Hildesheim
+49 5121 883-30934
Second Call for Papers
*****************
NooJ 2026 International Conference
Naples, Italy
June 24-26, 2026
https://nooj2026.sciencescall.org/resource/page/id/2
*******************
Important dates:
*******************
Abstract submission: 31 January 2026
Notification of acceptance: 25 March 2026
Registration: until 13 April 2026
Conference dates: 24-26 June 2026
***********************************************
The University of Naples "L'Orientale" and the NooJ association are organizing the 20th NooJ Conference in Naples, Italy, on 24-26 June 2026.
NooJ is a linguistic development environment that allows linguists to formalize several levels of linguistic phenomena: orthography and spelling; lexicons of simple words, multiword units and frozen expressions; inflectional, derivational and agglutinative morphology; local, constituent and dependency syntax; transformational grammars and semantic analysis. For each phenomenon, NooJ provides linguists with formal tools specifically adapted to facilitate the description, using the four types of Chomsky-Schützenberger formal grammars (regular, context-free, context-sensitive and unrestricted). This approach distinguishes NooJ from most computational linguistic frameworks, which provide a single formalism.
NooJ is also a corpus processing tool, used in the digital humanities (in History, Literature, Psychology and Sociolinguistics) as it allows users to apply sophisticated linguistic resources to large corpora and build indices and concordances, annotate texts automatically, perform various statistical analyses, etc.
NooJ is freely available, and linguistic modules for over 30 languages can already be downloaded free of charge; see https://nooj.univ-fcomte.fr
A Web demo is available for English, French, Spanish and Ukrainian at: https://webnooj.univ-fcomte.fr
******************************
The conference intends to:
******************************
* give NooJ users and researchers in Linguistics, Computational Linguistics and in the Digital Humanities the opportunity to meet and share their experience as developers, researchers and teachers;
* present to NooJ users the latest linguistic resources and NLP applications developed for/with NooJ, its latest functionalities, as well as its future developments;
* offer researchers and graduate students an advanced tutorial dedicated to the automatic transformational analysis/generation of texts.
*******************
Topics of interest:
*******************
* Lexical resources
* Computational morphology
* Syntactic analysis
* Semantic analysis
* Linguistic-based NLP applications
***************
Submission:
***************
We invite the submission of abstracts in English until 31 January 2026. The abstracts should contain the title, name and email of the author(s) and their institutions. Abstracts should not exceed one page (between 400 and 600 words) and should be sent to nooj2026(a)gmail.com. All proposals will be reviewed by the members of the scientific committee; authors will be given notice of acceptance of their papers no later than 25 March 2026.
Further information about the conference can be found at https://nooj2026.sciencescall.org/resource/page/id/5. You can also contact the organizing committee at nooj2026(a)gmail.com for any additional information.
************************
Scientific Committee:
************************
Marco Angster, University of Zadar, Croatia
Anabela Barreiro, INESC-ID, Portugal
Anita Bartulović, University of Zadar, Croatia
Magali Bigey, Université de Franche-Comté, France
Xavier Blanco, Autonomous University of Barcelona, Spain
Christian Boitet, Université Joseph Fourier, Grenoble, France
Maria Pia Di Buono, Università degli Studi di Napoli l'Orientale, Italy
Héla Fehri, University of Sfax, Tunisia
Zoe Gavriilidou, Democritus University of Thrace, Greece
Yuras Hetsevich, National Academy of Sciences, Belarus
Agata Jackievicz, Université Paul Valéry, France
Agnieszka Kaliska, Poznan University, Poland
Kristina Kocijan, University of Zagreb, Croatia
Walter Koza, National University of General Sarmiento, Argentina
Svetlana Krylosova, INALCO, France
Mathieu Lafourcade, Université de Montpellier, France
Laetitia Leonarduzzi, Université d’Aix-Marseille, France
Stefania Maci, Università di Bergamo, Italy
Samir Mbarki, Ibn Tofail University, Morocco
Linda Mijić, University of Zadar, Croatia
Johanna Monti, Università degli Studi di Napoli l'Orientale, Italy
Kamal Naït-Zerrad, INALCO, France
Thierry Poibeau, Laboratoire Lattice, CNRS, France
Andrea Rodrigo, University of Rosario, Argentina
Olena Saint-Joanis, INALCO, France
Max Silberztein, Université de Bourgogne Franche-Comté, France
Marko Tadić, University of Zagreb, Croatia
François Trouilleux, Université Clermont Auvergne, France
**************************
Organizing Committee:
**************************
* Johanna Monti, Università di Napoli L'Orientale, Italy
* Maria Pia di Buono, Università di Napoli L'Orientale, Italy
* Max Silberztein, Université de Franche-Comté, France
-----------------------------------------------------------------------------
Call for submissions
1st International Workshop on Quality in Large Language Models and
Knowledge Graphs
In conjunction with EDBT/ICDT 2026
QuaLLM-KG @ EDBT/ICDT 2026
24 March 2026, Tampere, Finland
Website: https://quallmkg2026.github.io/
*New deadline: January 25th AoE*
-----------------------------------------------------------------------------
**** Goal ****
QuaLLM-KG aims to bring together researchers and practitioners working
on quality issues at the intersection of large language models and
knowledge graphs. The workshop focuses on theories, methods, and
applications for assessing, improving, and monitoring the quality of
LLMs and KGs.
**** Important Dates ****
- Submission deadline: January 25th, 2026
- Notification: February 8th, 2026
- Camera-ready: February 20th, 2026
**** Topics ****
* Quality in Knowledge Graphs
- Accuracy, consistency, completeness, freshness
- Schema validation, constraint checking, error detection
- Entity resolution, link prediction, ontology alignment
- Provenance, explainability, trust in KG data
- KG quality in dynamic and large-scale settings
* Quality in Large Language Models
- Hallucination reduction & factual grounding
- Bias detection and mitigation
- Metrics & benchmarks for quality assessment
- Uncertainty estimation, calibration, interpretability
* Synergies Between KGs and LLMs
- KG-based grounding and fact-checking for LLMs
- LLM-based KG enrichment, extraction, entity linking
- Quality-driven prompting and fine-tuning
- Hybrid KG–LLM architectures for quality assurance
- Evaluation frameworks for integration and consistency
* Benchmarks and Evaluation Frameworks
- Datasets and metrics for KG & LLM quality
- Tools for monitoring, validation, maintenance
- Reproducibility, transparency, responsible AI
* Applications and Case Studies
- Scientific, industrial, enterprise use cases
- Quality at scale
- Human-in-the-loop quality control
**** Submissions ****
We invite submissions of full papers (up to 8 pages, excluding
references) and short papers describing work in progress,
demos/systems/applications, or vision/innovative ideas (up to 4 pages,
excluding references).
Submissions should be in the CEUR-WS proceedings template.
Accepted papers will be published in the CEUR Workshop proceedings
(CEUR-WS.org).
**** Workshop Organizers ****
- Soror Sahri, Université Paris Cité, France
- Sven Groppe, University of Lübeck, Germany
- Farah Benamara, IPAL-CNRS, Singapore & University of Toulouse
--
========================
Farah Benamara Zitoune
Professor in Computer Science, Université de Toulouse
IRIT and IPAL-CNRS Singapore
118 Route de Narbonne, 31062, Toulouse.
Tel : +33 5 61 55 77 06
http://www.irit.fr/~Farah.Benamara
==================================
**Second Call for Papers**
Gaze4NLP - The Second Workshop on Gaze Data and Natural Language Processing
12 May 2026, Palma de Mallorca, Spain (co-located with LREC 2026)
https://gaze4nlp.github.io/Gaze4NLP2026/
The Second Workshop on Gaze Data and Natural Language Processing
(Gaze4NLP), co-located with LREC 2026 in Palma de Mallorca, Spain,
invites papers of a theoretical or experimental nature that describe
research methodologies employing interdisciplinary perspectives,
including computer science, engineering, and the cognitive sciences,
and that identify challenges to resolve at the intersection of the two
domains: eye tracking and NLP. Gaze4NLP aims to bring together
researchers working on eyes on text and on NLP, and to establish
bridges between them for identifying future avenues of research.
Workshop webpage:
https://gaze4nlp.github.io/Gaze4NLP2026/
Important Dates
Workshop paper submission deadline: 16 February 2026
Workshop paper acceptance notification: 16 March 2026
Workshop paper camera-ready versions: 30 March 2026
Workshop date: 12 May 2026
All deadlines are 11:59PM UTC-12:00 (anywhere on Earth)
Topics for the workshop will include, but are not limited to:
- Investigating the pillars for bridging the gap between the research
on eyes on text and NLP. Studying how to expand research methodologies
by employing interdisciplinary perspectives, including computer
science and engineering perspectives and cognitive sciences, and
identifying challenges and issues to resolve.
- Exploring new areas so that both fields benefit from each other
more than in the past, identifying novel domains of exploration for
further research.
- Discussing how to develop cognitively inspired models that align
human reading data with LLMs.
Submissions
We solicit regular workshop papers, which will be included in the
proceedings as archival publications. The length of the papers should
be between 4 and 8 pages (excluding references). The submissions
should not include any appendices. Accepted papers will be presented
in the form of either oral or poster presentations.
Please note that camera-ready papers are allowed an additional page of
content to address reviewer comments, and unlimited pages for
appendices. The workshop proceedings will be part of the ACL
Anthology. Authors of accepted papers will also be given the
opportunity to publish an extended version as part of an edited book.
Submissions will be handled via the START Conference Manager.
- Submission link: https://softconf.com/lrec2026/Gaze4NLP/
All submissions should follow the LREC style guidelines. We strongly
recommend the use of the LaTeX style files, OpenDocument, or Microsoft
Word templates created for LREC: <https://lrec2026.info/authors-kit/>.
All papers must be anonymous, i.e., not reveal author(s) on the title
page or through self-references. So, e.g., “We previously showed
(Smith, 2020)”, should be avoided. Instead, use citations such as
“Smith (2020) previously showed”.
LRE-Map and Sharing Language Resources
When submitting a paper from the START page, authors will be asked to
provide essential information about resources (in a broad sense, i.e.
also technologies, standards, evaluation kits, etc.) that have been
used for the work described in the paper or are a new result of your
research. Moreover, ELRA encourages all LREC authors to share the
described LRs (data, tools, services, etc.) to enable their reuse and
replicability of experiments (including evaluation ones).
Organization Committee:
Cengiz Acarturk, Jagiellonian University, Poland
Jamal Nasir, University of Galway, Ireland
Burcu Can, University of Stirling, Scotland, UK
Cagri Coltekin, University of Tubingen, Germany
The School of Computer Science at the University of Leeds has
two UK DLA PhD scholarships for this year (Oct 2026)
with a deadline of Friday 30th January:
https://phd.leeds.ac.uk/project/2357-epsrc-dla-scholarship-in-the-school-of…
Please encourage any eligible students to apply: we have not had much demand so far, so the success rate is likely to be quite good. The scholarships can be for any project in the School, starting October 2026 and lasting 3.5 years.
This year both are for UK home-fee rated students only; applicants can check their eligibility here:
https://www.ukcisa.org.uk/student-advice/find-a-category-for-he-england
Eric Atwell, Professor of Artificial Intelligence for Language
School of Computer Science, Uni of LEEDS, LS2 9JT, UK
http://www.comp.leeds.ac.uk/eric
Hello,
As part of the DataLens project, we are offering a Master’s internship
in Machine Learning and the Semantic Web focused on improving the
completion and structuring of dataset metadata to support the federation
of heterogeneous sources.
More information and application:
https://recrutement.inria.fr/public/classic/fr/offres/2025-09456.
Best regards,
--
Anaïs OLLAGNIER
Assistant Professor at Université Côte d'Azur | I3S | INRIA wimmics team
--------------------------------------------
Templiers 1, Bureau 417, 930 Route des Colles, BP 145
06903 Sophia Antipolis Cedex, France
anais.ollagnier(a)inria.fr | https://aollagnier.github.io/
CALL FOR PAPERS
The Second Workshop on Holocaust Testimonies as Language Resources
(HTRes-2026), a pre-conference workshop at LREC2026
Date: 11 May 2026
Location: Palma de Mallorca, Spain
Workshop web page: https://www.clarin.eu/HTRes2026
Submission Deadline: 20 February 2026
Submission link: https://softconf.com/lrec2026/HTRes2026/
Holocaust testimonies serve as a bridge between survivors and history’s
darkest chapters, providing a connection to the profound experiences of
the past. Testimonies stand as the primary source of information that
describes the Holocaust, offering first-hand accounts and personal
narratives of those who experienced it. The majority of testimonies are
captured in an oral format, as survivors vividly explain and share their
personal experiences and observations from that time period.
Transforming Holocaust testimonies into a machine-processable digital
format can be a difficult task owing to the unstructured nature of the
text. The creation of accessible, comprehensive, and well-annotated
Holocaust testimony collections is of paramount importance to our
society. These collections empower researchers and historians to
validate the accuracy of socially and historically significant
information, enabling them to share critical insights and trends derived
from these data.
The primary objective of this workshop is to explore how various
theories, techniques, and tools from corpus linguistics, natural
language processing, and digital humanities can contribute to the
examination, analysis, dissemination, and preservation of Holocaust
testimonies and other Holocaust-related documents.
The workshop is supported by CLARIN and EHRI.
Please find full details of the call for papers at the workshop web page
at https://www.clarin.eu/HTRes2026. The main conference website is at
https://lrec2026.info/ .
IMPORTANT DATES
Final date for paper submission: 12 February 2026
Notification of Acceptance: 11 March 2026
Camera-ready version submission: 30 March 2026
Workshop date: 11 May 2026
To contact the organisers, please email holocausttlr(a)gmail.com
From Martin Wynne on behalf of the organizing committee.
--
Senior Researcher in Corpus Linguistics
Faculty of Linguistics, Philology and Phonetics, University of Oxford
National Co-ordinator, CLARIN-UK
martin.wynne(a)ling-phil.ox.ac.uk
https://orcid.org/0000-0002-4155-0530
****************************************
Second Call for Papers:
The 6th workshop on "Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments", in collaboration with the MENTAL.ai consortium
Workshop: co-located with LREC 2026 | Palma de Mallorca, Spain | May 12th, 2026
RaPID-6@MENTAL.ai serves as an interdisciplinary platform for researchers to exchange insights, methods, and experiences related to collecting and processing data from individuals with mental, cognitive, neuropsychiatric, or neurodegenerative impairments. The workshop focuses on creating, processing, and applying such data resources from individuals at different stages and severity levels of these impairments. The ultimate goal of RaPID-6@MENTAL.ai is to facilitate the study of relationships among linguistic, paralinguistic, and extra-linguistic observations, with applications ranging from aiding diagnosis to enhancing monitoring and identifying individuals at higher risk, ultimately promoting multidisciplinary collaboration across clinical, language technology, computational linguistics, and computer science communities.
Submission deadline: Sun., 22nd of February, 2026 (anywhere on earth)
Paper submission: https://softconf.com/lrec2026/RaPID-6/user/
Invited speakers: Brian MacWhinney, Carnegie Mellon University, USA and Sunny X. Tang, MD, Feinstein Institutes for Medical Research, Northwell Health, USA
Website and details: https://spraakbanken.gu.se/rapid-2026
Contact: Dimitrios Kokkinakis
Contact email: dimitrios.kokkinakis(a)gu.se<mailto:dimitrios.kokkinakis@gu.se>
Organizing committee:
* Dimitrios Kokkinakis, University of Gothenburg, Sweden
* Charalambos Themistocleous, University of Oslo, Norway
* Gaël Dias, University of Caen Normandie, France
* Kathleen C. Fraser, University of Ottawa, Canada
* Fredrik Öhman, University of Gothenburg and Sahlgrenska University Hospital, Sweden
* Sebastião Pais, University of Beira Interior, Portugal
****************************************