Dear colleagues,
I'd like to share a new preprint on single-feature register
classification in English text:
"Schwa Density as a Phonological Stylistic Classifier: Primary
Stylistic, Secondary Modality -- A Four-Corpus Pre-Registered
Replication"
Preprint:
https://ling.auf.net/lingbuzz/009926/current.pdf?_s=WPGovroKhmABLC0P
Materials/code:
https://github.com/kylegtownsend-collab/schwa-density-spgc
Paper site: https://papers.letsharkness.com/schwa-density/
The paper tests whether schwa density -- the proportion of vowel
phones in a text that are unstressed schwa (CMUdict AH0) -- can
serve as a phonologically motivated single-feature register
classifier. A pre-registered confirmatory plan was applied to NLTK
multi-source (N=164) and the Standardized Project Gutenberg Corpus
(N=2,767), with sensitivity analyses on Brown (N=313) and OANC
(N=4,375).
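As a rough illustration of the measure (a minimal sketch, not the repository's analyser), the snippet below identifies vowel phones by CMUdict's trailing stress digit and reports the share that are AH0. The toy pronunciation table, the function names, and the choice to skip out-of-vocabulary words are assumptions made for this example; the paper itself uses a G2P fallback for such words.

```python
from typing import Dict, Iterable, List

# Toy ARPAbet pronunciations in CMUdict notation, hand-copied for illustration.
# A real run would load the full CMUdict instead of this hypothetical table.
TOY_PRON: Dict[str, List[str]] = {
    "the": ["DH", "AH0"],
    "banana": ["B", "AH0", "N", "AE1", "N", "AH0"],
    "cat": ["K", "AE1", "T"],
    "sofa": ["S", "OW1", "F", "AH0"],
}


def is_vowel(phone: str) -> bool:
    """In CMUdict only vowel phones carry a trailing stress digit (0, 1 or 2)."""
    return phone[-1].isdigit()


def schwa_density(tokens: Iterable[str], pron: Dict[str, List[str]]) -> float:
    """Proportion of vowel phones that are unstressed schwa (AH0)."""
    vowels = 0
    schwas = 0
    for tok in tokens:
        phones = pron.get(tok.lower())
        if phones is None:
            continue  # assumption: out-of-vocabulary words are simply skipped
        for phone in phones:
            if is_vowel(phone):
                vowels += 1
                if phone == "AH0":
                    schwas += 1
    return schwas / vowels if vowels else 0.0


if __name__ == "__main__":
    print(schwa_density("The banana sofa".split(), TOY_PRON))
```

Passing the pronunciation table as an argument keeps the function independent of how the dictionary is obtained.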
Headline findings:
- Schwa density matches or exceeds Flesch-Kincaid on all
pre-registered corpora.
- A function-word ablation (masking the 198 NLTK English stopwords
before computing schwa density) preserves or amplifies register
discrimination on all four corpora (eta^2 retention 0.93-1.27),
ruling out stopword frequency as a confound.
- The ablation operationalises a two-regime finding: schwa density
functions as a Primary Stylistic Feature on within-prose
variation (NLTK, SPGC, Brown) and a Secondary Modality Feature on
speech-versus-writing variation (OANC).
- Joint partial-eta^2 retains 46-53% of the register signal on the
pre-registered corpora after controlling jointly for syllables
per word, mean word length, and Latinate ratio.
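The ablation logic in the findings above can be sketched as follows. This is a minimal sketch under stated assumptions, not the repository's scripts: it assumes eta^2 is the usual one-way ANOVA effect size (between-group sum of squares over total sum of squares) and that "retention" is the ratio of post-ablation to pre-ablation eta^2. The function names and the tiny stopword set are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def mask_stopwords(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Drop function words before the feature is computed (the ablation step)."""
    return [t for t in tokens if t.lower() not in stopwords]


def eta_squared(groups: Dict[str, List[float]]) -> float:
    """One-way ANOVA effect size: SS_between / SS_total."""
    values = [v for vs in groups.values() for v in vs]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in groups.values()
    )
    return ss_between / ss_total if ss_total else 0.0


def retention(full: Dict[str, List[float]], ablated: Dict[str, List[float]]) -> float:
    """Assumed definition: eta^2 after ablation divided by eta^2 before it."""
    return eta_squared(ablated) / eta_squared(full)


if __name__ == "__main__":
    # Made-up per-text feature values grouped by register, for illustration only.
    full = {"fiction": [1.0, 1.0], "news": [3.0, 3.0]}
    ablated = {"fiction": [0.0, 2.0], "news": [2.0, 4.0]}
    print(retention(full, ablated))
```

Under this reading, a retention value above 1 means the register signal is stronger after masking than before.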
The pre-registration, deviation log, analyser, ablation and
G2P-fallback scripts, per-corpus feature tables, and
figure-generation code are all openly available in the repository
(MIT / CC-BY-4.0).
Comments and criticisms welcome.
Thanks,
Kyle Townsend
Independent
ktownsend(a)spfk12.org
*<Lexicom/>*
a workshop in digital lexicography and lexical computing
*Registration open*
*Bari, Italy* 15 – 19 September 2025
Your 5 days to get up-to-date with the latest developments in
*corpus-driven lexicography* and to practice your
*corpus building and corpus query skills* with some of the top experts in
the field.
For the programme, lecturers, invited speakers, fees and registration,
visit this website
*lexicom.courses <https://lexicom.courses/upcoming-lexicom/>*
I hope to meet you in Bari in September!
Ondřej
*Ondřej Matuška*
sketchengine.eu <http://www.sketchengine.eu/> | Facebook
<https://www.facebook.com/SketchEngine/> | LinkedIn
<https://www.linkedin.com/in/ondrejmatuska> | Twitter
<https://twitter.com/SketchEngine>
International Conference 'New Trends in Translation and Interpreting
Technology' (NeTTIT'2026)
Dubrovnik, Croatia, 24-27 June 2026
https://nettt-conference.com
Extended Deadline Call for Papers
*** Extended submission deadline 27 April 2026 ***
# The conference
The third edition of the International Conference 'New Trends in
Translation and Interpreting Technology' (NeTTIT'2026) will take place
in Dubrovnik, Croatia from 24 to 27 June 2026.
The objective of the conference is (i) to bridge the gap between
academia and industry in the field of translation and interpreting by
bringing together academics in linguistics, translation and interpreting
studies, machine translation and natural language processing,
developers, practitioners, language service providers and vendors who
work on or are interested in different aspects of technology for
translation and interpreting, and (ii) to be a distinctive event for
discussing the latest developments and practices. NeTTIT'2026 invites
all professionals who would like to learn about the new trends, present
the latest work and/or share their experience in the field, and who
would like to establish business and research contacts, collaborations
and new ventures.
The conference will include plenary presentations (research and user
presentations, keynote speeches), poster sessions and panel discussions.
All submitted papers will be peer-reviewed by experts, and the accepted
papers will be published as open-access conference e-proceedings which
will be available at the time of the conference.
# Conference topics
Contributions are invited on any topic related to the latest technology and
practices in translation, subtitling, localisation, interpreting,
machine translation and Large Language Models used in translation and
interpreting.
NeTTIT'2026 will feature a Special Theme Track "Future of Translation
and Interpreting Technologies in the Era of LLMs and Generative AI".
The conference topics include but are not limited to (see also the
special conference theme below):
## CAT tools
- Translation Memory (TM) systems
- NLP and MT for translation memory systems
- Terminology extraction tools
- Localisation tools
## Machine Translation
- Latest developments in Neural Machine Translation
- MT for under-resourced languages
- MT with low computing resources
- Multimodal MT
- Integration of MT in TM systems
- Resources for MT
## Technologies for MT deployment
- MT evaluation techniques, metrics and evaluation results
- Human evaluations of MT output
- Evaluating MT in a real-world setting
- Quality estimation for MT
- Domain adaptation
## Translation Studies
- Corpus-based studies applied to translation
- Corpora and resources for translation
- Translationese
- Cognitive effort and eye-tracking experiments in translation
## Interpreting studies
- Corpus-based studies applied to interpreting
- Corpora and resources for interpreting
- Interpretese
- Resources for interpreting and interpreting technology applications
- Cognitive effort and eye-tracking experiments in interpreting
## Interpreting technology
- Machine interpreting
- Computer-aided interpreting
- NLP for dialogue interpreting
- Development of NLP based applications for communication in public
service settings (healthcare, education, law, emergency services)
## Emerging Areas in Translation and Interpreting
- MT and translation tools for literary texts and creative texts
- MT for social media and real-time conversations
- Sign language recognition and translation
## Subtitling
- NLP and MT for subtitling
- Latest technology for subtitling
## User needs
- Analysis of translators' and interpreters' needs in terms of
translation and interpreting technology
- User requirements for interpreting and translation tools
- Incorporating human knowledge into translation and interpreting
technology
- What existing translators' (including subtitlers') and interpreters'
tools do not offer
- User requirements for electronic resources for translators and
interpreters
- Translation and interpreting workflows in larger organisations and the
tools for translation and interpreting employed
## The business of translation and interpreting
- Translation workflow and management
- Technology adoption by translators and industry
- Setting up a translation / interpreting / language provider company
## Teaching translation and interpreting
- Teaching Machine Translation
- Teaching translation technology
- Teaching interpreting technology
- Latest AI developments in the syllabi of translation and interpreting
curricula
## Ethical issues in translation and technology
- Bias and fairness in MT
- Privacy and security in cloud MT systems
- Transparency and explainability of MT systems
- Environmental impact of MT systems
# Special Theme Track - Future of Translation and Interpreting
Technologies in the Era of LLMs and Generative AI
We are excited to share that NeTTIT'2026 will have a special theme with
the goal of stimulating discussion around Large Language Models,
Generative AI and the Future of Translation and Interpreting
Technologies. While the new generation of Large Language Models such as
ChatGPT, Gemini, Claude, DeepSeek and Llama showcase remarkable
advancements in language generation and understanding, we find ourselves
in uncharted territory when it comes to their performance on various
Translation and Interpreting Technology tasks with regard to fairness,
interpretability, ethics and transparency.
The theme track invites studies on how LLMs perform on Translation and
Interpreting Technology tasks and applications, and what this means for
the future of the field. The possible topics of discussion include (but
are not limited to) the following:
- Changes in (and the impact on) the translators' and interpreters'
professions in the new AI era especially as a result of the latest
developments in LLMs and Generative AI
- Generative AI and translation
- Generative AI and interpreting
- Augmenting machine translation systems with generative AI
- Domain and terminology adaptation with Large Language Models
- Literary translation with Large Language Models
- Translation for low-resourced and minority languages with LLMs
- Improving Machine Translation Quality with Contextual Prompts in Large
Language Models
- Prompt engineering for translation
- Generative AI for professional translation
- Generative AI for professional interpreting
# Invited speakers
Yves Champollion, Wordfast LLC
Marko Grobelnik, Jožef Stefan Institute
# Submissions and publication
NeTTIT'2026 invites the following types of submissions in English:
## Academic papers
- Regular long papers: These can be up to eight (8) pages long,
presenting substantial, original, completed, and unpublished work.
- Short papers: These can be up to four (4) pages long and are suitable
for describing small, focused contributions, work-in-progress, negative
results, system demonstrations, etc.
## User papers
- For industry and practitioners. References to related work are
optional. Allowed paper length: between 2 and 4 pages.
Papers should be submitted through Softconf/START using the following
link: https://softconf.com/p/nettit2026/user/
Authors are invited to prepare their papers in the ACL format, using
the templates available on the conference website. Abstract-only
submissions will not be considered or evaluated.
Further details on the submission procedure are available on the
conference website:
https://nettt-conference.com/2026/submissions-and-publication/
The accepted papers will be published in the conference e-proceedings
with assigned ISBN and DOI and made available online on the conference
website at the time of the conference. The conference organisers will
seek the inclusion of the conference proceedings in the ACL Anthology.
# Important dates
- Extended submissions deadline: 27 April 2026
- Reviewing process: 28 April - 18 May 2026
- Notification of acceptance: 20 May 2026
- Camera-ready due: 5 June 2026
- Conference camera-ready proceedings ready: 19 June 2026
- Conference: 24-27 June 2026
Papers submitted before the submission deadline will be reviewed on a
rolling basis so that authors requiring visas can be notified earlier
and have sufficient time to obtain them.
# Pre-conference Tutorials
The pre-conference tutorials will include:
Post-editing and AI-augmented translation -
Marie Escribe (LanguageWire and Polytechnic University of Valencia)
Machine Translation Quality Evaluation -
Tharindu Ranasinghe (Lancaster University)
Automatic Speech Recognition as a supporting tool for interpreters -
Constantin Orasan (University of Surrey)
# Conference Chairs
- Gloria Corpas Pastor (University of Malaga)
- Ruslan Mitkov (Lancaster University and University of Alicante)
- Marko Tadic (University of Zagreb)
# Programme Committee Chairs
- Constantin Orasan (University of Surrey)
- Tharindu Ranasinghe (Lancaster University)
# Publication Chairs
- Marie Escribe (LanguageWire and Polytechnic University of Valencia)
- Alicia Picazo Izquierdo (University of Alicante)
# Organising Committee and Programme Committee coordination
- Marie Escribe (LanguageWire and Polytechnic University of Valencia)
- Alicia Picazo Izquierdo (University of Alicante)
- Xiaojing Zhao (Hong Kong Polytechnic University)
# Publicity and Sponsorship Chair
- Vilelmini Sosoni (Ionian University)
# Programme committee
For a list of the programme committee members visit:
https://nettt-conference.com/2026/programme-committee/
# Venue
The conference will take place at the Centre for Advanced Academic
Studies (CAAS) of the University of Zagreb (http://www.caas.unizg.hr/)
in Dubrovnik.
# Sponsor
Juremy.com
# Sponsorship opportunities
Companies working in the fields of translation technology, interpreting
technology and/or related fields, are welcome to familiarise themselves
with the sponsorship opportunities that the conference offers. Please
visit https://nettt-conference.com/2026/sponsors/ for more details.
# Further information and contact details
The conference website https://nettt-conference.com/ is updated on a
regular basis. For further information, please email
nettit2026(a)nettt-conference.com.
You can also follow us on social media for updates and announcements.
LinkedIn - https://www.linkedin.com/company/nettit2026/
Twitter/X - https://x.com/NeTTIT2026
--
Amal Haddad Haddad (She/her)
Facultad de Traducción e Interpretación
Universidad de Granada |https://www.ugr.es/personal/amal-haddad-haddad
Lexicon Research Group |http://lexicon.ugr.es/haddad
Co-Convenor, BAAL SIG 'Humans, Machines,
Language'|https://r.jyu.fi/humala
Event Coordinator, BAAL SIG 'Language, Learning and Teaching'
In this newsletter:
New publications:
DEFT Chinese and English Light and Rich ERE Parallel Annotation<https://catalog.ldc.upenn.edu/LDC2026T04>
MATERIAL Tagalog-English Language Pack<https://catalog.ldc.upenn.edu/LDC2026S05>
LORELEI Somali Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2026T03>
________________________________
New publications:
DEFT Chinese and English Light and Rich ERE Parallel Annotation<https://catalog.ldc.upenn.edu/LDC2026T04> was developed by LDC and consists of 179 Chinese discussion forum documents and their English translations annotated for entities, relations, and events (ERE). Light ERE annotation labels entity mentions for the target set of entity, relation, and event types between and among those entities including coreference. Rich ERE annotation expands types and tagging in the entities, relations, and events annotation tasks and replaces strict event coreference with a more loosely defined event hopper annotation. 179 Chinese-English document pairs were annotated following Light ERE annotation guidelines; a subset of 171 Chinese-English document pairs were also labeled with Rich ERE annotation. The source data and English translations were drawn from BOLT Chinese Discussion Forum Parallel Training Data (LDC2017T05)<https://catalog.ldc.upenn.edu/LDC2017T05>, originally collected and translated by LDC under the DARPA BOLT program.
DARPA's Deep Exploration and Filtering of Text (DEFT) program aimed to address remaining capability gaps in state-of-the-art natural language processing technologies related to inference, causal relationships, and anomaly detection. LDC supported the DEFT program by collecting, creating, and annotating a variety of data sources.
2026 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
MATERIAL Tagalog-English Language Pack<https://catalog.ldc.upenn.edu/LDC2026S05> was developed by Appen<http://www.appen.com/> for the IARPA MATERIAL<https://www.iarpa.gov/index.php/research-programs/material> program and contains 100 hours of Tagalog conversational telephone speech, transcripts, English translations, annotations, and queries. Calls were made using different telephones (e.g., mobile, landline) from a variety of environments. Transcripts cover approximately 30% of the speech files, 2% of which were translated into English. This release also includes domain annotations, English queries, and their relevance annotations.
The MATERIAL program focused on underserved languages with the ultimate goal to build cross language information retrieval systems to find speech and text content using English search queries.
2026 members can access this corpus through their LDC accounts provided they have submitted a completed copy of the special license agreement. Non-members may license this data for a fee.
*
LORELEI Somali Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2026T03> contains over 13 million words of Somali monolingual text, 800,000 words of which were translated into English, and 106,000 Somali words translated from English data. Approximately 73,000 words were annotated for simple named entities, around 23,000 words were annotated for full entity (including nominals and pronouns), and over 10,000 words were covered by noun phrase chunking annotation. Data was collected from discussion forums, news, reference, social network, and weblog sources.
The LORELEI (Low Resource Languages for Emergent Incidents) program was concerned with building human language technology for low resource languages in the context of emergent situations. Representative languages were selected to provide broad typological coverage.
The knowledge base for entity linking annotation is available separately as LORELEI Entity Detection and Linking Knowledge Base (LDC2020T10)<https://catalog.ldc.upenn.edu/LDC2020T10>.
2026 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
Dear all,
We are pleased to announce the second round of the Model Compression Shared
Task <https://www2.statmt.org/wmt26/model-compression.html> at WMT 2026
<https://www2.statmt.org/wmt26/>.
This shared task aims to evaluate the potential of model compression
techniques in reducing the size of general-purpose large language models,
with the goal of achieving an optimal balance between practical
deployability and high translation quality in specific machine translation
(MT) scenarios. The task’s broader objectives include fostering research
into the efficient, accessible, and sustainable deployment of LLMs for MT,
establishing a common evaluation framework to monitor progress in model
compression across a wide range of languages, and enabling meaningful
comparisons with state-of-the-art MT systems through standardized
evaluation protocols designed to assess not only translation quality but
also computational efficiency.
Although the focus is on model compression, the task is closely aligned
with the General MT shared task
<https://www2.statmt.org/wmt26/translation-task.html>, sharing test data
from a subset of its language directions, as well as protocols for
automatic MT quality evaluation. Additionally, the task follows the same
timeline as the flagship WMT task.
We warmly invite participation from academic teams and industry players
interested in applying existing compression methods to MT or exploring
innovative, cutting-edge approaches.
THE TASK IN A NUTSHELL
Goal: Reduce the size of a general-purpose LLM while maintaining a balance
between model compactness and MT performance.
Languages: The second round of the task will focus on a subset of the
languages covered by the General MT task, namely: Czech to German, English
to Chinese (Simplified), and English to Arabic (Egyptian).
Conditions:
- Constrained: Participants will compress a specific model, using a
  predefined pool of data for calibration and fine-tuning (if needed) to
  ensure directly comparable results.
- Unconstrained: Participants are free to compress any model, provided its
  original size is below 20B parameters, and use any additional data for
  calibration and fine-tuning.
Participation format: Participants will share their compressed models to be
run on a standardized hardware environment provided by the organizers.
Evaluation Criteria:
- Translation quality: Automatically assessed using multiple metrics, e.g.
  COMET, MetricX, and an LLM-as-a-judge framework.
- Model size: Defined by memory usage.
- Inference speed: Measured by total processing time over the test set.
IMPORTANT DATES
- Test data released: June 18, 2026
- Model submission deadline: July 2, 2026
- System description paper submission: in line with WMT26
  <https://www2.statmt.org/wmt26/index.html>
- Camera-ready submission: in line with WMT26
  <https://www2.statmt.org/wmt26/index.html>
- WMT 2026 Conference (co-located with EMNLP2026 <https://2026.emnlp.org/>
  in Budapest, Hungary): November 2026
WEBSITE: https://www2.statmt.org/wmt26/model-compression.html
ORGANIZERS:
Marco Gaido, Fondazione Bruno Kessler
Matteo Negri, Fondazione Bruno Kessler
Roman Grundkiewicz, Microsoft Translator
TG Gowda, Microsoft Translator
CONTACTS:
Marco Gaido - mgaido(a)fbk.eu
Matteo Negri - negri(a)fbk.eu
Final Call For Participation HIPE 2026 – CLEF Shared Task on Person-Place Relation Extraction from Multilingual Historical Texts
(apologies for cross-postings)
________________________________
HIPE: Identifying Historical People, Places and other Entities.
Website: https://hipe-eval.github.io/HIPE-2026/
Tasks: Person-Location Relation Extraction from Multilingual Historical Texts.
Registration: https://clef-labs-registration.dipintra.it/ (until 23 April 2026)
Training data releases: 19 Dec 2025 (partial); 19 Jan 2026 (full)
Evaluation period: 5–7 May 2026
Workshop venue: during CLEF conference, 21–24 September 2026, Jena, Germany.
LinkedIn: @ImpressoProject / #HIPE2026 / @clef_initiative / #clef2026
________________________________
"Who was where when?"
We invite participation in the third edition of the HIPE shared task, dedicated to the extraction of person–place relations in multilingual historical documents. Building on the success of HIPE-2020 and HIPE-2022, which focused on entity recognition and linking, HIPE-2026 aims to enable finer-grained analysis of entities and support the accurate reconstruction of individuals’ geographical and temporal trajectories.
The objective of HIPE-2026 is to build systems capable of determining whether a relation holds between a person and a location (place) mentioned in a document, and classify its temporal scope. Participants are asked to develop systems that determine, for each (person, location) pair associated with a historical document, whether the text implies that the person is at that location within the document’s temporal horizon (isAt relation), or that the person was there at some earlier moment in their life (a more general At relation), or that no such link can be established.
Can large language models take up the challenge? Simple co-occurrences of entity mentions in a text are not sufficient to uncover the implicit and explicit, temporally anchored relations between persons and locations. Addressing this challenge requires temporal reasoning, geographical inference, and the interpretation of noisy historical texts (often with only fragmentary contextual cues) to classify person–location relations with varying degrees of certainty.
The task is designed to be tackled by generative AI systems/LLMs as well as by more traditional classification approaches.
HIPE-2026 features three evaluation profiles:
* Accuracy Profile: Focusing on system performance in relation classification.
* Efficiency Profile: Rewarding scalable, lightweight approaches considering model size and compute cost.
* Generalization Profile: An unseen dataset from a different domain will be included to evaluate systems’ ability to generalise beyond the newspaper domain data.
For the accuracy and efficiency profiles, training and test data originate from historical newspapers in English, German, French and Luxembourgish.
Entity pairs will be provided.
For further information on data, tasks, and evaluation settings:
* HIPE-2026 website: https://hipe-eval.github.io/HIPE-2026/
* Participation Guidelines: https://doi.org/10.5281/zenodo.17800136
* HIPE-2026-data GitHub repository: https://github.com/hipe-eval/HIPE-2026-data
On HIPE shared tasks
The HIPE evaluation lab series is part of the ongoing efforts of the natural language processing and digital humanities communities to adapt and develop technologies for efficiently retrieving and exploring information from historical texts.
Important upcoming dates
* 23 Apr 2026: Lab registration closes.
* 05 May 2026: Test data release (10:00 CEST).
* 07 May 2026: Participant run submission deadline.
* 13 May 2026: Publication of results and release of test data.
* 28 May 2026: Submission of participant notebook paper.
* 10 Jul 2026 / 31 Aug 2026: CLEF conference regular/late registration DL.
* 21 Sep 2026: CLEF 2026 Conference.
Best regards,
HIPE-2026 Shared Task Organizers
https://hipe-eval.github.io/HIPE-2026/
[Apologies for cross-postings]
*********************************************************
TSD 2026 - LAST CALL FOR PAPERS
*********************************************************
Twenty-ninth International Conference on TEXT, SPEECH and DIALOGUE (TSD 2026)
Brno, Czech Republic, 1-4 September 2026
http://www.tsdconference.org/
THE SUBMISSION DEADLINE has been EXTENDED to:
April 25 2026 ............ Submission of full papers
You can get updates by following the new LinkedIn page: https://www.linkedin.com/company/tsdconference/
The conference is organized by the Faculty of Informatics, Masaryk
University, Brno, and the Faculty of Applied Sciences, University of
West Bohemia, Pilsen. The conference is supported by the International
Speech Communication Association.
Venue: Brno, Czech Republic
SUBMISSION OF PAPERS
Authors are invited to submit a full paper not exceeding 12 pages
formatted in the LNCS style (including references). Those accepted
will be presented either orally or as posters. The decision about the
presentation format will be based on the recommendation of the
reviewers. Proceedings papers do not differentiate the presentation
format. The authors are asked to submit their papers using the
on-line form accessible from the conference website.
Papers submitted to TSD 2026 must not be under review by any other
conference or publication during the TSD review cycle, and must not be
previously published or accepted for publication elsewhere.
Publishing on preprint servers is not forbidden, but authors are
warned that when doing so this might influence the blind reviewing
conditions.
As reviewing will be blind, the paper should not include the authors'
names and affiliations. Furthermore, self-references that reveal the
author's identity, e.g., "We previously showed (Smith, 1991) ...",
should be avoided. Instead, use citations such as "Smith previously
showed (Smith, 1991) ...". Papers that do not conform to the
requirements above may be rejected without review.
The authors are strongly encouraged to write their papers in TeX or
LaTeX formats. These formats are necessary for the final versions of
the papers that will be published in the Springer Lecture Notes.
The paper format for review has to be in the PDF format with all
required fonts included. Upon notification of acceptance, presenters
will receive further information on submitting their camera-ready and
electronic sources (for detailed instructions on the final paper
format see https://www.tsdconference.org/tsd2026/paper_instr.html).
Authors are also invited to present actual projects, developed
software or interesting material relevant to the topics of the
conference. The presenters of demonstrations should provide an
abstract not exceeding one page. The demonstration abstracts will not
appear in the conference proceedings.
KEYNOTE SPEAKERS
Anders Soegaard, University of Copenhagen, Denmark
TSD SERIES
The TSD series has evolved into a prime forum for interaction between researchers in
both spoken and written language processing from all over the world.
Proceedings of TSD form a book published by Springer-Verlag in their
Lecture Notes in Artificial Intelligence (LNAI) series. TSD Proceedings
are regularly indexed by Thomson Reuters Conference Proceedings Citation
Index/Web of Science. Moreover, the LNAI series is listed in all major
citation databases such as DBLP, SCOPUS, EI, INSPEC or COMPENDEX.
CALL for SATELLITE WORKSHOP PROPOSALS
https://www.tsdconference.org/tsd2026/conf_workshop_proposals.html
The TSD 2026 conference will be accompanied by one-day satellite workshops
or project meetings with organizational support by the TSD organizing
committee. The organizing committee can arrange for a meeting room at the
conference venue and prepare the workshop proceedings as a book with an ISBN
from a local publisher. Workshop papers that also pass the standard TSD
review process will appear in the Springer proceedings. Each workshop
requires a proposal, which should be sent via the proposal submission form
or discussed via the contact e-mail tsd2026(a)tsdconference.org ahead of the
respective deadline.
TOPICS
Topics of the conference will include (but are not limited to):
Corpora and Language Resources (monolingual, multilingual,
text and spoken corpora, large web corpora, large language models,
disambiguation, specialized lexicons, dictionaries)
Speech Recognition (multilingual, continuous, emotional
speech, handicapped speaker, out-of-vocabulary words,
alternative way of feature extraction, new models for
acoustic and language modelling)
Tagging, Classification and Parsing of Text and Speech
(morphological and syntactic analysis, synthesis and
disambiguation, multilingual processing, sentiment analysis,
credibility analysis, automatic text labeling, summarization,
authorship attribution)
Speech and Spoken Language Generation (multilingual, high
fidelity speech synthesis, computer singing)
Semantic Processing of Text and Speech (information
extraction, information retrieval, data mining, semantic web,
knowledge representation, inference, ontologies, sense
disambiguation, plagiarism detection, fake news detection)
Integrating Applications of Text and Speech Processing
(machine translation, natural language understanding,
question-answering strategies, assistive technologies)
Automatic Dialogue Systems (self-learning, multilingual,
question-answering systems, dialogue strategies, prosody in
dialogues)
Multimodal Techniques and Modelling (video processing, facial
animation, visual speech synthesis, user modelling, emotions
and personality modelling)
Papers on processing of languages other than English are strongly
encouraged.
PROGRAM COMMITTEE
Elmar Noeth, Germany (general chair)
Rodrigo Agerri, Spain
Tomas Arias-Vergara, Germany
Vladimir Benko, Slovakia
Archna Bhatia, USA
Jan Cernocky, Czech Republic
Simon Dobrisek, Slovenia
Kamil Ekstein, Czech Republic
Karina Evgrafova, Russia
Yevhen Fedorov, Ukraine
Volker Fischer, Germany
Darja Fiser, Slovenia
Lucie Flek, Germany
Bjorn Gamback, Norway
Radovan Garabik, Slovakia
Alexander Gelbukh, Mexico
Louise Guthrie, USA
Jan Hajic, Czech Republic
Eva Hajicova, Czech Republic
Yannis Haralambous, France
Hynek Hermansky, USA
Daniel Hládek, Slovakia
Ales Horak, Czech Republic
Eduard Hovy, USA
Milos Jakubicek, Czech Republic
Maria Khokhlova, Russia
Aidar Khusainov, Russia
Daniil Kocharov, Russia
Miloslav Konopik, Czech Republic
Valia Kordoni, Germany
Evgeny Kotelnikov, Russia
Pavel Kral, Czech Republic
Siegfried Kunzmann, USA
Oier Lopez de Lacalle, Spain
Nikola Ljubesic, Croatia
Natalija Loukachevitch, Russia
Bernardo Magnini, Italy
David Mareček, Czech Republic
Jindrich Matousek, Czech Republic
Vaclav Matousek, Czech Republic
Roman Moucek, Czech Republic
Daša Munková, Slovakia
Agnieszka Mykowiecka, Poland
Hermann Ney, Germany
Joakim Nivre, Sweden
Juan Rafael Orozco-Arroyave, Colombia
Paula Andrea Perez-toro, Germany
Maciej Piasecki, Poland
Josef Psutka, Czech Republic
James Pustejovsky, USA
German Rigau, Spain
Paolo Rosso, Spain
Leon Rothkrantz, The Netherlands
Anna Rumshisky, USA
Milan Rusko, Slovakia
Pavel Rychly, Czechia
Mykola Sazhok, Ukraine
Pavel Skrelin, Russia
Petr Sojka, Czech Republic
Ján Staš, Slovakia
Georg Stemmer, Germany
Marko Robnik Sikonja, Slovenia
Marko Tadic, Croatia
Jan Trmal, Czechia
Tamas Varadi, Hungary
Zygmunt Vetulani, Poland
Aleksander Wawer, Poland
Alina Wroblewska, Poland
Jerneja Zganec Gros, Slovenia
FORMAT OF THE CONFERENCE
The conference program will include presentations of invited papers,
oral presentations, and poster/demonstration sessions. Papers will be
presented in plenary or topic-oriented sessions.
Social events including a trip in the vicinity of Brno will allow
for additional informal interactions.
IMPORTANT DATES
April 25 2026 ............ Submission of full papers
June 5 2026 .............. Notification of acceptance
June 15 2026 ............. Final papers (camera ready) and registration
August 8 2026 ............ Submission of demonstration abstracts
August 15 2026 ........... Notification of acceptance for
demonstrations sent to the authors
September 1-4 2026 ....... Conference date
Abstract submission serves only to help organize the review
process; please submit your abstract as soon as possible.
The actual review requires a full paper submission.
The accepted conference contributions will be published in Springer
proceedings that will be made available to participants at the time
of the conference.
OFFICIAL LANGUAGE
The official language of the conference is English.
ACCOMMODATION
The organizing committee will arrange discounts on accommodation in
the 4-star hotel at the conference venue. The current prices of the
accommodation will be available at the conference website.
ADDRESS
All correspondence regarding the conference should be
addressed to
Ales Horak, TSD 2026
Faculty of Informatics, Masaryk University
Botanicka 68a, 602 00 Brno, Czech Republic
phone: +420-5-49 49 18 63
fax: +420-5-49 49 18 20
email: tsd2026(a)tsdconference.org
The official TSD 2026 homepage is: http://www.tsdconference.org/tsd2026
LinkedIn: https://www.linkedin.com/company/tsdconference/
LOCATION
Brno is the second largest city in the Czech Republic with a
population of almost 400,000 and is the country's judicial and
trade-fair center. Brno is the capital of South Moravia, which is
located in the south-eastern part of the Czech Republic and is known
for its wide range of cultural, natural, and technical attractions.
South Moravia is a traditional wine region. Brno has been a Royal
City since 1347 and, with its six universities, forms a cultural
center of the region.
Brno can be reached easily by direct flights from London, Milan, and
Malaga, and by trains or buses from Vienna (150 km) or Prague (230 km).
For participants with some extra time, nearby places may
also be of interest. Local ones include Brno Castle (now called
Spilberk), Veveri Castle, the Old and New City Halls, the
Augustine Monastery with St. Thomas Church and the crypt of the Moravian
Margraves, the Church of St. James, the Cathedral of St. Peter & Paul,
the Carthusian Monastery in Kralovo Pole, and the famous Villa Tugendhat
(UNESCO), designed by Mies van der Rohe, along with other important
buildings of interwar Czech architecture.
For those willing to venture out of Brno, the Moravian Karst with
the Macocha Chasm and Punkva caves, the battlefield of the Battle of
the Three Emperors (Napoleon, Alexander of Russia, and Franz of
Austria; the Battle of Austerlitz), the Chateau of Slavkov (Austerlitz),
Pernstejn Castle, Buchlov Castle, Lednice Chateau, Buchlovice
Chateau, Letovice Chateau, Mikulov with one of the largest Jewish
cemeteries in Central Europe, Telc (a town on the UNESCO
heritage list), and many others are all within easy reach.
The project: "Position Paper: MeMo: Towards Language Models with
Associative Memory Mechanisms" (ACL Anthology)
<https://aclanthology.org/2025.findings-acl.785/>
Do YOU have a strong mathematical background, good programming skills,
passion and curiosity, the ability to think out of the box? This is the
Ph.D. Program for YOU:
* Position 1) Mechanistic Interpretability by-design in
Neural-Network-based Large Language Models: This theme is in the context of
the MeMo project, which aims to develop alternative ways of building
transparent Large Language Models by explicitly using Associative Memories.
* Position 2) Injecting Neuro-symbolic Natural Language Processing
Capabilities in Neural-Network-based Large Language Models: This theme is in
the context of the MeMo project, which aims to develop alternative ways of
building transparent Large Language Models by explicitly using Associative
Memories.
The positions will formally open in mid-April 2026 and will start on
November 1st, 2026.
If you want to see whether you fit the position, take this test, "Open
Fully-funded Ph.D. Positions in the Data Science Ph.D. School"
<https://forms.office.com/e/sSCGhP9MgF>, and book an informal chat to
find out whether the positions suit you.
The positions are based in the Ph.D. program in Data Science at the
University of Rome Tor Vergata (Rome, Italy), in the HumanCentricART Group
<https://humancentricart.github.io/>.
Fourth International Workshop on Gender-Inclusive Translation Technologies (GITT) at EAMT 2026
15 June 2026, Tilburg, The Netherlands
https://sites.google.com/view/gitt2026/
@gitt-workshop.bsky.social
Important Dates (Time zone: Anywhere on Earth)
(NEW) Submission deadline: 26 April, 2026
Notification of Acceptance: 13 May, 2026
Camera Ready Copy due: 20 May, 2026
Workshop: 15 June, 2026
**Aim and scope**
The Gender-Inclusive Translation Technologies Workshop (GITT) sets out to be the dedicated workshop on gender-inclusive language in translation and cross-lingual scenarios. The workshop aims to bring together researchers from diverse areas, including industry partners, MT practitioners, and language professionals. GITT aims to encourage multidisciplinary research that develops and interrogates both solutions and challenges for addressing bias and promoting gender inclusivity in MT and translation tools, including LM applications for the translation task.
**Topics**
GITT invites technical as well as non-technical submissions, which consist of experimental, theoretical or methodological contributions. We explicitly welcome interdisciplinary submissions and submissions that focus on innovative, non-binary linguistic strategies and/or with sociolinguistically-informed perspectives. The topics of interest include, but are not limited to:
- Models or methods for assessing and mitigating gender bias
- New resources for inclusive language and gender translation (e.g., datasets, translation memories, dictionaries)
- Social, cross-lingual, and ethical implications of gender bias
- Qualitative and quantitative analyses on the potential limits of current approaches to gender bias in translation and MT, error taxonomies as well as best practices and guidelines
- User-centric case studies on the impact of biased language and/or mitigating approaches which can include translators, post-editors, or monolingual MT users
GITT is also open to other non-listed topics aligned with the scope of the workshop and to work focusing on non-textual modalities (e.g., audiovisual translation).
**Submission**
We welcome four types of submissions, two archival and two non-archival.
ARCHIVAL
- Research papers: from 4 to 10 pages (excluding references)
- Extended Abstracts: up to 2 pages (including references)
Accepted papers and extended abstracts consisting of novel work will be published online as proceedings in the ACL Anthology.
NON-ARCHIVAL
- Research Communications: up to 2 pages (including references).
We include a parallel submission policy in the form of Research Communications for papers related to the topic of GITT that were accepted at other venues in 2025 and 2026.
- Potluck Communications: short abstract up to 500 words (including references).
Potluck Communications offer a space for anyone—especially students and early career researchers—to discuss bold new ideas for collaboration, brainstorm about ongoing work, and explore future research directions.
The communications will not be included in the proceedings, but will serve to promote the dissemination of research aligned with the scope of the workshop.
All submissions should adhere to the EAMT 2026 guidelines and style templates (PDF, LaTeX, Word) and be uploaded to EasyChair (https://easychair.org/conferences?conf=eamt2026).
**Workshop organizers**
Manuel Lardelli, University of Padova
Janiça Hackenbuchner, University of Ghent
Luisa Bentivogli, Fondazione Bruno Kessler
Joke Daems, University of Ghent
Beatrice Savoldi, Fondazione Bruno Kessler
Eleni Gkovedarou, University of Ghent
Call for Participation for the 7th RAIL WORKSHOP
RAIL workshop
https://sadilar.org/en/seventh-workshop-on-resources-for-african-indigenous…
Co-located with LREC 2026 https://www.elra.info/lrec2026
RAIL workshop date: 12 May 2026
Conference venue: Palau de Congressos de Palma, Palma de Mallorca
(Spain)
Theme: Creating resources for less-resourced African languages
The seventh Resources for African Indigenous Languages (RAIL) workshop
will be co-located with the LREC 2026 conference in Palma, Mallorca
(Spain), on 12 May 2026. The RAIL workshop is an interdisciplinary
platform for researchers working on African indigenous language
resources such as natural language processing (NLP) tools, Human
Language Technologies (HLT), data collections, and annotations. This
workshop aims to foster a scientific community of practice that focuses
on computational linguistic tools and data that are designed for or
applied to the indigenous languages of Africa.
Many African languages are under-resourced, while only a few are
considered to be somewhat better resourced. These languages often share
interesting properties such as writing systems, making them different
from most high-resourced languages. From a computational perspective,
these languages lack enough corpora to undertake high-level development
of NLP and HLT tools, which in turn impedes the development of African
languages in these areas. During previous workshops, it was noted that
the problems and solutions presented were not only applicable to
African languages but were also relevant to many other low-resource
languages across the world. Because these languages share similar
challenges, this workshop provides researchers with opportunities to
work collaboratively on issues of language resource development and
learn from each other.
The RAIL workshop has several aims. First, the workshop brings together
researchers who work on African indigenous languages, forming a
community of practice for people working on indigenous languages.
Second, the workshop aims to reveal currently unknown or unpublished
existing resources (corpora, NLP tools, and applications), resulting in
a better overview of the current state-of-the-art, and also allows for
discussions on novel, desired resources for future research in this
area. Third, it enhances the sharing of knowledge regarding the
development of low-resource languages. Finally, it enables discussions
on how to improve the quality as well as the availability of the
resources.
Organising Committee
Muzi Matfunjwa, South African Centre for Digital Language Resources
Mmasibidi Setaka, South African Centre for Digital Language Resources
Rooweither Mabuya, South African Centre for Digital Language Resources
Menno van Zaanen, South African Centre for Digital Language Resources
--
Prof Menno van Zaanen menno.vanzaanen(a)nwu.ac.za
Professor in Digital Humanities
South African Centre for Digital Language Resources
https://www.sadilar.org