June 2025 - Corpora - ELRA lists

Digital lexicography and lexical computing workshop, Bari, Italy
by Ondřej Matuška 16 Apr '26

16 Apr '26

*<Lexicom/>* a workshop in digital lexicography and lexical computing *Registration open* *Bari, Italy*15 – 19 September 2025 Your 5 days to get up-to-date with the latest developments in *corpus-driven lexicography* and to practice your *corpus building and corpus query skills* with some of the top experts in the field. For the programme, lecturers, invited speakers, fees and registration, visit this website *lexicom.courses <https://lexicom.courses/upcoming-lexicom/>* I hope to meet you in Bari in September! Ondřej *Ondřej Matuška* sketchengine.eu <http://www.sketchengine.eu/> | Facebook <https://www.facebook.com/SketchEngine/> | LinkedIn <https://www.linkedin.com/in/ondrejmatuska> | Twitter <https://twitter.com/SketchEngine>

1 2

CfP DHOW: Workshop Diffusion of Harmful Content on Online Web Workshop - @ ACMMM 2025
by Thomas Mandl 15 Mar '26

15 Mar '26

The 2st Workshop on DHOW: Diffusion of Harmful Content on Online Web Workshop The workshop will be conducted in a *hybrid* format to ensure maximum participation, accommodating attendees both *online* and in person. Submission deadline: *July 11 2025 AOE* *Workshop site*: https://dhow-workshop.github.io/2025/ *Co-located with ACMMM 2025* https://acmmm2025.org/ <https://lrec-coling-2024.org/> Dublin, Ireland, 27-31 October 2024 *Important Dates* Submission deadline: extended to *July 11, 2025* Notification of acceptance: August 01, 2025 Camera-ready papers due: August 11, 2025 Workshop date: October 27/28, 2025 *Workshop Description* With the advancement of digital technologies and gadgets, online content is easily accessible. At the same time, harmful content also gets spread. There are different harmful content available on different platforms in multiple languages. The topic of harmful content is broad and covers multiple research directions. But from the user’s aspect, they are affected by them all. Often, it is studied individually, like misinformation and hate speech. Research has been done on one platform, monolingual, on a particular issue. It leads to harmful content spreaders switching platforms and languages to reach the user base. Harmful is not limited to social media but also news media. Spreader shares harmful content in posts, news articles, comments, and hyperlinks. So, there is a need to study the harmful content by combining cross-platform, language, multimodal data and topics. We will bring the research on harmful content under one umbrella so that research on different topics (hate speech, misinformation, disinformation, self-harm, offensive content, etc.) can bring some novel methods and recommendations for users, leveraging text analysis with image, audio, and video recognition to detect harmful content in diverse formats. The workshop will cover the ongoing issue of war or elections in 2025. We believe this workshop will provide a unique opportunity for researchers and practitioners to exchange ideas, share latest developments, and collaborate on addressing the challenges associated with harmful contents spread across the Web. We expect that the workshop will generate insights and discussions that will help advance the field of societal artificial intelligence (AI) for the development of safer internet. In addition to attracting high quality research contributions to the workshop, one of the aims of the workshop is to mobilise the researchers working on the related areas to form a community. *Submissions Topics* •Studying different types of harmful content •Computational fact-checking & Misinformation Detection •Role of Generative AI in Mitigating Harmful Content •Harassment, Bullying, and Hate Speech Detection •Explainable AI for Harmful Content Analysis •Multimodal and Multilingual Harmful Content Detection such as fake news, spam, and troll detection. •Deepfake and Synthetic Media •Ethical & Societal Implications of AI in Content Moderation •Both Qualitative and Quantitative study on harmful content •Psychological effects of harmful content like mental health •Approaches for data collection or data annotation using multimodal large models on harmful content •User study on the effects of harmful content on human beings *Submissions* - Submission Instructions: https://dhow-workshop.github.io/2025/#call <https://dhow-workshop.github.io/2025/#call> - Submission Link: https://openreview.net/group?id=acmmm.org/ACMMM/2025/Workshop/DHOW <https://openreview.net/group?id=acmmm.org/ACMMM/2025/Workshop/DHOW> ***Workshop organizers* •Thomas Mandl (University of Hildesheim, Germany) •Haiming Liu (University of Southampton, United Kingdom) •Gautam Kishore Shahi(University of Duisburg-Essen, Germany) •Amit Kumar Jaiswal (University of Surrey, United Kingdom ) •Durgesh Nandini (University of Bayreuth, Germany) DHOW 2025

1 3

Third Call for Papers, Participation and Demo - IWSLT 2025
by Atul K. Ojha 12 Dec '25

12 Dec '25

Apologies for cross-posting. ---------------------------------------- *The International Conference on Spoken Language Translation* *ACL – 22nd IWSLT 2025 – **Third** Call for Participation* *31 July-1 August 2025 - Vienna, Austria* http://iwslt.org The International Conference on Spoken Language Translation (IWSLT) <https://iwslt.org/> is the premier annual conference for all aspects of Spoken Language Translation. Every year, the conference organises and sponsors open evaluation campaigns around key challenges in simultaneous and consecutive translation, under real-time/low latency or offline conditions and under low-resource or multilingual constraints. System descriptions and results from participants’ systems and scientific papers related to key algorithmic advances and best practices are presented. IWSLT is the venue of the SIGSLTs <https://iwslt.org/sigslt/>, the Special Interest Group on Spoken Language Translation <https://iwslt.org/sigslt/> of ACL <https://www.aclweb.org/portal/>, ISCA <https://www.isca-speech.org/> and ELRA <https://www.elra.info/>. With a track record of 21 years, IWSLT benchmarks and proceedings serve as reference for all researchers and practitioners working on speech translation and related fields. The 22nd edition of IWSLT will be run as a hybrid ELRA <https://www.elra.info/>/ACL <https://www.aclweb.org/portal/> event, co-located with ACL 2025 <https://2025.aclweb.org/> from 31 July to 1 August 2025. *Important Dates* *January 1, 2025*: Release of shared task training and dev data *March 15, 2025*: Scientific paper submission deadline *Apr 1-15, 2025*: Evaluation period *April 21, 2025*: System description paper and demo submission deadline *May 15, 2025*: Notification of acceptance *June 1, 2025*: Camera-ready deadline (all paper) *July 31-Aug 1*, *2025*: IWSLT conference *Evaluation* The IWSLT 2025 features shared tasks <https://iwslt.org/2025/#shared-tasks> that address the following focus areas: - High-resource ST: Offline track, Simultaneous track, Subtitling track, Model compression track - Low-resource ST: Low-resource and Indic (multilingual) tracks - Instruction-following Speech Processing track: Technical domain ST, ASR, Summarization, and QA Training and development data for each shared task will be prepared and released by the respective organisers (for further information on this initiative, please refer to the IWSLT website <https://iwslt.org/2025/>). Participants will receive instructions about how to submit their runs. In addition, participants have the opportunity to present their work through a system paper that will be published in the ACL Proceedings. *Conference* IWSLT also invites submissions of scientific papers to be published in the ACL Proceedings and presented either in oral or poster format. The conference selects high-quality, original contributions on theoretical and practical issues of spoken language translation research, technologies and applications. Submissions will be accepted directly through the IWSLT submission site (to be announced on the website <https://iwslt.org/2025/>). We will also accept commitments of submissions with reviews from the ACL Rolling Review. Additionally, to foster cross-pollination of ideas, the conference also invites the presentation of papers on speech translation recently published elsewhere. Please note that this is for non-archival presentation of papers relevant to speech translation already published in other venues (e.g., Findings for the *ACL, speech, NLP or MT conferences). Submissions for this category will be accepted through a dedicated form (to be announced on the website <https://iwslt.org/2025/>). Papers will be checked for relevance to IWSLT, and assigned either oral or poster presentation slots if selected. *Demo Session* We invite researchers, practitioners, and industry professionals to participate in an engaging demo session highlighting innovative systems, tools, and component technologies that advance the field of speech translation. The session will include live and interactive system demonstrations to foster discussion and knowledge exchange among participants across the field. For more information, please see our Call for Demos <https://iwslt.org/2025/call-for-demos>. *Contact* Please email iwslt-evaluation-campaign(a)googlegroups.com if you have any questions related to the shared tasks. Thanks, Marine, Marcello, Alex, Jan, Sebastian, Elizabeth, Atul (IWSLT organisers)

1 1

Advance notice: ‘Statistics for linguistics with R’ bootcamp (08 – 12/07/2024)
by Magali Paquot 01 Dec '25

01 Dec '25

The Linguistics Research Unit of the Institute of Language and Communication (Université catholique de Louvain, Belgium) will be hosting Stefan Gries’s next bootcamp on statistics for linguistics with R from 08 to 12 July 2024. The ‘Statistics for linguistics with R’ bootcamp is a hands-on introduction to statistical methods for both graduate students and seasoned researchers and is loosely based on the third edition (2021) of Gries’s textbook Statistics for linguistics with R. The course is intended for linguists who already have a basic knowledge in statistics and some experience using R and who wish to improve their proficiency in statistical modeling of linguistic data. Using the open source software and programming language R, we will deal with: • fundamental aspects of fixed effects regression modeling for both numeric and binary response variables; these include exploration of data and their preparation for modeling, model formulation and selection; numerical and visual interpretation and evaluation of models; • more advanced aspects of fixed-effects regression modeling such as contrasts for ordinal predictors, orthogonal contrasts, curvature of numeric predictors, and maybe general linear hypothesis tests; • the theoretical foundations of mixed-effects regression modeling; • applications of mixed-effects modeling for both numeric and binary response variables; • tree-based methods and random forests: 'fitting' and interpreting them with importance scores, partial dependence scores, and detecting (not just capturing) interactions. The website of the bootcamp will be online in early 2024 and online registration will start on 1 March 2024, 11 am CEST. The number of participants is limited. If you would like to participate, mark the date in your diary! Contact email: magali.paquot(a)uclouvain.be<mailto:magali.paquot@uclouvain.be> Magali Paquot Convenor

1 2

CORRECTED DATES: Call for Papers: Slav-NLP: 10th Workshop on NLP forSlavic Languages
by Roman Yangarber 17 Jul '25

17 Jul '25

**Call for Papers:* * * Slav-NLP:10thWorkshoponNLP for Slavic languages At ACL-2025, Vienna, Austria 31 July 2025 http://bsnlp.cs.helsinki.fi <http://bsnlp.cs.helsinki.fi/> Submission Deadline: 3 May ** WORKSHOPDESCRIPTION The 10th edition of the Slav-NLP Workshop — at ACL 2025. Sponsored by SIGSLAV: ACL Special Interest Group on Slavic NLP. Slavic languages play a crucial role due to their diverse cultural heritage and wide use — over 400M speakers worldwide. Current political and economic developments in Central/ Eastern Europe thrust the Slavic languages into sharp focus, especially in light of rapid technological advancements, and evolving consumer markets. Research on applied **and ***theoretical*NLP in the context of Slavic languages is still lagging. Linguistic phenomena that are common to the Slavic languages — rich morphology, free word order, etc. — make NLP for these languages challenging. Slav-NLP Workshops gather researchers from academia and industry, aiming to stimulate research in Slavic NLP, and foster the creation of tools and resources. The Workshops welcome the exchange of ideas and experience, discussing current challenges, and promoting the available resources. The structural similarity, as well as the easily recognizable core vocabulary and inflectional inventory spanning this large language group, creates a special environment where researchers can appreciate the shared problems and communicate naturally. We are happy *again *to organize Slav-NLP in Central Europe. This Workshop addresses Natural Language Processing (NLP) for the Slavic languages. NLP tasks in urgent need of attention include: * language modeling, * morphological, syntactic and semantic analysis, * lexical semantics, * named-entity recognition, * text normalization and processing non-standard language, * co-reference resolution, * information extraction, * question answering, * text summarization, * machine translation, * development of linguistic resources, * development and assessment of large language models, * text classification, * text generation, * disinformation detection, * fact verification, * sentiment analysis. The Workshop continues the proud tradition established by the 9 previous (B)SNLP Workshops. IMPORTANT DATES * Submission deadline: *3 May*2025 * Pre-reviewed ARR commitment20 May 2025 * Notification of acceptance: *1 June*2025 * Camera-ready papers due: 15 June 2025 * Workshop: 31 July 2025 ** SHARED TASK This year the Slav-NLP Workshop features — Shared Task on Detection and Classification of Persuasion Techniques— in two types of texts: (a) parliamentary debateson highly-contested topics, and (b) social media postsrelated to the spread of propaganda and disinformation. Read about the Shared Task on the Workshop’s Web page. SUBMISSION At the Workshop’s Web page: bsnlp.cs.helsinki.fi <http://bsnlp.cs.helsinki.fi/call-for-papers.html> * * Workshop Contact: bsnlp(a)cs.helsinki.fi * -- Roman Yangarber Professor, University of Helsinki, Finland Digital Humanities INEQ: Helsinki Inequality Initiative <https://helsinki.fi/en/ineq-helsinki-inequality-initiative> — Linguistic Inequalities and Translation Technologies ------------------------------------------------------------------------ e-Learning & language learning Language Learning Lab Unioninkatu 40, Metsätalo A214 helsinki.fi/revita <https://www.helsinki.fi/revita> helsinki.fi/language-learning-lab <https://www.helsinki.fi/language-learning-lab> mobile: +358 50 41 51 71 3 ------------------------------------------------------------------------ RЯ

1 1

Gaze4NLP - The First International Workshop on Gaze Data and Natural Language Processing
by Cengiz Acarturk 17 Jul '25

17 Jul '25

**First Call for Papers** Gaze4NLP - The First International Workshop on Gaze Data and Natural Language Processing September 11-13, 2025, Varna, Bulgaria (co-located with RANLP 2025) The First Workshop on Gaze Data and Natural Language Processing (Gaze4NLP), co-located with RANLP 2025 in Varna, Bulgaria, invites papers of a theoretical or experimental nature describing research methodologies by employing interdisciplinary perspectives, including computer science and engineering perspectives and cognitive sciences, and identifying challenges to resolve in the intersection of the two domains: eye tracking and NLP. Gaze4NLP aims to bring together researchers conducting research on eyes on eyes on text and NLP; and establishing bridges between them for identifying future venues of research. Workshop webpage: https://gaze4nlp.github.io/Gaze4NLP2025/about.html Important Dates Workshop paper submission deadline: 6 July 2025 Workshop paper acceptance notification: 31 July 2025 Workshop paper camera-ready versions: 30 August 2025 Workshop camera-ready proceedings ready: 8 September 2025 Workshops: 11-13 September 2025 All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”) Topics for the workshop will include, but are not limited to: - Investigating the pillars for bridging the gap between the research on eyes on text and NLP. Study how to expand research methodologies by employing interdisciplinary perspectives, including computer science and engineering perspectives and cognitive sciences, and identify challenges, issues to resolve. - Exploring new areas so that both fields benefit from each other better than the past, identifying novel domains of exploration for further research. - Discussing how to develop cognitively inspired models that align human reading data with LLMs. Submissions We solicit regular workshop papers, which will be included in the proceedings as archival publications. All categories of papers may be long (maximum 8 pages of content + up to one page for limitations (required) + unlimited references) or short (maximum 4 pages of content + up to one page for limitations (required) + unlimited references). Accepted papers will be presented in the form of either oral or poster presentations. Please note that camera-ready papers are allowed an additional page. The workshop proceedings will be part of the ACL anthology. Accepted papers will also be given an opportunity with an extended version to be published as part of an edited book. Submission link: https://softconf.com/ranlp25/Gaze4NLP2025/ Organization Committee: Dr. Cengiz Acarturk, Jagiellonian University, Poland Dr. Jamal Nasir, University of Galway, Ireland Dr. Burcu Can, University of Stirling, Scotland, UK Dr. Cagri Coltekin, University of Tubingen, Germany -- Dr. Cengiz Acarturk, Prof.UJ Centre for Cognitive Science, Jagiellonian University, Krakow On behalf of the Organization Committee

1 1

PhD position in CL/NLP at the University of Tübingen
by cagri coltekin 30 Jun '25

30 Jun '25

The DFG-funded project in the newly established Collaborative Research Center (SFB 1718) “Common ground”, hosted by the University of Tübingen, is inviting applications for a PhD position in Computational Linguistics (75%, TV-L 13 scale, approximately €3.400 per month before taxes and obligatory insurances). The position is associated with project C2 “Signaling and Interpreting Defectivity in Common Ground: Face-to-Face, Voice-Only, and Text-Only Communication” The position begins on 01 October 2025 and ends on 30 June 2029. The aim of this interdisciplinary, cross-linguistic project is to investigate the commonalities or the differences between the cues (verbal, prosodic, gestural, textual) that are employed to signal or interpret irony across different modalities in different communication environments. For the advertised PhD researcher position, this work involves written and multi-modal corpora development and analysis, and working with computational models of irony and analysis. Specifically, the PhD candidate is expected to contribute corpora preparation (collection and organizing the annotation), use machine learning approaches for irony detection, and testing for experimental and corpora data correlation. Requirements: - A master’s degree in Computational Linguistics, Computer Science or related fields. - Solid background in Machine Learning and Natural Language Processing - Experience with corpus collection and annotation - Familiarity with experimental design and statistical analysis methods - Fluency in English Not required, but desirable: - Familiarity with speech processing, or willingness to work with them - Experience with multi-modal machine learning methods - Familiarity with formal linguistics, particularly formal semantics and pragmatics To apply, please send the following application documents to Çağrı Çöltekin <cagri.coeltekin(a)uni-tuebingen.de> via email. - A short (two pages, max.) cover letter with your research interests and your motivation for applying to this position - Your academic CV, including email addresses of two referees - An academic transcript - A sample of academic writing (master’s thesis, or another publication or term paper, if the thesis is not complete at the time of application) The deadline for submitting applications is 24 July 2025 at 23:59 (Berlin time). Interviews will be conducted via Zoom on 30 and/or 31 July 2025. Candidates shortlisted for interview will be notified as soon as possible after the submission deadline. Best, Cagri

1 0

IndiREAD Workshop – 3rd Call for Papers
by Iza Skrjanec 30 Jun '25

30 Jun '25

IndiREAD Workshop 2025: 3rd Call for Papers Saarbrücken, Germany, November 26-27, 2025 IndiREAD is a workshop jointly organized by the ERC Project "Individualized Interaction in Discourse" IDDISC [1] and the MultiplEYE COST [2] action "Enabling multilingual eye-tracking data collection for human and machine language processing research". While experimental research in reading has a long tradition in identifying key factors that influence reading patterns--including text properties such as font difficulty, word and structure frequency, word predictability, and dependency length--recent studies have emphasized the importance of individual variability in reading behaviour (e.g., Haeuser & Kray, 2024; Kuperman et al., 2018; Nicenboim et al., 2016; Staub, 2021). This work has linked individual variability in reading patterns to differences in working memory capacity, reading skills, linguistic experience, and domain expertise among readers. This informs our understanding of how text characteristics and individual reader attributes interact to shape eye movements during reading. IndiREAD aims to bring together researchers interested in investigating individual differences in reading using both experimental and computational approaches. This workshop will focus on methods such as eye-tracking, self-paced reading, and the Maze task, with particular interest in how reading behaviour is correlated with individual differences. We also encourage submissions of computational models for eye movements or reading behavior that shed light on the mechanisms behind these differences. The goal is to foster collaboration between experimental and computational researchers to better understand individual variability among readers. We especially welcome submissions of reading time experiments and modelling of languages beyond English. The IndiREAD Workshop invites submissions of abstracts addressing the following questions:  * How do individual differences impact the way people read? * How do reading patterns vary across different languages, particularly in bilinguals? * How do reading patterns change across the lifespan? * Which individual difference measures are most suitable for capturing variability in reading patterns? * How can we evaluate psycholinguistic theories of reading and sentence processing across languages? * How can computational models account for individual differences in reading? * How does text adaptation influence reading patterns and comprehension among different individuals? * What statistical methods are best suited for reliably identifying latent groups and relating individual differences to reading performance? Workshop dates: November 26-27, 2025 Workshop format: The workshop will be held in-person in Saarbrücken, Germany. It will feature presentations from invited speakers, as well as contributions based on workshop submissions. The format of the presentations (oral or poster) will be determined based on the number of submissions we receive. Submission deadline: July 23, 2025. Submissions: The abstracts must not exceed 1000 words for the text (excl. captions), 10000 characters for references, and a maximum of 2 tables or figures. Abstracts should be submitted in PDF format, with 2.54 cm margins on all sides and 12 point font size, single-spaced. Please indicate up to three appropriate keywords for your abstract, which will be used for session planning. Abstracts must be written in English and should include a clear title but no information revealing the author(s). We welcome submissions for work that is being considered by other conferences, workshops, or journals. Templates for formatting in LaTeX and Word are provided on the conference website. Submission platform: https://openreview.net/group?id=IndiREAD/2025 Volunteer reviewers: We also invite all interested parties with relevant research experience to volunteer to help review abstracts for the workshop. All reviewers should hold a PhD. Please indicate your interest using the following form: https://forms.office.com/e/0fGmHW7q11 Conference website: https://www.uni-saarland.de/indiread [3] Contact email: indiread(a)lst.uni-saarland.de Travel grants: This workshop is sponsored by the MultiplEYE COST Action, which will provide financial support to cover travel expenses for a limited number of participants. Authors will be invited to apply for travel funding upon abstract acceptance. Funding may be partial, and priority will be given to junior researchers. Best, Iza Škrjanec IndiREAD Organizing Committee Links: ------ [1] https://www.uni-saarland.de/lehrstuhl/demberg/individualized-interaction-in… [2] https://multipleye.eu/ [3] https://www.uni-saarland.de/indiread

1 0

Researcher position on generative approaches to event extraction at the University of Oslo
by Erik Velldal 30 Jun '25

30 Jun '25

[Apologies for cross-posting] A postdoc-level position as Research Fellow in Natural Language Processing (NLP) is available in the Language Technology Group (LTG) at the University of Oslo (UiO), Norway. The position is for 30 months (2.5 years) and part of a research project focusing on generative approaches to event extraction in the socio-political domain. For more information about the position and the research group, please see the full announcement here: https://www.jobbnorge.no/en/available-jobs/job/283057/researcher-in-natural… The closing date is August 11th, 2025. Best regards, -erik -- Erik Velldal Language Technology Group Section for Machine Learning Department of Informatics, University of Oslo

1 0

Submission deadline Extended (4th July 2025) || KlarText Workshop @ KONVENS 2025 || German Text Simplification & Readability Assessment
by Salar Mohtaj 30 Jun '25

30 Jun '25

KlarText Workshop on German Text Simplification & Readability Assessment Co-located with KONVENS 2025 | Hildesheim, Germany | 10 September 2025 Website: https://klar-text.github.io/ ============================================================ The submission deadline for the KlarText workshop has been extended to July 4th. The workshop brings together researchers, practitioners, and industry experts to explore cutting-edge methods in German text simplification and readability assessment. We aim to raise awareness of the diverse simplification goals and language forms in German and to attract researchers tackling the unique challenges in this field. Topics of interest include (but are not limited to): - German Text Simplification - Readability Assessment - Resources & Approaches for Leichte Sprache - The Role of Large Language Models (LLMs) - Resources & Benchmarks - Evaluation & Human-Centered Assessment - Applications & Real-World Impact - Cross-Linguistic & Multilingual Perspectives Important Dates - Submission deadline (extended): July 04, 2025 - Notification of acceptance: August 1, 2025 - Camera-ready version due: August 15, 2025 - Workshop date: September 10, 2025 Submissions are managed via OpenReview (https://openreview.net/group?id=GSCL.org/KONVENS/2025/Workshop/KlarText). Organizing Committee - Salar Mohtaj, DFKI - Stefan Hillmann, Technische Universität Berlin - Sebastian Möller, Technische Universität Berlin - Georg Groh, Technische Universität München - Hadi Asghari, Technische Universität Berlin - Miriam Anschütz, Technische Universität München Contact For questions or inquiries, please contact: Salar Mohtaj – salar.mohtaj(a)dfki.de

1 0

2026

2025

2024

2023

2022

Corpora June 2025