- Corpora - ELRA lists

PhD in NLP at Inria, France
by Oana Balalau 29 Jun '26

Hello,

We are looking for a driven PhD candidate to work on NLP tools that help us change how we measure and mitigate bias in the media. Instead of focusing solely on factual accuracy, this project shifts the paradigm to narrative and argumentative diversity.

Please share the job offer with potential candidates. Applications should only be made via the website below. The PhD student will be part of CEDAR at Inria Saclay, in the Paris area.

https://lnkd.in/exXntevw
https://recrutement.inria.fr/public/classic/en/offres/2026-10222

Best regards,
Oana Balalau

Full professorship in contemporary English linguistics (W2) with a cognitive focus at RWTH Aachen University, Germany
by Neumann, Stella 29 Jun '26

+++ Apologies for cross-postings +++

RWTH Aachen University is one of Germany's pre-eminent Universities of Excellence and is renowned for its high-quality teaching and world-class research. Assuming profound responsibility towards society, RWTH addresses bold scientific questions and translates its knowledge into meaningful applications. RWTH strives for the convergence of knowledge, methods, and findings and integrates in-depth disciplinary knowledge within the interdisciplinary research consortia established in its eight profile areas. The University's dynamic, creative, and international environment encompasses efficient research networks and institutionalized collaborations. With its innovative RWTH Aachen Campus project, the University boasts one of the largest technology-oriented research environments in Europe.

Full Professor (W2) Contemporary English Linguistics at the Faculty of Arts and Humanities

We are inviting applications for the position of full professor in the area of contemporary English linguistics, to be filled as soon as possible. The successful candidate will have a demonstrated research and teaching focus on the empirical study of the contemporary English language based on approaches from the spectrum of cognitive theories (e.g. cognitive linguistics, cognitive grammar, construction grammar, usage-based approaches). One focus of their work should be in a subfield of linguistics, with multimodality and/or semantics/pragmatics being particularly welcome. The successful candidate will teach in all study programs offered by the Department of English Studies at the bachelor's and master's level and will ideally have teaching experience in more than one subfield of English linguistics. In addition, a demonstrable interest in collaborating with literary studies and participating in planned interdisciplinary research networks is expected. Experience in acquiring third-party funding is desirable.

The Department of English Studies at RWTH Aachen is particularly committed to high-quality teaching and expects the successful candidate to have a corresponding profile and commitment. As participation in academic self-administration is expected, evidence of previous involvement in this area is desirable. The requirements include a doctoral degree and additional research experience, such as a habilitation (post-doctoral lecturing qualification) or equivalent achievements gained as a university researcher or professor or in a research position outside academia. Teaching ability and dedication are essential, and the application should include proof of this. Fluent German is not required to start the position, but the successful candidate will be expected to hold classes in German within the first 5 years.

Link to the official job ad: https://www.rwth-aachen.de/global/show_document.asp?id=aaaaaaaadloiyeb

Prof. Dr. Stella Neumann
Anglistische Sprachwissenschaft
RWTH Aachen University
Institut für Anglistik
Zi. 101
Kármánstr. 17/19
D-52062 Aachen
Tel. +49 (0)241 80-96105

Final Call for Papers: Readability and Text Simplification - scientific and industry workshop CLEAR-TEXT 2026
by Irina Temnikova 29 Jun '26

Final Call For Papers: CLEAR-TEXT Workshop at the conference CLIB 2026 (7 September 2026, Sofia, Bulgaria)

Website: https://sites.google.com/view/clear-text/home

FREE PARTICIPATION

IMPORTANT DATES:
- 30 June 2026 (23:59 anywhere on Earth): submission deadline for all articles
- 28 July 2026 (23:59 anywhere on Earth): review decisions sent to authors
- 18 August 2026 (23:59 anywhere on Earth): camera-ready submissions due
- 7 September 2026: workshop date

This is an interdisciplinary workshop which addresses all topics ranging from Legibility to Text Simplification, including issues, resources, and manual and automatic methods related to measuring and improving text comprehensibility. The event brings together researchers, IT professionals, publishers, public institutions (e.g., Ministries of Education, schools), and practitioners (authors, teachers, translators) to discuss current challenges and innovative solutions.

Specifically, we invite submissions from scientific researchers, representatives of IT companies, publishers, public institutions (such as the Ministries of Education and public schools), and practitioners (such as authors of children's books, school materials, and teachers). The scientific articles can present completed or ongoing work with methods coming from all related fields, such as Linguistics, Translation and Translation Technologies, Psycholinguistics, Education, Computational Social Science, and Natural Language Processing. Accepted papers will be published in the ACL Anthology.

The workshop will feature one renowned invited speaker (Prof. Ruslan Mitkov), an NLP panel with well-known experts such as Prof. Ruslan Mitkov, Prof. Dimitar Kazakov, Prof. Constantin Orasan, and Dr Matthew Shardlow, and a panel with representatives of Bulgarian industry and public institutions. The aim of the panels will be to discuss the current state of the art, existing public issues, and possible solutions to them.

More information: https://sites.google.com/view/clear-text/home

Irina Temnikova, PhD
Big Data for Smart Society Institute (GATE)

COLING 2027 Tutorials: Call for Expressions of Interest
by leonie.weissweiler＠uni-leipzig.de 29 Jun '26

The tutorial chairs of COLING 2027 invite expressions of interest for tutorials in conjunction with the conference in Macau in May 2027. We seek tutorials in all areas of computational linguistics (CL) / natural language processing (NLP), broadly conceived to include related disciplines.

We invite expressions of interest for three types of tutorials:

* Cutting-edge: Tutorials that cover advances in newly emerging areas not recently covered in a tutorial at COLING, LREC, EACL, NAACL-HLT, ACL, AACL, EMNLP or a related venue.
* Introductory: Tutorials that provide introductions to related fields that are potentially relevant for the CL / NLP community (e.g., linguistics, bioinformatics, machine learning techniques, human-computer interaction, large language models for non-English languages, etc.).
* Community: Tutorials related to topics relevant for members of the community more generally (e.g. reviewing, good scientific practice, statistical testing, mental health, career planning, etc.).

In all cases, the aim of a tutorial is primarily to help CL/NLP researchers to understand a scientific problem, its tractability, and its theoretical and practical implications. Presentations of particular technological solutions or systems are welcome, provided that they serve as illustrations of broader scientific considerations. None of the tutorial types are expected to be "self-invited" long talks. The content should be a good balance between research from multiple groups and perspectives, not only from the teachers of the tutorial.

Tutorials will be held at COLING between May 9 and May 14, 2027. The specific tutorial days will be confirmed at a later date, together with a full call for proposals. Expressions of interest are intended to help us plan the review process, and as such are highly encouraged, but not mandatory.

Expressions of interest should be submitted by July 24, 2026, via https://forms.gle/GzgjseZpHVmqH75A6 , or as an email to coling27tutorialchairs(a)googlegroups.com, and contain:

* Name and email address of primary presenter
* Proposed title of tutorial
* Type of tutorial
* Optional: anything you would like to tell or ask us

*Deadline Update* Last Call for Abstracts and Tutorials – 2nd Computational Psycholinguistics Meeting CPL 2026
by Takmaz, E.K. (Ece) 27 Jun '26

Computational Psycholinguistics Meeting

Conference website: https://cpl2026.sites.uu.nl/

We are excited to announce the 2nd edition of a new recurring workshop dedicated exclusively to computational psycholinguistics. The field has seen significant growth in recent years, not only due to developments in large language models but also to advances in symbolic processing models, Bayesian approaches, mechanistic models, and frameworks such as ACT-R. These diverse models aim to capture various aspects of human language processing, including semantics, syntax, sentence comprehension, speech, and more.

This meeting aims to provide a dedicated platform for researchers and practitioners to discuss computational models that explain and predict human linguistic behavior (e.g., as observed in psycholinguistic experiments), to bring together experts from different subfields to advance our understanding of language processing mechanisms, and to analyze the successes and limitations of different modeling approaches.

The meeting will cover a range of topics, including but not limited to:

- Exploring how models like (large) language models, symbolic models, Bayesian models, connectionist models, and ACT-R based models can explain and predict human behavior in language tasks.
- Analyzing where different types of models succeed or fall short in capturing human language processing.
- Investigating what linguistic information should be integrated across different levels (words, sentences, discourse) and how this affects comprehension and production.
- Examining the potential of models that combine neural and symbolic approaches to better mimic human language processing.
- Applying computational, algorithmic, and implementational levels of analysis to understand language processing mechanisms.
- Focusing on recent developments in computational modeling of semantics, syntax, sentence processing, speech perception and production.

The abstract submission deadline is July 17. The abstracts can be submitted at https://openreview.net/group?id=UU.nl/Utrecht_University/2026/CPL

In addition to the conference abstracts, we are also looking for abstracts for tutorials, which could cover any topic related to computational psycholinguistics and should be planned to run for between 1 and 3 hours. The tutorial abstracts will have the same submission deadline as the main conference, July 17.

The conference will take place in Utrecht, the Netherlands, on December 3 and December 4, and it will be preceded by tutorials on the afternoon of December 2.

Best regards,
CPL Organizing Committee

Jakub Dotlacil (Utrecht University)
Li Kloostra (Utrecht University)
Philine Link (Utrecht University)
Ece Takmaz (Utrecht University)
Giovanni Cassani (Tilburg University)
Bruno Nicenboim (Tilburg University)
Lena Jäger (University of Zurich)

Karen Spärck Jones Award call
by ACL Announcements 27 Jun '26

~ An award to commemorate Karen Spärck Jones ~

A pioneer of information retrieval, the computer science sub-discipline that also underpins the technology of modern Web search engines, Karen Spärck Jones was a British professor of Computers and Information at the University of Cambridge. Her contributions to the fields of Natural Language Processing (NLP) and Information Retrieval (IR), especially with regard to experimentation, have been outstanding, highly influential and lasting, and include the introduction of Inverse Document Frequency for relevance ranking. Her achievements resulted in her receiving a number of prestigious accolades such as the BCS Lovelace medal for her advancement in Information Systems, and the ACM Salton Award for her significant, sustained and continuing contributions to research in information retrieval. Karen was also an outspoken advocate for women in computing, and we encourage former advisors of talented scientists to provide the judges with a rich and diverse candidate pool to select from.

In order to honour Karen's achievements, the BCS Information Retrieval Specialist Group (BCS IRSG) in conjunction with the BCS has established an annual award to encourage and promote talented researchers who have endeavoured to advance our understanding of Natural Language Processing (NLP) or Information Retrieval (IR) with significant experimental contributions.

To celebrate the commemorative event, the recipient of the award is invited to present a keynote lecture at BCS IRSG's annual conference: the European Conference on Information Retrieval (ECIR), or at BCS IRSG's partner conference: the Conference of the European Chapter of the Association for Computational Linguistics (EACL). These forums provide excellent venues to present and announce the award because the conferences attract many new and young IR and NLP researchers. The recipient of the 2026 award will give a keynote lecture at ECIR 2027 in Southampton, UK, March 2027.

To learn more about the award and how to submit a nomination, visit: https://www.bcs.org/membership-and-registrations/member-communities/informa… [1] https://www.bcs.org/membership-and-registrations/member-communities/informa… [2]

Timeline for the 2026 Award:

* 04 September 2026 - Closing date for nominations
* 11 September 2026 - Deadline for support letters
* 11 December 2026 - Notification of the prize recipient
* March 2027 - Winner presents keynote at ECIR 2027

BCS IRSG would like to thank Bloomberg for their generous sponsorship of the KSJ Award in recent years.

For a list of previous recipients of the award, visit: https://www.bcs.org/membership-and-registrations/member-communities/informa… [3]

If you have any questions, please contact Dr Graham McDonald: graham.mcdonald(a)glasgow.ac.uk

Graham McDonald
KSJ Award Panel Chair of the BCS IRSG Committee

Links:
------
[1] https://www.bcs.org/membership-and-registrations/member-communities/informa…
[2] https://www.bcs.org/membership-and-registrations/member-communities/informa…
[3] https://www.bcs.org/membership-and-registrations/member-communities/informa…

*Deadline Update* Last CfP for EMNLP Workshop on Multimodal Interaction in Face-to-Face Dialogue (MINT)
by Takmaz, E.K. (Ece) 26 Jun '26

*Deadline Update* Last CfP for EMNLP Workshop on Multimodal Interaction in Face-to-Face Dialogue (MINT)

We invite submissions to MINT: Multimodal Interaction in Face-to-Face Dialogue, a workshop that brings together researchers from computational linguistics, NLP, computer vision, HCI, robotics, and cognitive science working on multimodal face-to-face communication.

Workshop website: https://mintworkshop.github.io/2026/

The Workshop will be co-located with EMNLP 2026 in Budapest, Hungary, October 24–29, 2026 (exact date within this period to be decided).

We welcome work on topics including:

- computational models that integrate verbal and non-verbal cues such as speech, text, gesture, facial expression, gaze, and body pose;
- cognitive and linguistic insights about face-to-face communication that can inform AI systems;
- multimodal datasets with synchronized speech, video, and motion data;
- evaluation methods for multimodal interaction;
- applications and tools for embodied conversational agents, social robots, annotation, and behavioural analysis.

Papers should be prepared using the official ACL formatting guidelines and ACL style files. MINT welcomes both archival and non-archival papers:

- Archival papers: Submissions must be anonymous and report original, unpublished research to appear in the workshop proceedings.
- Non-archival papers: Submissions reporting previously published work, preliminary research, or demos to be presented at the workshop and not published in the MINT proceedings.

Papers may be submitted as long papers (up to 8 pages plus references) or short papers (up to 4 pages plus references). Non-archival submissions do not need to be anonymous. We allow cross-submissions to other venues. However, to be included in the proceedings, authors of accepted papers must withdraw them from any other venue where they remain under consideration.

MINT will accept submissions through two channels:

1. Direct submission: The dedicated OpenReview portal for this is available at https://openreview.net/group?id=EMNLP/2026/Workshop/MINT. Archival papers submitted through this channel will be reviewed by the MINT programme committee.
2. ACL Rolling Review (ARR): Authors may submit through ARR and commit their paper together with the ARR reviews to MINT later at https://openreview.net/group?id=EMNLP/2026/Workshop/MINT_ARR_Commitment

**Important dates (11:59 pm AOE)**

- ARR paper submission deadline: May 25, 2026
- Direct paper submission deadline: July 22, 2026 (extended from July 8, 2026)
- Pre-reviewed ARR commitment deadline: August 24, 2026
- Notification of acceptance: August 31, 2026
- Camera-ready paper due: September 14, 2026

Accepted contributions will be required to be presented at the MINT workshop as posters or talks.

The MINT workshop is sponsored by the Max Planck Institute for Psycholinguistics: https://www.mpi.nl/

For questions, please contact: mint.organizers(a)gmail.com

On behalf of the workshop organisers:

- Raquel Fernández (University of Amsterdam)
- Diego Frassinelli (LMU Munich)
- Esam Ghaleb (Max Planck Institute for Psycholinguistics)
- Bulat Khaertdinov (Maastricht University)
- Asli Ozyurek (Max Planck Institute for Psycholinguistics / Radboud University)
- Ece Takmaz (Utrecht University)
- Zerrin Yumak (Utrecht University)

2nd Call for Participation: Daleel 2026 - Arabic Argumentative Discourse Mining Shared Task
by Nabhani, S. 26 Jun '26

*Daleel 2026 - Arabic Argumentative Discourse Mining Shared Task*
*ArabicNLP Conference, co-located with EMNLP 2026*

Dear All,

Registration is now open for *Daleel 2026: Arabic Argumentative Discourse Mining Shared Task*, which will be held as part of the ArabicNLP 2026 Conference, co-located with EMNLP 2026 in Budapest, Hungary.

The shared task focuses on *the detection and classification of argumentative discourse units in Arabic* across two text genres: Editorials and Debates.

*Tasks*

- Task 1: Argumentative Discourse Unit Classification
- Task 2: Argumentative Discourse Unit Detection

*Awards*

Monetary prizes will be awarded to the authors of the top three system description papers:

- 1st Place: $400
- 2nd Place: $200
- 3rd Place: $150

*Important Dates*

- June 5, 2026: Release of data, baselines, and evaluation scripts
- July 25, 2026: Registration deadline and release of final evaluation input data
- July 30, 2026: System submission deadline and final evaluation
- August 6, 2026: System description paper submission deadline
- August 13, 2026: Notification of acceptance
- August 22, 2026: Camera-ready submission of system papers

*For detailed information about the shared task and registration, please visit:* https://qatardebate.org/programs/academic-programs/daleel2026-shared-task/

*For questions, please contact:* daleel26(a)argsbase.net

We look forward to your participation and to advancing research in Arabic argument mining together.

*Organizers*

- Sara Nabhani, University of Groningen
- Nahla Bassyouni, QatarDebate
- Ali Al-Zawqari, Vrije Universiteit Brussel
- Mohammad Khader, QatarDebate
- Khalid Al-Khatib, University of Groningen

Question about OPUS Books v1 alignment process
by Hugo Sanjurjo González 26 Jun '26

Dear Corpora List members,

I am using the Books v1 corpus from OPUS <https://opus.nlpl.eu/datasets/Books?pair=en&es> as part of my research and have a question regarding the alignment process.

The corpus description mentions that some texts were manually reviewed by András Farkas, but it is not entirely clear whether this review concerned sentence-level alignments, paragraph-level alignments, or both. Specifically, I would like to know whether the sentence-level alignments (including 1-to-1, 1-to-2, 2-to-1, and unmatched sentences) can be considered manually verified gold-standard data, a partially reviewed silver standard, or fully automatic alignments without human validation.

I would be very grateful if someone could provide any clarification or pointers to relevant documentation. If this is not the most appropriate forum for this question, I apologize in advance, but I thought someone here might be able to help.

Thank you very much for your time.

Hugo

Webinar by: Marco Valentino (University of Sheffield)
by HiTZ zentroa 26 Jun '26

*** We apologize for the multiple copies of this email. If you are already registered for the next webinar, you do not need to register again. ***

Dear colleague,

We are happy to announce the next webinar in the Language Technology webinar series organized by The HiTZ Chair of Artificial Intelligence and Language Technology (https://hitz.eus/katedra). We are organizing one seminar every month. This will be the final event for this cycle, but we will return with a new series next October. Have a great summer!

Speaker: Marco Valentino (University of Sheffield)
Title: Reconciling Plausible and Formal Reasoning in Large Language Models
Date: Thursday, July 2, 2026 - 15:00 CET

Summary: A persistent challenge in AI is the effective integration of plausible and formal reasoning - the former concerning the plausibility and contextual relevance of arguments, the latter focusing on their logical and structural validity. Large Language Models (LLMs) are not immune to such a challenge. By virtue of their extensive pre-training, LLMs can generate plausible and linguistically fluent arguments, but struggle with the systematicity and consistency required for robust logical reasoning. At the same time, LLMs offer new opportunities to study and overcome this intrinsic conflict. This talk will focus on such opportunities, presenting different research directions aimed at reconciling plausible and formal reasoning, including LLM-driven neuro-symbolic integration, quasi-symbolic abstractions, and latent circuit disentanglement. The final part of the talk will discuss the persisting challenges in achieving truly unified reasoning and outline possible directions for future research in the field.

Bio: Marco is a Lecturer in Artificial Intelligence and Applications of AI in the Natural Language Processing (NLP) group at the University of Sheffield. Prior to Sheffield, he was a member of the Neuro-Symbolic AI Group at the Idiap Research Institute in Switzerland, and obtained a PhD in Computer Science from the University of Manchester. His research focuses on developing AI systems that can use explanation as a core mechanism for learning and reasoning, investigating the integration of neural and symbolic AI methods. Moreover, he is interested in developing methodologies to interpret, control, and evaluate Large Language Models (LLMs), with a focus on disentangling knowledge acquisition from abstract logical reasoning, and enabling out-of-distribution, out-of-domain generalisation.

Registration: https://www.hitz.eus/webinar_izenematea

You can view the videos of previous webinars and the schedule for upcoming webinars here: http://www.hitz.eus/webinars

If you cannot attend this seminar but want to be informed of future HiTZ webinars, please complete this registration form instead: http://www.hitz.eus/webinar_info

Best wishes,
The HiTZ Chair of Artificial Intelligence and Language Technology

P.S.: HiTZ will not grant any type of certificate for attendance at these webinars.
