Ethical LLMs 2025: The first Workshop on Ethical Concerns in Training, Evaluating and Deploying Large Language Models<https://sites.google.com/view/ethical-llms-2025> @ RANLP2025<https://ranlp.org/ranlp2025/>
Final Call for papers:
Scope
Large Language Models (LLMs) represent a transformative leap in Artificial Intelligence (AI), delivering remarkable language-processing capabilities that are reshaping how we interact with technology in our daily lives. With their ability to perform tasks such as summarisation, translation, classification, and text generation, LLMs have demonstrated unparalleled versatility and power. Drawing from vast and diverse knowledge bases, these models hold the potential to revolutionise a wide range of fields, including education, media, law, psychology, and beyond. From assisting educators in creating personalised learning experiences to enabling legal professionals to draft documents or supporting mental health practitioners with preliminary assessments, the applications of LLMs are both expansive and profound.
However, alongside their impressive strengths, LLMs also face significant limitations that raise critical ethical questions. Unlike humans, these models lack essential qualities such as emotional intelligence, contextual empathy, and nuanced ethical reasoning. While they can generate coherent and contextually relevant responses, they do not possess the ability to fully understand the emotional or moral implications of their outputs. This gap becomes particularly concerning when LLMs are deployed in sensitive domains where human values, cultural nuances, and ethical considerations are paramount. For example, biases embedded in training data can lead to unfair or discriminatory outcomes, while the absence of ethical reasoning may result in outputs that inadvertently harm individuals or communities. These limitations highlight the urgent need for robust research in Natural Language Processing (NLP) to address the ethical dimensions of LLMs. Advancements in NLP research are crucial for developing methods to detect and mitigate biases, enhance transparency in model decision-making, and incorporate ethical frameworks that align with human values. By prioritising ethics in NLP research, we can better understand the societal implications of LLMs and ensure their development and deployment are guided by principles of fairness, accountability, and respect for human dignity. This workshop will dive into these pressing issues, fostering a collaborative effort to shape the future of LLMs as tools that not only excel in technical performance but also uphold the highest ethical standards.
Key Dates
Submissions Open - 1st June 2025
Paper Submission Deadline - 28th July 2025
Acceptance Notification - 10th August 2025
Camera-Ready Deadline - 20th August 2025
Submission Guidelines
We follow the RANLP 2025 standards for submission format and guidelines. EthicalLLMs 2025 invites the submission of long papers, up to eight pages in length, and short papers, up to six pages in length. These page limits only apply to the main body of the paper. At the end of the paper (after the conclusions but before the references) papers need to include a mandatory section discussing the limitations of the work and, optionally, a section discussing ethical considerations. Papers can include unlimited pages of references and an unlimited appendix.
To prepare your submission, please make sure to use the RANLP 2025 style files available here:
* Latex<https://ranlp.org/ranlp2025/wp-content/uploads/2025/05/ranlp2025-LaTeX.zip>
* Word<https://ranlp.org/ranlp2025/wp-content/uploads/2025/05/ranlp2025-word.docx>
Papers should be submitted through Softconf/START using the following link: https://softconf.com/ranlp25/EthicalLLMs2025/
Topics of interest
The workshop invites submissions on a broad range of topics related to the ethical development and evaluation of LLMs, including but not limited to the following.
1. Bias Detection and Mitigation in LLMs
Research focused on identifying, measuring, and reducing social, cultural, and algorithmic biases in large language models.
2. Ethical Frameworks for LLM Deployment
Approaches to integrating ethical principles—such as fairness, accountability, and transparency—into the development and use of LLMs.
3. LLMs in Sensitive Domains: Risks and Safeguards
Case studies or methodologies for deploying LLMs in high-stakes fields such as healthcare, law, and education, with an emphasis on ethical implications.
4. Explainability and Transparency in LLM Decision-Making
Techniques and tools for improving the interpretability of LLM outputs and understanding model reasoning.
5. Cultural and Contextual Understanding in NLP Systems
Strategies for enhancing LLMs’ sensitivity to cultural, linguistic, and social nuances in global and multilingual contexts.
6. Human-in-the-Loop Approaches for Ethical Oversight
Collaborative models that involve human expertise in guiding, correcting, or auditing LLM behaviour to ensure responsible use.
7. Mental Health and Emotional AI: Limits of LLM Empathy
Discussions on the role of LLMs in mental health support, highlighting the boundary between assistive technology and the need for human empathy.
Organisers
Damith Premasiri – Lancaster University, UK
Tharindu Ranasinghe – Lancaster University, UK
Hansi Hettiarachchi – Lancaster University, UK
Contact
If you have any questions regarding the workshop, please contact Damith: d.dolamullage(a)lancaster.ac.uk
Dear colleagues,
We are pleased to announce that the submission deadline for our upcoming workshop:
The First Workshop on Natural Language Processing and Language Models for Digital Humanities (LM4DH 2025)
(co-located with RANLP 2025) has been extended to 27 July 2025!
This interdisciplinary workshop invites contributions at the intersection of computational methods and the humanities, including work on:
* Text analysis and genre detection
* Interpretability of LLM outputs
* Historical and low-resource language processing
* Dataset creation and curation
* Emotion analysis, authorship attribution, and more
We welcome standard (up to 8 pages) and short papers (4–6 pages). Submissions must follow the RANLP 2025 ACL-style template.
* New Submission Deadline: 27 July 2025
* Notification of Acceptance: 2 August 2025
* Camera-Ready Deadline: 20 August 2025
* Workshop Date: 11-13 September 2025 (To be confirmed)
Don’t miss the chance to be part of this exciting event that brings together researchers across linguistics, cultural heritage, NLP, history, and more. Submit your paper and join the conversation on the future of AI in the Digital Humanities!
For more information, visit https://www.clarin.eu/content/call-papers-first-workshop-natural-language-p…
Best regards,
CLARIN ERIC
[Apologies for multiple postings]
We are happy to announce that two new speech databases are available in our catalogue.
Chinese Kids Speech database (Lower Grade) <https://catalog.elra.info/en-us/repository/browse/ELRA-S0496/>
ISLRN: 369-011-475-593-5 <http://www.islrn.org/resources/369-011-475-593-5>
The Chinese Kids Speech database (Lower Grade) contains recordings of 184 Chinese child speakers (98 male and 86 female), aged 6 to 10, recorded in quiet rooms using smartphones. 1,426 sentences were used. The audio data are stored in .wav files as 16 kHz, mono, 16-bit linear PCM.
Chinese Kids Speech database (Upper Grade) <https://catalog.elra.info/en-us/repository/browse/ELRA-S0497/>
ISLRN: 993-024-988-227-0 <http://www.islrn.org/resources/993-024-988-227-0>
The Chinese Kids Speech database (Upper Grade) contains recordings of 161 Chinese child speakers (71 male and 90 female), aged 10 to 12, recorded in quiet rooms using smartphones. 1,859 sentences were used. The audio data are stored in .wav files as 16 kHz, mono, 16-bit linear PCM.
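For users of these databases, the announced format (16 kHz, mono, 16-bit linear PCM in a .wav container) can be verified with Python's standard `wave` module. This is a minimal sketch; the file name is hypothetical, and the example simply writes one second of silence in that format and checks it back.

```python
import wave

def check_elra_format(path):
    """Return True if the file matches the announced format:
    16 kHz sample rate, mono, 16-bit linear PCM."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == 16000
                and wav.getnchannels() == 1
                and wav.getsampwidth() == 2)  # 2 bytes per sample = 16 bits

# Write a short silent file in that format, then verify it.
with wave.open("sample.wav", "wb") as out:
    out.setnchannels(1)       # mono
    out.setsampwidth(2)       # 16-bit samples
    out.setframerate(16000)   # 16 kHz
    out.writeframes(b"\x00\x00" * 16000)  # one second of silence

print(check_elra_format("sample.wav"))  # True
```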
For more information on the catalogue, or if you would like to enquire about having your resources distributed by ELRA, please contact us <mailto:contact@elda.org>.
_________________________________________
Visit the ELRA Catalogue of Language Resources <http://catalog.elra.info>
Archives of ELRA Language Resources Catalogue Updates <https://www.elra.info/catalogues/language-resources-announcements/>
Please note that you receive this email because you are or have been a customer or a provider of ELRA Language Resources.
The ELRA Privacy Policy is available here <https://www.elra.info/elra-privacy-policy/>.
If you do not want to receive such e-mails in the future, contact us <mailto:privacy@elda.org>.
Dear colleagues,
The July edition of the CLARIN Newsflash is out!
This summer, CLARIN is turning up the heat at major conferences and summer schools — from the Corpus Linguistics 2025 conference in Birmingham, to DH2025 in Lisbon, and the ESU summer school in Besançon, with ACL and Interspeech just around the corner. We’re proud to showcase our most popular and newly developed tools and resources, strengthening visibility, fostering meaningful connections, and collaborating with local CLARIN communities and researchers worldwide.
It’s all in this month’s newsflash — take a look!
https://www.clarin.eu/content/clarin-newsflash-july-2025
Wishing you a relaxing summer break — we’ll be back in September with a CLARIN2025 special edition!
CLARIN ERIC
First CFP: CHOMPS – Confabulation, Hallucinations, & Overgeneration in Multilingual & Precision-critical Settings
(with our apologies for cross-posting)
Venue: IJCNLP-AACL 2025 (https://2025.aaclnet.org/), Mumbai, India
Date: 23-24 December 2025 (TBC)
Workshop website: https://chomps2025.github.io/
* Description *
Despite rapid advances, LLMs continue to "make things up": a phenomenon that manifests as hallucination, confabulation, and overgeneration, i.e., the production of unsupported and unverifiable text that sounds deceptively plausible. These outputs pose real risks in settings where accuracy and accountability are non-negotiable, including healthcare, legal systems, and education. The aim of the CHOMPS workshop is to find ways to mitigate this tendency to hallucinate, one of the major hurdles that currently prevent the adoption of Large Language Models in real-world scenarios.
The workshop will explore hallucination mitigation in practical situations where such mitigation is crucial: in particular, precision-critical applications (such as those in the medical, legal, and biotech domains), as well as multilingual settings (given the lack of resources for reproducing in other linguistic contexts what can be done for English). We invite work on the following (non-exclusive) list of topics:
* Workshop topics *
- Metrics, benchmarks and tools for hallucination detection
- Factuality challenges in mission-critical & domain-specific settings (e.g., medical, legal, biotech) and their consequences
- Mitigation strategies during inference or model training
- Studies of hallucinatory and confabulatory behaviors of LLMs in cross-lingual and multilingual scenarios
- Confabulations in language & multimodal (vision, text, speech) models
- Perspectives and case studies from other disciplines
- …
* Invited speakers *
- Anna ROGERS, IT University of Copenhagen
- Danish PRUTHI, IISc Bangalore
- Abhilasha RAVICHANDER, University of Washington
* Submission details *
The workshop is designed with a widely inclusive submission policy so as to foster as vibrant a discussion as possible.
Archival or non-archival submissions may consist of up to 8 pages (long) or 4 pages (short) of content. Dissemination submissions may consist of up to 1 page of content. Upon acceptance, authors may add one additional page to accommodate changes suggested by the reviewers.
Please use the ACL style templates available here: https://github.com/acl-org/acl-style-files
Submissions must be made in PDF format, either (a) via direct submission (https://openreview.net/group?id=aclweb.org/AACL-IJCNLP/2025/Workshop/CHOMPS) or (b) via ARR commitment (https://openreview.net/group?id=aclweb.org/AACL-IJCNLP/2025/Workshop/CHOMPS…)
* Important dates *
Paper submission deadline: September 29, 2025
Direct ARR commitment: October 27, 2025
Author notification: November 3, 2025
Camera-Ready due: November 11, 2025
Workshop date: December 23-24, 2025 (TBC)
* Contact *
For questions, please send an email to chomps-aacl2025(a)googlegroups.com or contact one of the workshop chairs:
- Aman Sinha, Université de Lorraine, aman.sinha(a)univ-lorraine.fr
- Raúl Vázquez, University of Helsinki, raul.vazquez(a)helsinki.fi
- Timothee Mickus, University of Helsinki, timothee.mickus(a)helsinki.fi
Call for Papers: CASE 2025 @ RANLP (8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Texts)
Dear Colleagues,
We are pleased to announce the 8th edition of the Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text, held in conjunction with RANLP 2025 (https://ranlp.org/ranlp2025/)!
CASE is a leading venue for research, resources, and practical advances in automated event extraction and analysis, focusing on social and political event data. It has been organized consistently at top venues such as ACL, EMNLP, and EACL.
We invite submissions of research papers, resource papers, and position papers addressing (but not limited to) the following topics:
• Event extraction at the sentence, document, or cross-document level, including event coreference.
• Creation and annotation of datasets for event extraction.
• Modeling event-event relations such as subevents, causal, temporal, and spatial links.
• Evaluation of event datasets: reliability, validity, and coverage.
• Event schemas and ontologies: population, definition, and enrichment.
• Tools, pipelines, and infrastructure for event annotation and analysis.
• Linguistic aspects of event representation: lexical, syntactic, semantic, discursive, and pragmatic.
• Applications of event data in conflict prediction, early warning, and policy support.
• Detection of new event types, including protests, public health crises, and cyber activism.
• Bias, fairness, and misinformation in event extraction systems and datasets.
• Legal, ethical, and privacy considerations in dataset creation and dissemination.
• Cross-lingual, multilingual, and multimodal event extraction.
• Use of LLMs and generative AI for event extraction, analysis, and dataset generation.
• Release of new benchmarks, datasets, or annotation resources.
All accepted papers will be published in the ACL Anthology.
Website: https://emw.ku.edu.tr/case-2025/ (currently being updated; please contact ahurriyetoglu(a)ku.edu.tr with any questions)
Link for submission: https://softconf.com/ranlp25/CASE2025/user/
Important dates:
Submission Deadline: 25 July 2025
Notification: 17 August 2025
Camera-ready deadline: 30 August 2025
Workshop date: 11-13 September 2025
Shared task
Multimodal detection of hate speech, humor, and stance in LGBTQ+ socio-political discourse
To know more and participate, please visit: https://github.com/therealthapa/case2025-multimodal/blob/main/README.md
All shared task papers will also be published in the ACL anthology.
Organizers: Surendrabikram Thapa, Siddhant Bikram Shah, Shuvam Shiwakoti, Kritesh Rauniyar, Surabhi Adhikari, Kristy Johnson, Ali Hürriyetoğlu, Hristo Tanev, Usman Naseem
Organizing committee:
Ali Hürriyetoğlu
Hristo Tanev
Surendrabikram Thapa
Vanni Zavarella
Erdem Yörük
Hi, good morning
This is to share with you that our research group has made publicly available the beta version of Evaristo.ai <https://evaristo.ai/>, a chatbot for the Portuguese language based on open LLMs.
One such LLM is Gervásio 8B <https://huggingface.co/PORTULAN/gervasio-8b-portuguese-ptpt-decoder>, which we developed for Portuguese and are also releasing now; it is the LLM active by default when you arrive at the chatbot.
Though you may not be proficient in Portuguese, on-the-fly translators will likely help you follow its content and basic functioning. We invite you, your colleagues, and your students to visit it and try out this first test version. We welcome any help you can give us in testing it, and any feedback and suggestions you can share with us.
It has a unique set of features, among others: it is an open AI chatbot for the Portuguese language; it is multi-model and multi-heteronym, as well as agentic, multi-tool, and multi-modal; and it does not track its users or pass their content on to third parties, safeguarding user privacy and ownership of their content.
You'll find a presentation of its motivation in this press release <https://evaristo.ai/assets/pressRelease_EvaristoAI.pdf> (in English), which is complemented by a more complete description in the About section <https://evaristo.ai/about> (in Portuguese).
The current open LLMs available are typically between 10 and 100 times smaller than
the top-of-the-range closed LLMs used in commercial chatbots, so the costs associated
with training and operating them are much lower. The performance of open LLMs, however,
is much more satisfactory than this linear disproportion would suggest. They therefore have
an excellent ratio of performance quality versus cost, and are a viable option for
fully autonomous generative AI services focused on concrete use cases.
In this context, we see this chatbot as a milestone in the democratization of generative technology for the Portuguese language through open LLMs, encouraging more and more organizations to move forward with their own AI services, running on their own computers and focused on their concrete use cases.
Have a nice day,
António
Dear colleagues,
We are pleased to announce the last call for participation in the 1st Shared Task on Language Identification for Web Data at WMDQS/COLM 2025.
Important information:
🗓️ Registration Deadline: July 23 (AoE)
📍 Montréal, Canada
🌐 https://wmdqs.org/shared-task/
Registration:
To register, please submit a one-page document with a title, a list of authors, a list of provisional languages that you want to focus on, and a brief description of your approach. This document should be sent to wmdqs-pcs(a)googlegroups.com. You can change the list of languages or the system description during the shared task. This document's only purpose is to register your participation in the shared task. The shared task will run until the last week of September.
Motivation:
The lack of training data—especially high-quality data—is the root cause of poor language model performance for many languages. One obstacle to improving the quantity and quality of available text data is language identification (LangID or LID). LangID remains far from solved for many languages. Several of the commonly used LangID models were introduced in 2017 (e.g. fastText and CLD3). The aim of this shared task is to encourage innovation in open-source language identification and improve accuracy on a broad range of languages.
All participants will be invited to contribute a larger paper, which will be submitted to a high-impact NLP venue.
Description:
The main shared task is to submit LangID models that work well on a wide variety of languages on web data. We encourage participants to employ a range of approaches, including the development of new architectures and the curation of novel high-quality annotated datasets.
We recommend using the GlotLID corpus as a starting point for training data. Access to the data will be managed through the Hugging Face repository. Please note that this data should not be redistributed. We will use the same language label format as those used by GlotLID: an ISO 639-3 language code plus an ISO 15924 script code, separated by an underscore.
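As a concrete illustration of the label format described above, the following sketch validates and splits GlotLID-style labels. It checks only the shape of a label (three lowercase letters, an underscore, then a four-letter capitalized script code such as `Latn`), not whether the codes are actually registered in the ISO 639-3 or ISO 15924 registries.

```python
import re

# GlotLID-style labels: ISO 639-3 language code (three lowercase letters)
# + "_" + ISO 15924 script code (one uppercase letter followed by three
# lowercase letters), e.g. "eng_Latn" or "zho_Hans".
LABEL_RE = re.compile(r"[a-z]{3}_[A-Z][a-z]{3}")

def is_valid_label(label: str) -> bool:
    """Check the shape of a label (not registry membership)."""
    return LABEL_RE.fullmatch(label) is not None

def split_label(label: str) -> tuple:
    """Split a label into its (language, script) components."""
    if not is_valid_label(label):
        raise ValueError(f"not a valid language_Script label: {label!r}")
    lang, script = label.split("_")
    return lang, script

print(split_label("eng_Latn"))    # ('eng', 'Latn')
print(is_valid_label("en_latn"))  # False: not ISO 639-3 + ISO 15924 shape
```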
Although all systems will be evaluated on the full range of languages in our test set, we encourage submissions that focus on a particular language or set of languages, especially if those language(s) present particular challenges for language identification.
The shared task will take place in rounds. The first round will only include data from already existing datasets; subsequent rounds will include data annotated by the community as it is collected and processed. More languages will also be added in subsequent rounds.
Organizers:
For any questions, please send an email to wmdqs-pcs(a)googlegroups.com
Program Chairs:
Pedro Ortiz Suarez (Common Crawl Foundation)
Sarah Luger (MLCommons)
Laurie Burchell (Common Crawl Foundation)
Kenton Murray (Johns Hopkins University)
Catherine Arnett (EleutherAI)
Organizing Committee:
Thom Vaughan (Common Crawl Foundation)
Sara Hincapié (Factored)
Rafael Mosquera (MLCommons)
Dear colleagues,
My name is Alessandra Teresa Cignarella; I'm a postdoctoral researcher in the Language and Translation Technology Team (LT3) at Ghent University in Belgium. My research project is called RAINBOW, and I'm currently studying stereotypes about LGBTQIA+ people, particularly on social media, in online discourse, and in AI systems.
We have developed a brief questionnaire to gather diverse perspectives from those who experience or recognize these stereotypes. Your participation will support the creation of a multilingual dataset (Italian, Dutch, and Farsi) aimed at improving inclusivity and reducing the harm caused by AI technologies toward queer communities. Whether you identify as LGBTQIA+, are an ally, or are interested in this research area, your input is highly valued.
Please find the questionnaire here:
* ITALIAN: https://lnkd.in/dfPuyT6j
* DUTCH: https://lnkd.in/d-3Di7WY
* FARSI: https://lnkd.in/dfvWzWCu
Should you have any questions, please do not hesitate to contact me at: alessandrateresa.cignarella(a)ugent.be
I would greatly appreciate it if you could share this survey with your contacts who speak any of these three languages.
Thank you very much for your support!
Best regards,
Alessandra
Alessandra Teresa Cignarella (she/her)
MSCA postdoctoral fellow
LT3, Language and Translation Technology Team
Department of Translation, Interpreting and Communication
Ghent University