July 2025 - Corpora - ELRA lists

CfP DHOW: Diffusion of Harmful Content on Online Web Workshop @ ACMMM 2025
by Thomas Mandl 15 Mar '26

The 2nd Workshop on DHOW: Diffusion of Harmful Content on Online Web

The workshop will be conducted in a *hybrid* format to ensure maximum participation, accommodating attendees both *online* and in person.

Submission deadline: *July 11, 2025 AoE*
*Workshop site*: https://dhow-workshop.github.io/2025/
*Co-located with ACMMM 2025* https://acmmm2025.org/ Dublin, Ireland, 27-31 October 2025

*Important Dates*
Submission deadline: extended to *July 11, 2025*
Notification of acceptance: August 01, 2025
Camera-ready papers due: August 11, 2025
Workshop date: October 27/28, 2025

*Workshop Description*
With the advancement of digital technologies and devices, online content is easily accessible. At the same time, harmful content also spreads. Different kinds of harmful content are available on different platforms and in multiple languages. The topic of harmful content is broad and covers multiple research directions, but users are affected by all of them. Harmful content is often studied in isolation, for example as misinformation or as hate speech, and research typically covers one platform, one language, and one particular issue. As a result, spreaders of harmful content switch platforms and languages to reach their user base. Harmful content is not limited to social media but also appears in news media. Spreaders share harmful content in posts, news articles, comments, and hyperlinks. There is therefore a need to study harmful content across platforms, languages, modalities, and topics. We will bring research on harmful content under one umbrella so that work on different topics (hate speech, misinformation, disinformation, self-harm, offensive content, etc.) can yield novel methods and recommendations for users, leveraging text analysis with image, audio, and video recognition to detect harmful content in diverse formats. The workshop will also cover ongoing issues such as wars and elections in 2025.
We believe this workshop will provide a unique opportunity for researchers and practitioners to exchange ideas, share the latest developments, and collaborate on addressing the challenges associated with harmful content spread across the Web. We expect that the workshop will generate insights and discussions that will help advance the field of societal artificial intelligence (AI) for the development of a safer internet. In addition to attracting high-quality research contributions, one of the aims of the workshop is to mobilise researchers working in related areas to form a community.

*Submission Topics*
• Studying different types of harmful content
• Computational fact-checking & Misinformation Detection
• Role of Generative AI in Mitigating Harmful Content
• Harassment, Bullying, and Hate Speech Detection
• Explainable AI for Harmful Content Analysis
• Multimodal and Multilingual Harmful Content Detection such as fake news, spam, and troll detection
• Deepfake and Synthetic Media
• Ethical & Societal Implications of AI in Content Moderation
• Both Qualitative and Quantitative studies on harmful content
• Psychological effects of harmful content, such as effects on mental health
• Approaches for data collection or data annotation using multimodal large models on harmful content
• User studies on the effects of harmful content on human beings

*Submissions*
- Submission Instructions: https://dhow-workshop.github.io/2025/#call
- Submission Link: https://openreview.net/group?id=acmmm.org/ACMMM/2025/Workshop/DHOW

*Workshop organizers*
• Thomas Mandl (University of Hildesheim, Germany)
• Haiming Liu (University of Southampton, United Kingdom)
• Gautam Kishore Shahi (University of Duisburg-Essen, Germany)
• Amit Kumar Jaiswal (University of Surrey, United Kingdom)
• Durgesh Nandini (University of Bayreuth, Germany)

DHOW 2025

Digital lexicography and lexical computing workshop, Bari, Italy
by Ondřej Matuška 19 Jan '26

*<Lexicom/>* a workshop in digital lexicography and lexical computing

*Registration open*
*Bari, Italy*, 15 – 19 September 2025

Your 5 days to get up to date with the latest developments in *corpus-driven lexicography* and to practice your *corpus building and corpus query skills* with some of the top experts in the field.

For the programme, lecturers, invited speakers, fees and registration, visit *lexicom.courses <https://lexicom.courses/upcoming-lexicom/>*

I hope to meet you in Bari in September!

Ondřej

*Ondřej Matuška*
sketchengine.eu <http://www.sketchengine.eu/> | Facebook <https://www.facebook.com/SketchEngine/> | LinkedIn <https://www.linkedin.com/in/ondrejmatuska> | Twitter <https://twitter.com/SketchEngine>

Third Call for Papers, Participation and Demo - IWSLT 2025
by Atul K. Ojha 12 Dec '25

Apologies for cross-posting.

----------------------------------------

*The International Conference on Spoken Language Translation*
*ACL – 22nd IWSLT 2025 – Third Call for Participation*
*31 July-1 August 2025 - Vienna, Austria*
http://iwslt.org

The International Conference on Spoken Language Translation (IWSLT) <https://iwslt.org/> is the premier annual conference for all aspects of Spoken Language Translation. Every year, the conference organises and sponsors open evaluation campaigns around key challenges in simultaneous and consecutive translation, under real-time/low latency or offline conditions and under low-resource or multilingual constraints. System descriptions and results from participants’ systems and scientific papers related to key algorithmic advances and best practices are presented. IWSLT is the venue of SIGSLT <https://iwslt.org/sigslt/>, the Special Interest Group on Spoken Language Translation of ACL <https://www.aclweb.org/portal/>, ISCA <https://www.isca-speech.org/> and ELRA <https://www.elra.info/>. With a track record of 21 years, IWSLT benchmarks and proceedings serve as a reference for all researchers and practitioners working on speech translation and related fields.

The 22nd edition of IWSLT will be run as a hybrid ELRA <https://www.elra.info/>/ACL <https://www.aclweb.org/portal/> event, co-located with ACL 2025 <https://2025.aclweb.org/> from 31 July to 1 August 2025.
*Important Dates*
*January 1, 2025*: Release of shared task training and dev data
*March 15, 2025*: Scientific paper submission deadline
*April 1-15, 2025*: Evaluation period
*April 21, 2025*: System description paper and demo submission deadline
*May 15, 2025*: Notification of acceptance
*June 1, 2025*: Camera-ready deadline (all papers)
*July 31-August 1, 2025*: IWSLT conference

*Evaluation*
IWSLT 2025 features shared tasks <https://iwslt.org/2025/#shared-tasks> that address the following focus areas:
- High-resource ST: Offline track, Simultaneous track, Subtitling track, Model compression track
- Low-resource ST: Low-resource and Indic (multilingual) tracks
- Instruction-following Speech Processing track: Technical domain ST, ASR, Summarization, and QA

Training and development data for each shared task will be prepared and released by the respective organisers (for further information on this initiative, please refer to the IWSLT website <https://iwslt.org/2025/>). Participants will receive instructions about how to submit their runs. In addition, participants have the opportunity to present their work through a system paper that will be published in the ACL Proceedings.

*Conference*
IWSLT also invites submissions of scientific papers to be published in the ACL Proceedings and presented either in oral or poster format. The conference selects high-quality, original contributions on theoretical and practical issues of spoken language translation research, technologies and applications. Submissions will be accepted directly through the IWSLT submission site (to be announced on the website <https://iwslt.org/2025/>). We will also accept commitments of submissions with reviews from the ACL Rolling Review.

Additionally, to foster cross-pollination of ideas, the conference also invites the presentation of papers on speech translation recently published elsewhere.
Please note that this is for non-archival presentation of papers relevant to speech translation already published in other venues (e.g., Findings for the *ACL, speech, NLP or MT conferences). Submissions for this category will be accepted through a dedicated form (to be announced on the website <https://iwslt.org/2025/>). Papers will be checked for relevance to IWSLT, and assigned either oral or poster presentation slots if selected.

*Demo Session*
We invite researchers, practitioners, and industry professionals to participate in an engaging demo session highlighting innovative systems, tools, and component technologies that advance the field of speech translation. The session will include live and interactive system demonstrations to foster discussion and knowledge exchange among participants across the field. For more information, please see our Call for Demos <https://iwslt.org/2025/call-for-demos>.

*Contact*
Please email iwslt-evaluation-campaign(a)googlegroups.com if you have any questions related to the shared tasks.

Thanks,
Marine, Marcello, Alex, Jan, Sebastian, Elizabeth, Atul (IWSLT organisers)

Advance notice: ‘Statistics for linguistics with R’ bootcamp (08 – 12/07/2024)
by Magali Paquot 01 Dec '25

The Linguistics Research Unit of the Institute of Language and Communication (Université catholique de Louvain, Belgium) will be hosting Stefan Gries’s next bootcamp on statistics for linguistics with R from 08 to 12 July 2024.

The ‘Statistics for linguistics with R’ bootcamp is a hands-on introduction to statistical methods for both graduate students and seasoned researchers and is loosely based on the third edition (2021) of Gries’s textbook Statistics for linguistics with R. The course is intended for linguists who already have a basic knowledge of statistics and some experience using R and who wish to improve their proficiency in statistical modeling of linguistic data.

Using the open source software and programming language R, we will deal with:
• fundamental aspects of fixed-effects regression modeling for both numeric and binary response variables; these include exploration of data and their preparation for modeling, model formulation and selection, and numerical and visual interpretation and evaluation of models;
• more advanced aspects of fixed-effects regression modeling such as contrasts for ordinal predictors, orthogonal contrasts, curvature of numeric predictors, and maybe general linear hypothesis tests;
• the theoretical foundations of mixed-effects regression modeling;
• applications of mixed-effects modeling for both numeric and binary response variables;
• tree-based methods and random forests: 'fitting' and interpreting them with importance scores, partial dependence scores, and detecting (not just capturing) interactions.

The website of the bootcamp will be online in early 2024 and online registration will start on 1 March 2024, 11 am CEST. The number of participants is limited. If you would like to participate, mark the date in your diary!

Contact email: magali.paquot(a)uclouvain.be

Magali Paquot
Convenor

CfP: 1st Workshop on Human-LLM Collaboration for Ethical and Responsible Science Production (SciProdLLM 2025)
by Tristan Miller 31 Oct '25

SciProdLLM 2025: 1st Workshop on Human-LLM Collaboration for Ethical and Responsible Science Production December 23/24, 2025, Mumbai, India Co-located with IJCNLP-AACL 2025 https://sciprodllm.github.io/ CALL FOR PAPERS SciProdLLM 2025 is a forum for presenting and discussing research on integrating large language models (LLMs) into the typical research workflow: from ideation to experimentation to scientific writing, with a particular focus on human-centered approaches that ensure ethical and responsible use of LLMs. We also invite work that evaluates the quality of LLM-assisted research workflows and the resulting outputs. We welcome submissions on any aspect of human-LLM collaboration for science production, evaluation of LLMs for science production, and/or evaluation of LLM-assisted scientific papers. Relevant topics include, but are not limited to, the following: - Guiding idea generation through user feedback - Automated experimentation following the experimental workflow used by human scientists (e.g., the workflow from data preprocessing to comparison to baselines) - Human-curated datasets of scientific papers for fine-tuning LLMs for generating ideas and paper content (text, figures, tables, etc.) - Human-LLM co-authored peer reviews (e.g., LLM-assisted peer review platforms) - Benchmark datasets for evaluating LLMs on idea generation, experimentation, multimodal content generation, or scientific writing - Evaluation metrics for detecting problematic papers (e.g., those containing suspicious citations or tortured phrases) - Statistical analyses of collections of LLM-assisted papers (e.g., on topics, citations, or retractions) SUBMISSION INSTRUCTIONS SciProdLLM 2025 welcomes long and short papers. Long papers may consist of up to 8 pages of content, plus unlimited pages of references. Short papers may consist of up to 4 pages of content, plus unlimited pages of references. 
Both types of submissions must follow the same requirements and procedures as for IJCNLP-AACL 2025 main conference papers: <https://2025.aaclnet.org/calls/main_conference_papers> Note that papers submitted as non-archival will be allocated presentation time at the workshop but will not be included in the proceedings. There are three supported submission modes: - Direct submissions: Direct submissions will receive up to three double-blind reviews, and a final decision on acceptance from the workshop organizers. Direct submissions should be made through the SciProdLLM page on OpenReview: <https://openreview.net/group?id=aclweb.org/AACL-IJCNLP/2025/Workshop/SciPro…> - ARR submissions: Unpublished papers that have already been reviewed and meta-reviewed through ACL Rolling Review may be committed to SciProdLLM. These papers will not receive new reviews but may be meta-reviewed by the workshop organizers, who will make a final decision on acceptance. A commitment link will be published in a future version of this call. - Previously published papers: We invite non-archival submissions of papers that have recently been published elsewhere. This allows such papers to gain more visibility from the workshop audience. To submit a previously published paper for presentation, please email SciProdLLM(a)groups.io with the details of your paper (title, authors, abstract, publication venue) and attach a PDF copy of the paper. (Submissions of previously published papers need not adhere to the IJCNLP-AACL 2025 main conference paper policies on anonymity.) IMPORTANT DATES All deadlines are 23:59 UTC−12 ("Anywhere on Earth"). 
- September 29, 2025: Submission deadline for direct submissions - October 27, 2025: ARR commitment deadline - November 3, 2025: Notification of acceptance - November 11, 2025: Camera-ready papers due - December 23 or 24, 2025: Workshop presentations (exact date TBA) ORGANIZING COMMITTEE - Wei Zhao, University of Aberdeen, UK - Jennifer D’Souza, TIB Leibniz Information Center for Science and Technology, Germany - Steffen Eger, University of Technology Nuremberg, Germany - Anne Lauscher, University of Hamburg, Germany - Yufang Hou, IT:U Interdisciplinary Transformation University, Austria - Nafise Sadat Moosavi, University of Sheffield, UK - Tristan Miller, University of Manitoba, Canada - Chenghua Lin, University of Manchester, UK CONTACT INFORMATION Email: SciProdLLM(a)groups.io WWW: https://sciprodllm.github.io/ -- Dr. Tristan Miller, Assistant Professor Department of Computer Science, University of Manitoba https://clam.cs.umanitoba.ca/ | Tel. +1 204 474 6792

[CfP] CHOMPS Workshop colocated with IJCNLP-AACL 2025
by Aman Sinha 24 Aug '25

First CFP: CHOMPS – Confabulation, Hallucinations, & Overgeneration in Multilingual & Precision-critical Settings
(with our apologies for cross-posting)

Venue: IJCNLP-AACL 2025 (https://2025.aaclnet.org/), Mumbai, India
Date: 23/24th December 2025 (TBC)
Workshop website: https://chomps2025.github.io/

* Description *
Despite rapid advances, LLMs continue to "make things up": a phenomenon that manifests as hallucination, confabulation, and overgeneration. That is, they produce unsupported and unverifiable text that sounds deceptively plausible. These outputs pose real risks in settings where accuracy and accountability are non-negotiable, including healthcare, legal systems, and education.

The aim of the CHOMPS workshop is to find ways to mitigate one of the major hurdles that currently prevent the adoption of Large Language Models in real-world scenarios: namely, their tendency to hallucinate. The workshop will explore hallucination mitigation in practical situations where this mitigation is crucial: in particular, precision-critical applications (such as those in the medical, legal and biotech domains), as well as multilingual settings (given the lack of resources available to reproduce what can be done for English in other linguistic contexts).
In practice, we invite work on the following (non-exhaustive) list of topics:

* Workshop topics *
- Metrics, benchmarks and tools for hallucination detection
- Factuality challenges in mission-critical and domain-specific settings (e.g., medical, legal, biotech) and their consequences
- Mitigation strategies during inference or model training
- Studies of hallucinatory and confabulatory behaviors of LLMs in cross-lingual and multilingual scenarios
- Confabulations in language & multimodal (vision, text, speech) models
- Perspectives and case studies from other disciplines
- …

* Invited speakers *
- Anna ROGERS, IT University of Copenhagen
- Danish PRUTHI, IISc Bangalore
- Abhilasha RAVICHANDER, University of Washington

* Submission details *
The workshop is designed with a widely inclusive submission policy so as to foster as vibrant a discussion as possible. Archival or non-archival submissions may consist of up to 8 pages (long) or 4 pages (short) of content. Dissemination submissions may consist of up to 1 page of content. On acceptance, authors may add one additional page to accommodate changes suggested by the reviewers.
Please use the ACL style templates available here: https://github.com/acl-org/acl-style-files

Submissions must be in PDF format and can be made via:
(a) Direct submission (https://openreview.net/group?id=aclweb.org/AACL-IJCNLP/2025/Workshop/CHOMPS)
(b) ARR commitment (https://openreview.net/group?id=aclweb.org/AACL-IJCNLP/2025/Workshop/CHOMPS…)

* Important dates *
Paper submission deadline: September 29, 2025
Direct ARR commitment: October 27, 2025
Author notification: November 3, 2025
Camera-Ready due: November 11, 2025
Workshop date: December 23-24, 2025 (TBC)

* Contact *
For questions, please send an email to chomps-aacl2025(a)googlegroups.com or contact one of the workshop chairs:
- Aman Sinha, Université de Lorraine, aman.sinha(a)univ-lorraine.fr
- Raúl Vázquez, University of Helsinki, raul.vazquez(a)helsinki.fi
- Timothee Mickus, University of Helsinki, timothee.mickus(a)helsinki.fi

LREC 2026 2nd Call for Tutorial Proposals
by info＠elda.org 31 Jul '25

[Apologies for multiple postings] The 15th edition of the Language Resources and Evaluation Conference (LREC 2026) invites proposals for tutorials to be held in conjunction with the conference. We seek proposals in all areas of natural language processing and computation, language resources (LRs) and evaluation, including spoken language, sign language, and multimodal interaction. The tutorials will be held at LREC 2026 in Palma de Mallorca (Spain), on 11, 12, or 16 May 2026. *IMPORTANT DATES* 17 October 2025: Proposal submission due 17 November 2025: Notification of acceptance 11-16 May 2026: LREC 2026 conference *Cutting-edge:* tutorials that cover advances in newly emerging areas. The tutorials are expected to give a brief introduction to the topic, but participants are assumed to have some prior knowledge of the topic. The focus of the class will be on discussing the most recent developments in the field, and it will spend a considerable amount of time pointing out open research questions and important novel research directions. *Introductory to computational linguistics (CL)/ natural language processing (NLP) topics*: tutorials that provide introductions to topics that are established in the LREC communities. The lecturers provide an overview of the development of the field from the beginning until now. Attendees are not expected to come with prior knowledge. They acquire sufficient understanding of the topic to understand the most recent research in the field. *Introductory to adjacent areas:* tutorials that provide introductions to topics that are established or emerging in areas adjacent to CL/NLP. The lecturers provide an overview of the development of the field from the beginning until now. Attendees are not expected to come with prior knowledge. They acquire a sufficient understanding of the topic to understand the most recent research in the field and its relevance for the CL/NLP domains. 
In all cases, the aim of a tutorial is primarily to help understand a scientific problem, its tractability, and its theoretical and practical implications. Presentations of particular technological solutions or systems are welcome, provided that they serve as illustrations of broader scientific considerations. None of the tutorial types are expected to be “self-invited” long talks – the content should be a good balance between research from multiple groups and perspectives, not only of the teachers of the tutorial. Proposals should be prepared according to the style files available at https://www.overleaf.com/read/mgtcgxgmrhvz#1b1392, available also from the LREC website (https://lrec2026.info/). Proposals should not exceed 4 pages of content (plus unlimited pages for references), and they should be submitted as PDF documents. Tutorial proposals do not have to be anonymized. They should contain: * A title that helps potential attendees to understand what the tutorial will be about. * An abstract that summarizes the topics, goals, target audience, and type (see above) of the tutorial (this abstract will also be on the LREC website). * A section called “Introduction” that explains the topic and summarizes the starting point and relevance for our community, and in general. * A section called “Target Audience” that explains for whom the tutorial will be developed and what the expected prior knowledge is. Clearly specify what attendees should know and be able to practically do to get the most out of your tutorial. Examples of what to specify include prior mathematical knowledge, knowledge of specific modeling approaches and methods, programming skills, or adjacent areas like computer vision. Also specify the number of expected participants. * A section called “Outline” in which the various topics are explained. This can be a list of bullet points or a set of paragraphs explaining the content. Explain what you intend and how long the tutorial will be. 
* A section called “Diversity Considerations”, discussing each of the three aspects of diversity listed under DIVERSITY AND INCLUSION below, or others. * A section called “Reading List”: What are introductory papers or books that potential attendees can read to get a first impression of the tutorial content? What do you expect them to have read before attending? What provides further information beyond the content of the tutorial? * A section called “Presenters” in which each tutorial presenter is briefly introduced in one paragraph, including their research interests, their areas of expertise for the tutorial topic, and their experience in teaching a diverse and international audience. * A section called “Other Information” which should include information on how many people are expected to participate and how you came to this estimate. You can also explain any other aspects that you find important, including special equipment that you would need. * A section called “Ethics Statement” which discusses ethical considerations related to the topics of the tutorial. Tutorials can be half-day (morning 9:00 to 13:00 or afternoon 14:00 to 18:00) or full-day (9:00 to 18:00) and must follow fixed hours for breaks (morning coffee break: 10:30-11:00, lunch break: 13:00-14:00, afternoon coffee break: 16:00-16:30). Submission is electronic. Please submit the proposals using the START system at this URL: https://softconf.com/lrec2026/tutorials *EVALUATION CRITERIA* The tutorial proposals will be evaluated according to their originality and impact, the expected interest level of participants, as well as the quality of the organizing team and Program Committee and their contribution to the diversity of the conference. *DIVERSITY AND INCLUSION* We particularly encourage submissions from underrepresented groups in computational linguistics, researchers from any demographic or geographic minority, with disabilities, or others. 
In the evaluation of the proposal, we will take these aspects into account to create a varied and balanced set of tutorials. This includes several aspects of diversity, namely (1) how the topic of the tutorial contributes to improved diversity and increased fairness in the field, (2) if the topic is particularly relevant for a specific underrepresented group of potential participants, and (3) if the presenters are from an underrepresented group. *INSTRUCTOR RESPONSIBILITIES* Accepted tutorial presenters will be notified by the date mentioned above. They must then provide abstracts of their tutorials for inclusion in the conference registration material by the specific deadlines. The abstract needs to be provided in ASCII format. The summary will be submitted in PDF format and can be updated from the version submitted for review. The instructors will make their material available in an appropriate way, for instance, by setting up a website. They will be invited to submit their slides to the ACL Anthology. Finally, at least one tutorial presenter must attend the event in person to organise the tutorial. *CONTACT* Tutorial Chairs: lrec2026-tutorial-chairs(a)googlegroups.com General contact: info@lrec2026.info More information on LREC 2026: https://lrec2026.info/ ------ *LREC 2026 Second Calls for Papers & Proposals are available.* *Deadline: October 17, 2025* * Main conference 2nd CfP: https://lrec2026.info/calls/se... * Workshops 2nd CfP: https://lrec2026.info/second-c... * Tutorials 2nd CfP: https://lrec2026.info/second-c... Authors' Kit: https://lrec2026.info/authors-kit/

[CfP] CHOMPS Shared Task: SHROOM-CAP, the Shared-task on Hallucinations and Related Observable Overgeneration Mistakes in Crosslingual Analyses of Publications
by Aman Sinha 31 Jul '25

TL;DR
[ https://helsinki-nlp.github.io/shroom/2025a | SHROOM-CAP ] is an Indic-centric shared task co-located with [ https://chomps2025.github.io/ | CHOMPS-2025 ] to advance the SOTA in hallucination detection for scientific content generated with LLMs. We’ve annotated hallucinated content from top-tier LLMs in 4 different high-resource languages and 3* surprise low-resource Indic languages. Participate in as many languages as you’d like by accurately detecting the presence of hallucinated content. Stay informed by joining our [ https://groups.google.com/g/shroomcap | Google group ]!

Full Invitation
We are excited to announce the SHROOM-CAP shared task on cross-lingual hallucination detection for scientific publications (link to [ https://helsinki-nlp.github.io/shroom/2025a | website ]). We invite participants to detect whether or not there is hallucination in the outputs of instruction-tuned LLMs in a cross-lingual scientific context.

About
This shared task builds upon our previous iteration, [ https://helsinki-nlp.github.io/shroom/2024 | SHROOM ], with three key highlights: it is LLM-centered, it provides cross-lingual annotations, and it covers both hallucination and fluency prediction.

LLMs frequently produce "hallucinations," where models generate plausible but incorrect outputs, while the existing metrics prioritize fluency over correctness. This is an issue of growing concern as these models are increasingly adopted by the public. With SHROOM-CAP, we want to advance the state of the art in detecting hallucinated scientific content. This new iteration of the shared task is held in a cross-lingual and multi-model context: we provide data produced by a variety of open-weights LLMs in 4+3* different high- and low-resource languages (English, French, Spanish, Hindi, and to-be-later-revealed Indic languages). Participants are invited to participate in any of the languages available and are expected to develop systems that can accurately identify hallucinations in generated scientific content.
Additionally, participants will also be invited to submit system description papers, with the option to present them in oral/poster format during the CHOMPS workshop (co-located with [ https://2025.aaclnet.org/ | IJCNLP-AACL 2025, Mumbai, India ]). Participants that elect to write a system description paper will be asked to review their peers’ submissions (max 2 papers per author).

Key Dates:
All deadlines are “anywhere on Earth” (23:59 UTC-12).
* Dev set available by: 31.07.2025
* Test set available by: 05.10.2025
* Evaluation phase ends: 15.10.2025
* System description papers due: 25.10.2025 (TBC)
* Notification of acceptance: 05.11.2025 (TBC)
* Camera-ready due: 11.11.2025 (TBC)
* Proceedings due: 01.12.2025 (TBC)
* CHOMPS workshop: 23/24th December 2025 (co-located with IJCNLP-AACL 2025)

Evaluation Metrics:
Participants will be ranked along two criteria, both based on our annotations:
1. factuality mistakes, measured via macro-F1 between gold reference and predicted labels
2. fluency mistakes, measured via macro-F1 between gold reference and predicted labels
Rankings and submissions will be done separately per language: you are welcome to focus only on the languages you are interested in!

How to Participate:
* Register: Please register your team [ https://forms.gle/hWR9jwTBjZQmFKAE7 | https://forms.gle/hWR9jwTBjZQmFKAE7 ] and join our Google group: [ https://groups.google.com/g/shroomcap | https://groups.google.com/g/shroomcap ]
* Submit results: use our platform to submit your results before 15.10.2025
* Submit your system description: system description papers should be submitted by 25.10.2025 (TBC, further details will be announced at a later date).

Want to be kept in the loop? Join our [ https://groups.google.com/g/shroomcap | Google group mailing list ]!

We look forward to your participation and to the exciting research that will emerge from this task.

Best regards,
SHROOM-CAP organizers

LREC 2026: 2nd CfP for Workshops
by info＠elda.org 30 Jul '25

[Apologies for multiple postings] SECOND CALL FOR WORKSHOPS - LREC 2026 Organized by the ELRA Language Resources Association Palma, Mallorca, Spain 11-16 May 2026 The Organisers of LREC 2026 invite proposals for workshops to be held in conjunction with the main conference at Palau de Congressos de Palma, Palma de Mallorca (Spain). We solicit proposals in all areas of language resources, language technology, and evaluation of the underlying technologies, broadly conceived to also include related disciplines such as linguistics, language documentation, natural language processing, speech and multimodal processing, computational social science, and the digital humanities. The workshops will be held at LREC 2026 in Palma de Mallorca (Spain) on 11, 12 and 16 May 2026. IMPORTANT DATES (All deadlines are 11:59 PM UTC-12:00, “anywhere on Earth”) * 17 October 2025: Proposal submission deadline * 17 November 2025: Notification of acceptance * 11-16 May 2026: LREC 2026 conference SUBMISSION INFORMATION Submissions should follow this template: https://www.overleaf.com/project/68879da091da5870fcb655de Proposals should be submitted as PDF documents using the START system (https://softconf.com/lrec2026/workshops/). Note that submissions should essentially be ready to be turned into a Call for Workshop Papers within one week of notification of acceptance (see Important dates above). The proposals should be at most two pages for the main proposal + at most two additional pages for information about organisers, program committee, and references. Thus, the whole proposal should not be more than FOUR pages long, excluding references. The two pages for the main proposal must include: * A title and a brief description of the workshop topic and content. 
* Workshops can be half-day (morning 9:00 to 13:00 or afternoon 14:00 to 18:00) or full-day (9:00 to 18:00) and must follow fixed hours for breaks (morning coffee break 10.30-11.00, lunch break: 13:00-14:00, afternoon coffee break: 16.00-16.30). * A list of invited speakers, if applicable, with an indication of which ones have already agreed and which are tentative, and sources of funding for the speakers, if needed. * An estimate of the number of attendees. * A description of any shared tasks associated with the workshop, and estimate of the number of participants. Note that any shared task will also need to be reviewed by the workshop committee for ethical concerns. * A description of special requirements and technical needs, where relevant. * If the workshop has been held before, a note specifying where previous iterations of the workshops were held, how many submissions the workshop received, how many papers were accepted (also specify if they were not regular papers, e.g., shared task system description papers, non-archival papers), and how many attendees the workshop attracted. The two pages for information about the workshop, the organisers and the program committee must include: * A very brief advertisement or tagline for the workshop, up to 140 characters, that highlights any key information you wish prospective attendees to know, and which would be suitable to be put onto a web-based survey (see below). * The names, affiliations, and email addresses of the organisers, with one-paragraph statements of their research interests, areas of expertise, and experience in organising workshops and related events. * A list of Program Committee members, with an indication of which members have already agreed. 
Organisers should do their best to estimate the number of submissions (especially for recurring workshops) in order to (a) ensure a sufficient number of reviewers so that each paper receives 3 reviews, and (b) anticipate that no one is committed to reviewing more than 3 papers. This practice is likely to ensure on-time, and more thorough and thoughtful reviews. Submission is electronic. Please submit the proposals using the START system at this URL:https://softconf.com/lrec2026/workshops/ <https://softconf.com/lrec2026/workshops/> EVALUATION CRITERIA The workshop proposals will be evaluated according to their originality and impact, the expected interest level of participants, as well as the quality of the organising team and Program Committee, and their contribution to the diversity of the conference. DIVERSITY AND INCLUSION We particularly encourage submissions of underrepresented groups in language resources and language technology, including researchers from any demographic or geographic minority, with disabilities, or others. In the evaluation of the proposal, we will take these aspects into account to create a varied and balanced set of workshops. Workshop proposals are evaluated on a range of aspects, including diversity, such as (1) how the topic of the workshop contributes to improved diversity and increased fairness in the field, (2) if the topic is particularly relevant for a specific underrepresented group of potential participants, (3), if the presenters are from an underrepresented group. WORKSHOP ORGANISER RESPONSIBILITIES At least one of the accepted organisers must attend the workshop in person. 
The organisers of the accepted proposals are responsible for publicizing and running the workshop, including reviewing submissions, producing the workshop program and the camera-ready workshop proceedings according to LREC requirements, organising the meeting days, and playing their part to ensure that all participants are aware of LREC’s anti-harassment policy and code of conduct (see https://lrec2026.info/lrec-2026-code-of-conduct/ <https://lrec2026.info/lrec-2026-code-of-conduct/>). It is crucial that organisers commit to all deadlines. In particular, failure to produce the camera-ready proceedings in the correct format on time will lead to the exclusion of the workshop from the unified proceedings and author indexes. Workshop organisers cannot accept submissions for publication that will be (or have been) published elsewhere, although they are free to set their own policies on simultaneous submission and review, as well as to accept additional non-archival presentations CONTACT * Workshop Chairs: lrec2026-workshop-chairs(a)googlegroups.com <mailto:lrec2026-workshop-chairs@googlegroups.com> * General contact: mailto:info@lrec2026.info <mailto:info@lrec2026.info> * More information on LREC 2026: https://lrec2026.info/ <https://lrec2026.info/>

LREC 2026 Second Call for Papers - Palma de Mallorca, May 11-16, 2026
by info＠elda.org 30 Jul '25

[Apologies for multiple postings]

SECOND CALL FOR PAPERS
LREC 2026
Organized by the ELRA Language Resources Association
Palma, Mallorca, Spain
11-16 May 2026

The Fifteenth biennial Language Resources and Evaluation Conference (LREC) will be held at the Palau de Congressos de Palma in Palma, Mallorca, Spain, on 11-16 May 2026.

LREC serves as the primary forum for presentations describing the development, dissemination, and use of language resources involving both traditional and recently developed approaches.

The scientific program will include invited talks, oral presentations, and poster and demo presentations, as well as a keynote address by the winner of the Antonio Zampolli Prize.

Submissions describing all aspects of language resource development and use are invited, including, but not limited to, the following:

Language Resource Development
* Methods and tools for mono- and multi-lingual language resource development and annotation
* Knowledge discovery/representation (knowledge graphs, linked data, terminologies, lexicons, ontologies, etc.)
* Resource development for less-resourced/endangered languages
* Guidelines, standards, best practices, and models for interoperability

Language Resource Use
* Use of language resources in systems and applications for any area of language and speech processing
* Use of language resources in assistive technologies, support for accessibility
* Efficient/low-resource methods for language and speech processing

Evaluation
* Methodologies and protocols for evaluation and benchmarking of language technologies
* Measures for validation of language resources and quality assurance
* Usability of user interfaces and dialogue systems
* Bias, safety, and user satisfaction metrics
* Interpretability/explainability of language models and language and speech processing tools

Language Resources and Large Language Models
* Language resource development for LLMs (monolingual, multilingual, multimodal)
* (Semi-)automatic generation of training data
* Training, fine-tuning, adaptation, alignment, and representation learning
* Guardrails, filters, and modules for generative AI models

Policy and Organizational Considerations
* International and national activities, projects, initiatives, and policies
* Language coverage and diversity
* Replicability and reproducibility
* Organisational, economic, ethical, climate, and legal issues

Paper Theme Tracks
The above topics are organised in 27 main tracks:
* T01 Applications Involving LRs and Evaluation for any area/domain of language and speech processing
* T02 Bias, Offensive and Non-inclusive Language; Guardrails, filters
* T03 Corpora, Treebanks and Annotation; Tools, Systems and Platforms
* T04 Dialogue, Conversational Systems, Chatbots, Human-Robot Interaction
* T05 Digital Humanities, Cultural Heritage and Computational Social Science
* T06 Discourse and Pragmatics
* T07 Document Classification, Information Retrieval and Cross-lingual Retrieval
* T08 Ethics, Research Reproducibility and Replicability, and Environmental Issues
* T09 Evaluation, Validation, Quality Assurance and Benchmarking Methodologies
* T10 Inference, Reasoning, Question Answering
* T11 Information Extraction and Text Mining
* T12 Interpretability/explainability of language models and language and speech processing tools
* T13 Knowledge discovery/representation (knowledge graphs, linked data, terminologies, lexicons, ontologies, etc.)
* T14 Language Modeling and LRs (including training, fine-tuning, representation learning, and generation of synthetic data)
* T15 Less-Resourced/Endangered/Less-studied Languages
* T16 Lexicon and Semantics
* T17 Machine Learning Methods and Techniques for Language and Speech Processing, including efficient/low-resource methods
* T18 Multilinguality, Machine Translation (including Speech-to-Speech) and Translation Aids
* T19 Multimodality, Cross-modality (including Sign Languages, Vision and Other Modalities), Multimodal Applications, Grounded Language Acquisition
* T20 Natural Language Generation and Summarization
* T21 Simplification, Plain Language and Assistive Technologies
* T22 Opinion & Argument Mining, Sentiment Analysis, Emotion Recognition/Generation
* T23 Parsing, Tagging, Chunking, Grammar, Syntax, Morphosyntax, Morphology
* T24 Policy and Legal Issues (including Language Resource Infrastructures, Interoperability, Standards for LRs, Metadata)
* T25 Psycholinguistics, Cognitive Linguistics and Linguistic Theories
* T26 Social Media Processing
* T27 Speech Resources and Processing (including Phonetic Databases, Phonology, Prosody, Speech Recognition, Synthesis and Spoken Language Understanding)

Separate calls have been issued for Workshops and Tutorials. We will also organise an Industry Track to report on the state of the art within industry and commercial achievements, for which there will be a separate Call.

Paper Submission and Templates
Submission is electronic, using the Softconf START conference management system via the link: https://softconf.com/lrec2026/main/

Submissions should be 4 to 8 pages in length (excluding references and potential Ethics Statements).

Submissions should follow the LREC stylesheet, available on the conference website on the Authors' Kit page <https://lrec2026.info/authors-kit/>; the Overleaf link is: https://www.overleaf.com/project/6887c0280bfaab6e3e8bd0bc

At the time of submission, authors are offered the opportunity to share related language resources with the community. All repository entries are linked to the LRE Map <https://lremap.elra.info/>, which provides metadata for the resource.

Accepted papers will appear in the conference proceedings, which include both oral and poster papers in the same format. Determination of the presentation format (oral vs. poster) is based solely on an assessment of the optimal method of communication (more or less interactive), given the paper content.

Author Responsibilities
Papers must be of original, previously unpublished work. Papers must be anonymized to support double-blind reviewing. Submissions thus must not include authors’ names and affiliations. The submissions should also avoid links to non-anonymized repositories; the code should be either submitted as supplementary material in the final version of the paper, or as a link to an anonymized repository (e.g., Anonymous GitHub <https://anonymous.4open.science/> or Anonym Share <https://anonymfile.com/>). Papers that do not conform to these requirements will be rejected without review.

Papers that have been or will be under consideration for other venues at the same time must be declared at submission time. If a paper is accepted for publication at LREC 2026, it must be immediately withdrawn from other venues. If a paper under review at LREC 2026 is accepted elsewhere and authors intend to proceed there, the LREC 2026 Programme Committee must be notified immediately.

Ethics Statement
We encourage all authors submitting to LREC 2026 to include an explicit ethics statement on the broader impact of their work, or other ethical considerations, after the conclusion but before the references. The ethics statement will not count toward the page limit.

Presentation Requirement
All papers accepted for the main conference track must be presented at the conference to appear in the proceedings, and at least one author must register for LREC 2026.

Papers will be presented either orally or as posters. The specific presentation type of a paper will be decided based on its content, with no difference in quality implied. Papers that include a demonstration component will be presented as posters.

The conference will be hybrid, including both on-site and virtual presentations. For hybrid purposes, all authors of papers accepted to the main conference, whether oral or poster, will be required to upload a presentation video and a set of slides (plus the poster PDF, for papers accepted as posters) to the Conference Catalysts platform. This material will also be inserted in the LREC 2026 online proceedings.

Important dates
(All deadlines are 11:59 PM UTC-12:00, “anywhere on Earth”)
* Oral and poster (or poster+demo) paper submission: 17 October 2025
* Notification of acceptance: 13 February 2026
* Camera Ready due: 6 March 2026
* LREC 2026 conference: 11-16 May 2026

More information on LREC 2026: https://lrec2026.info/
Contact: lrec2026-pcs@googlegroups.com
