- Corpora - ELRA lists

1st CFP: 19th Workshop on Building and Using Comparable Corpora (BUCC) at LREC 2026
by Reinhard Rapp 16 Dec '25


Call for Papers

**************************************************************
19th WORKSHOP ON BUILDING AND USING COMPARABLE CORPORA
Co-located with LREC 2026, Palma de Mallorca (in-person & online)
May 11, 2026
Paper submission deadline: February 28, 2026
Workshop website: https://comparable.lisn.upsaclay.fr/bucc2026/
Main conference website: https://lrec2026.info/
**************************************************************

MOTIVATION

In the language engineering and linguistics communities, research on comparable corpora has two main motivations. In language engineering, it is chiefly motivated by the need to use comparable corpora as training data for data-driven NLP applications such as statistical and neural machine translation, or cross-lingual retrieval. In linguistics, comparable corpora are of interest because they enable cross-language discoveries and comparisons. It is generally accepted in both communities that comparable corpora consist of documents that are comparable in content and form in various degrees and dimensions across several languages. Parallel corpora are at one end of this spectrum, and unrelated corpora are at the other. Increasingly, these resources are not only collected, but also augmented or even created synthetically, which raises new questions about how to define and measure comparability.

In recent years, the use of comparable corpora for pre-training Large Language Models (LLMs) has led to their impressive multilingual and cross-lingual abilities, which are relevant to a range of applications, including information retrieval, machine translation, and cross-lingual text classification. The linguistic definitions and observations related to comparable corpora are crucial for improving methods to mine such corpora, for assessing and documenting synthetic data, and for improving the cross-lingual transfer of LLMs. Therefore, it is of great interest to bring together builders and users of such corpora.

PANEL DISCUSSION

The panel discusses the impact of synthetic data on comparable corpora research. It addresses fundamental questions about how LLMs transform our understanding and use of multilingual data.

TOPICS

We solicit contributions on all topics related to comparable (and parallel) corpora, including but not limited to the following:

Building Comparable Corpora
- Automatic and semi-automatic methods, including generating comparable corpora using LLMs
- Methods to mine parallel and non-parallel corpora from the web
- Tools and criteria to evaluate the comparability of corpora
- Parallel vs non-parallel corpora, monolingual corpora
- Rare and minority languages, within and across language families
- Multi-media/multi-modal comparable corpora

Synthetic Data for Comparable Corpora
- LLM generation of comparable/parallel data
- Improving comparability of synthetic data
- Incidental bilingualism & pre-training use of comparable data
- Comparability & cross-lingual consistency
- Detection & attribution of synthetic vs. human text
- English-centric effects & fairness across languages/scripts
- Evaluation & reproducibility for downstream tasks

Applications of Comparable Corpora
- Human translation
- Language learning
- Cross-language information retrieval & document categorization
- Bilingual and multilingual projections
- (Unsupervised) machine translation
- Writing assistance
- Machine learning techniques using comparable corpora

Mining from Comparable Corpora
- Cross-language distributional semantics, word embeddings and pre-trained multilingual transformer models
- Extraction of parallel segments or paraphrases from comparable corpora
- Methods to derive parallel from non-parallel corpora (e.g. to provide for low-resource languages in neural machine translation)
- Extraction of bilingual and multilingual translations of single words, multi-word expressions, proper names, named entities, sentences, paraphrases etc. from comparable corpora
- Induction of morphological, grammatical, and translation rules from comparable corpora
- Induction of multilingual word classes from comparable corpora

Comparable Corpora in the Humanities
- Comparing linguistic phenomena across languages in contrastive linguistics
- Analyzing properties of translated language in translation studies
- Studying language change over time in diachronic linguistics
- Assigning texts to authors via authors' corpora in forensic linguistics
- Comparing rhetorical features in discourse analysis
- Studying cultural differences in sociolinguistics
- Analyzing language universals in typological research

IMPORTANT DATES

28 Feb 2026: Paper submission deadline
22 Mar 2026: Notification of acceptance
29 Mar 2026: Camera-ready final papers
14 Apr 2026: Workshop programme final version
11 May 2026: Workshop date

All deadlines are 11:59PM UTC-12:00 (“anywhere on earth”). For updates of the schedule, please see the workshop website.

PRACTICAL INFORMATION

The workshop is a hybrid event, both in-person and online. Workshop registration is via the main conference registration site, see https://lrec2026.info/

The workshop proceedings will be published in the ACL Anthology (https://aclanthology.org/).

SUBMISSION GUIDELINES

Please follow the style sheet and templates (for LaTeX, Overleaf and MS-Word) provided for the main conference at https://lrec2026.info/authors-kit/

Papers should be submitted as a PDF file using the START conference manager at https://softconf.com/lrec2026/BUCC2026/

Submissions must describe original and unpublished work and range from 4 to 8 pages plus unlimited references. Reviewing will be double blind, so papers should not reveal the authors' identity. Accepted papers will be published in the workshop proceedings.

Double submission policy: Parallel submission to other meetings or publications is possible, but the workshop organizers must be notified by e-mail immediately upon submission to another venue.

For further information and updates, please see the BUCC 2026 web page at https://comparable.lisn.upsaclay.fr/bucc2026/.

WORKSHOP ORGANIZERS

- Reinhard Rapp (University of Mainz, Germany)
- Ayla Rigouts Terryn (Université de Montréal, Mila, Canada)
- Serge Sharoff (University of Leeds, United Kingdom)
- Pierre Zweigenbaum (Université Paris-Saclay, CNRS, France)

Contact: reinhardrapp (at) gmx (dot) de

PROGRAMME COMMITTEE

- Ebrahim Ansari (Institute for Advanced Studies in Basic Sciences, Iran)
- Eleftherios Avramidis (DFKI, Germany)
- Gabriel Bernier-Colborne (National Research Council, Canada)
- Kenneth Church (VecML.com, USA)
- Patrick Drouin (Université de Montréal, Canada)
- Alex Fraser (Technical University of Munich, Germany)
- Natalia Grabar (CNRS, University of Lille, France)
- Amal Haddad Haddad (Universidad de Granada, Spain)
- Kyo Kageura (University of Tokyo, Japan)
- Natalie Kübler (Université Paris Cité, France)
- Philippe Langlais (Université de Montréal, Canada)
- Yves Lepage (Waseda University, Japan)
- Shervin Malmasi (Amazon, USA)
- Michael Mohler (Language Computer Corporation, USA)
- Emmanuel Morin (Nantes Université, France)
- Dragos Stefan Munteanu (RWS, USA)
- Preslav Nakov (Mohamed bin Zayed University of AI, United Arab Emirates)
- Ted Pedersen (University of Minnesota, Duluth, USA)
- Reinhard Rapp (University of Mainz, Germany)
- Ayla Rigouts Terryn (Université de Montréal & Mila, Canada)
- Nasredine Semmar (CEA LIST, Paris, France)
- Serge Sharoff (University of Leeds, UK)
- Richard Sproat (Sakana.ai, Tokyo, Japan)
- Marko Tadić (University of Zagreb, Croatia)
- François Yvon (CNRS & Sorbonne Université, France)
- Pierre Zweigenbaum (Université Paris-Saclay, CNRS, France)

INFORMATION ABOUT THE LRE 2026 MAP AND THE "SHARE YOUR LRs!" INITIATIVE

When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of the research. Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.) to enable their reuse and the replicability of experiments (including evaluation ones).
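For authors in other time zones, the "anywhere on earth" convention in the BUCC dates (11:59PM at UTC-12:00) can be turned into a UTC instant with a few lines of Python. This is only a minimal sketch: the helper name `aoe_deadline_to_utc` is illustrative and not part of any workshop tooling, and the date used is the paper submission deadline as listed in the call.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" (AoE) is the fixed offset UTC-12:00.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_to_utc(year: int, month: int, day: int) -> datetime:
    """Return the UTC instant of 11:59 PM AoE on the given calendar date.

    Hypothetical helper for illustration only.
    """
    local = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local.astimezone(timezone.utc)


# BUCC paper submission deadline as listed in the call: 28 Feb 2026.
bucc_deadline_utc = aoe_deadline_to_utc(2026, 2, 28)
print(bucc_deadline_utc.isoformat())  # 2026-03-01T11:59:00+00:00
```

In other words, a deadline of 28 Feb 2026 AoE falls at 11:59 UTC on 1 Mar 2026. The workshop website remains the authoritative source for the schedule.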


CfP: NakbaVirality Shared Task - NakbaNLP within LREC 2026
by Saad Ezzini 16 Dec '25


Call for Participation: NakbaVirality Shared Task
Multimodal and Textual Virality Prediction in High-Stakes Discourse

Organized within the second Nakba-NLP Workshop at LREC 2026
https://lrec2026.info/
11-16 May 2026, Palma de Mallorca, Spain

We invite you to participate in the NakbaVirality Shared Task, a new challenge focusing on predicting the reach and engagement of content surrounding the Nakba and the post-October 7th war on Gaza. This task operates at the intersection of NLP, Computer Vision, and Computational Social Science, aiming to understand information diffusion in highly polarized and emotionally charged contexts.

Website: https://ezzini.github.io/NakbaVirality/
Registration: https://forms.gle/ufj2gqRyMrrdDs5f9

Motivation

Understanding what makes a post "go viral" in conflict zones is critical for analyzing propaganda spread, public sentiment, and information maneuvering. This shared task challenges participants to model "virality" not just as a number, but as a function of nuanced text, graphic imagery, and deep historical context.

Tasks

We propose two distinct tasks:

Task 1: Multimodal Virality Classification
* Goal: Classify posts into Low, Medium, or High virality buckets.
* Input: Text + Image.
* Challenge: Aligning mismatched modalities (e.g., peaceful image vs. violent interaction) and handling "dog whistles."
* Metric: Macro-F1 Score.

Task 2: Textual Virality and Interaction Prediction (Regression)
* Goal: Predict distinct Likability (agreement) and Interactivity (controversy/engagement) scores.
* Input: Text only.
* Challenge: Distinguishing between content that is "liked" versus content that provokes "debate."
* Metrics: Pearson Correlation (r) and MSE.

Data
* Sources: X (Twitter) and Reddit.
* Size: ~5,000 anonymized samples (post-Oct 2023).
* Content: Posts related to "Gaza," "Nakba," "Palestine," "Israel," etc.
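The metrics named for the two NakbaVirality tasks (Macro-F1 for the Low/Medium/High classification, Pearson r and MSE for the regression scores) can be illustrated with a small pure-Python sketch. It uses the standard textbook definitions and is not the organizers' scoring script; the function names and the toy labels and scores below are made up for illustration.

```python
from math import sqrt
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def mse(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Mean squared error between gold and predicted scores."""
    return sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold)


# Toy data, invented for illustration only.
gold_labels = ["Low", "Medium", "High", "High"]
pred_labels = ["Low", "High", "High", "Medium"]
print(macro_f1(gold_labels, pred_labels))  # 0.5
```

Because Macro-F1 averages the per-class scores without weighting, a class that is rarely predicted correctly (here "Medium") pulls the score down as much as a frequent one would.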
Important Dates
* Jan 1: Release of Training Data (3,500 samples)
* Feb 1: Release of Development Data (500 samples)
* Feb 15: Evaluation Period Begins
* Feb 20: Evaluation Period Ends
* Mar 1: Paper Submission Deadline

Participation & Submission
* Participation is free.
* Teams must verify their results by submitting a system description paper (max 4 pages).
* Papers will be published in the proceedings (ACL Anthology).
* Detailed guidelines: https://ezzini.github.io/NakbaVirality/guidelines

Organizers
* Saad Ezzini, King Fahd University of Petroleum and Minerals
* Salima Lamsiyah, University of Luxembourg
* Shadi Abudalfa, King Fahd University of Petroleum and Minerals
* Samir El-Amrany, University of Luxembourg
* Walid Alsafadi, University College of Applied Sciences Gaza

For more information, please visit our website https://ezzini.github.io/NakbaVirality/ or contact us at saad.ezzini(a)kfupm.edu.sa


[CfP] Knowledge Graphs and Large Language Models (KG–LLM 2026) @ LREC 2026
by Gilles Sérasset 16 Dec '25


Knowledge Graphs and Large Language Models (KG–LLM 2026) @ LREC 2026

We are pleased to announce the Workshop on Knowledge Graphs and Large Language Models (KG–LLM 2026), to be held in conjunction with LREC 2026 in Palma de Mallorca, Spain, on 16 May 2026. We invite submissions of original research that leverages both Knowledge Graphs (KGs) and Large Language Models (LLMs) in any domain of Natural Language Processing or language resource development.

More information at https://kg-llm.github.io/

Workshop Overview

Large Language Models have become foundational in NLP, yet they continue to face challenges related to bias, hallucination, explainability, environmental impact, and the cost of training. Knowledge Graphs, in contrast, provide high-quality, interpretable, and reusable ontological and linguistic structures that support reasoning, fact checking, and knowledge preservation.

The goal of this workshop is to bring together researchers working at the intersection of these two paradigms, exploring how explicit knowledge and implicit statistical learning can enhance each other. We welcome contributions that investigate, demonstrate, or evaluate systems, methods, or resources integrating both KGs and LLMs.

Topics of Interest

We encourage submissions on (but not limited to):

1. LLMs for Knowledge Graph Engineering
- KG modelling, resource creation, and interlinking
- Relation extraction
- Corpus annotation
- Ontology localization
- Creation or expansion of linguistic or knowledge graphs
- KG querying and question answering

2. Knowledge Graphs for Large Language Models
- Using linguistic or knowledge graphs as training data
- Fine-tuning LLMs using linked linguistic (meta)data
- Knowledge/linguistic graph embeddings
- KGs for model explainability, provenance, and source attribution
- Neural models for under-resourced languages
- KG-augmented RAG (KG-RAG)

3. Joint Use of KGs and LLMs in Applications
- Combined KG–LLM use cases with structured linguistic data
- Digital humanities applications
- Question answering over graph data
- Fake news and misinformation detection
- Educational applications and assisted learning
- Visualizing academic writing with KGs and LLMs
- KG-enhanced chatbots for health and medical contexts

Application Domains

All application domains are welcome (Digital Humanities, FinTech, Linguistics, Education, Cybersecurity, etc.) as long as the work uses both Knowledge Graphs and Large Language Models.

Submission Guidelines

- Submission Format: Papers up to 8 pages excluding references.
- Style: All submissions must follow the LREC 2026 format and use the official LREC author kit (available at https://lrec2026.info/authors-kit/).
- Review Process: Double-blind peer review. Submissions must be fully anonymized.
- Submission System: Papers must be submitted via the START conference system at https://softconf.com/lrec2026/KGLLM/
- Language Resources: In line with LREC policies, authors are encouraged to describe, document, and share language resources, datasets, models, evaluation tools, or annotation guidelines used or created in their work.
- Accepted Papers: All accepted papers will be included in the LREC 2026 workshop proceedings.
- Presentation: Accepted papers will be presented in oral or poster sessions during the workshop.

Important Dates

All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”).

- Paper submission deadline: 26 February 2026
- Notification to authors: 24 March 2026
- Camera-ready due: 30 March 2026
- Workshop date: 16 May 2026

Contact

For questions, please contact the workshop organizers at: kg-llm-26(a)googlegroups.com

Organizing Committee

- Gilles Sérasset, Université Grenoble Alpes, France
- Katerina Gkirtzou, Athena Research Center, Greece
- Michael Cochez, Ellis Institute Finland & Åbo Akademi, Finland
- Jan-Christoph Kalo, University of Amsterdam, Netherlands


The Fifth Generation, Evaluation & Metrics Workshop (GEM): 1st call for papers
by Simon Mille 16 Dec '25


Dear colleagues,

Please find below information about the upcoming GEM workshop!

Event Type: Call for Papers
Conference: GEM at ACL 2026
Date: July 2nd or July 3rd, 2026
Location: San Diego, California, USA
Website: https://gem-workshop.com/
Contact: gem-workshop-chairs(a)googlegroups.com

------------------------------

Overview

The fifth edition of the Natural Language Generation, Evaluation, and Metrics (GEM) Workshop will be at ACL 2026 in San Diego!

Evaluation of language models has grown to be a central theme in NLP research, while remaining far from solved. As LMs have become more powerful, errors have become tougher to spot and systems harder to distinguish. Evaluation practices are evolving rapidly, from living benchmarks like Chatbot Arena to LMs being used as evaluators themselves (e.g., LM as judge, autoraters). Further research is needed to understand the interplay between metrics, benchmarks, and human-in-the-loop evaluation, and their impact in real-world settings.

Topics of Interest

We welcome submissions related to, but not limited to, the following topics:

- Automatic evaluation of generation systems, including the use of LMs as evaluators
- Creating evaluation corpora, challenge sets, and living benchmarks
- Critiques of benchmarking efforts, including contamination, memorization, and validity
- Evaluation of cutting-edge topics in LM development, including long-context understanding, agentic capabilities, reasoning, and more
- Evaluation as measurement beyond raw capability, including ideas such as robustness, reliability, and more
- Multimodal evaluation across text, vision, and other modalities
- Cost-aware and efficient evaluation methods applicable across languages and scenarios
- Human evaluation and its role in the era of powerful LMs
- Evaluation of sociotechnical systems employing large language models
- Surveys and meta-assessments of evaluation methods, metrics, and benchmarks
- Best practices for dataset and benchmark documentation
- Industry applications of the above-mentioned topics, especially internal benchmarking or navigating the gap between academic metrics and real-world impact

Special Tracks

Opinion and Statement Papers Track (New!)

We are introducing a special track for opinion and statement papers. These submissions will be presented in curated panel discussions, encouraging open dialogue on emerging topics in evaluation research. We welcome bold, thought-provoking position papers that challenge conventional wisdom, propose new directions for the field, or offer critical perspectives on current evaluation practices. This track is an opportunity to spark discussion and debate; submissions need not present new empirical results but should offer well-argued viewpoints supported by scientific evidence (e.g. prior studies) that advance our collective thinking about evaluation.

ReproNLP

The ReproNLP Shared Task on Reproducibility of Evaluations in NLP has been run for six consecutive years (2021–2026). ReproNLP 2026 will be part of the GEM Workshop at ACL 2026 in San Diego. It aims to (i) shed light on the extent to which past NLP evaluations have been reproducible, and (ii) draw conclusions regarding how NLP evaluations can be designed and reported in order to increase reproducibility. Participants submit reports on their reproductions of human evaluations from previous NLP literature, in which they quantitatively assess the degree of reproducibility using methods described in Belz (2025). More details can be found in the first call for participation for ReproNLP 2026 at https://repronlp.github.io.

Workshop Format

We aim to organize the workshop in an inclusive, highly interactive, and discussion-driven format. Paper presentations will focus on themed poster sessions that allow presenters to interact with researchers from varied backgrounds and similar interests. The workshop will feature panels on emerging topics and multiple short keynotes by leading experts.

🎭 GEM Comic-Con Edition!

In the spirit of San Diego's famous Comic-Con (July 23-26), this year's GEM will be a special Comic-Con edition! We encourage participants to embrace creativity! Whether that’s through themed poster designs, comic-style slides, or dressing up as your favorite evaluation metric personified, we want this year's workshop to be memorable and fun!

Submission Types

Submissions can take any of the following forms:

- Archival Papers: Original and unpublished work, for all of the following tracks: Main, ReproNLP, and Opinion/Statement.
- Non-Archival Extended Abstracts: Work already presented or under review at a peer-reviewed venue. This is an excellent opportunity to share recent or ongoing work with the GEM community without precluding future publication.
- Findings Papers: We additionally welcome presentation of relevant papers accepted to Findings, and will share more information at a later date.

All accepted papers will be given up to an additional page to address reviewers' comments.

Submission Guidelines

- Papers to be reviewed should be submitted directly through OpenReview, selecting the appropriate track, and conform to ACL 2026 style guidelines.
- Review requirement: For each submitted paper, authors may be asked to provide 2 reviews (either one author doing 2 reviews, or two authors each doing one review).
- Length:
  - Archival papers should be within 4–8 pages, and opinion/statement papers should be within 2–4 pages. We make no “Short” or “Long” paper distinctions; we advise authors to tailor their submission length in proportion to their contribution.
  - Extended abstracts should be within 1–2 pages.
- Opinion/Statement Papers: These should be titled with the “Position:” prefix.
- Dual submission: Dual submission of archival papers is not allowed. Authors interested in presenting work submitted to a different venue should instead use the non-archival extended abstract track.
Important Dates

- March 19, 2026: Direct paper submission deadline
- April 9, 2026: Pre-reviewed ARR commitment deadline
- April 28, 2026: Notification of acceptance
- May 14, 2026: Camera-ready paper due
- June 4, 2026: Pre-recorded video due (hard deadline)
- July 2–3, 2026: Workshop at ACL in San Diego

Contact

For any questions, please check the workshop page or email the organisers: gem-workshop-chairs(a)googlegroups.com

Simon Mille | Postdoctoral Research Fellow
ADAPT Centre, School of Computing, Dublin City University, Dublin 9, Ireland
www.adaptcentre.ie


CLARIN ERIC Newsflash: December 2025
by Gorgaini, E. (Elisa) 16 Dec '25


Dear colleagues,

The December edition of the CLARIN Newsflash is out. Highlights include:

* A review of 2025 with key achievements and milestones
* Direct links to the Annual Conference recordings and an impression video, perfect inspiration if you are considering submitting an abstract for CLARIN2026
* An overview of CLARIN’s presence at LREC, including co-organised workshops

Additionally, the call for extended abstracts for CLARIN 2026 is now open. The submission deadline is 6 April 2026.

Link to the call: https://www.clarin.eu/content/call-extended-abstracts-clarin-annual-confere…
Read the newsletter: https://www.clarin.eu/content/clarin-newsflash-december-2025
Subscribe to the newsletter: http://eepurl.com/bOt3Qn

Kind regards,
CLARIN ERIC

---
Elisa Gorgaini
Communication Officer - CLARIN ERIC
Utrecht University | Drift 10, 3512 BS Utrecht, The Netherlands
e.gorgaini(a)uu.nl | elisa(a)clarin.eu
www.clarin.eu


Call for Shared Task Proposals: ArgMining 2026
by Musi, Elena 16 Dec '25


We invite submissions of shared task proposals for ArgMining 2026, the 13th Workshop on Argument Mining and Reasoning, co-located with ACL 2026 (San Diego).

Background

Argument mining (also known as argumentation mining) is a well-established area in computational linguistics focusing on the automatic identification of argumentative structures such as premises, conclusions, and inference schemes. The field has historically emphasized the development of large-scale datasets and tasks including argument quality assessment, argument persuasiveness, and argumentative text synthesis across domains such as legal, social, medical, political, and scientific settings. In line with broader advances in CL and NLP, recent work has expanded toward explainable argumentation, multimodal settings, and modeling human label variation.

Previous editions of ArgMining have promoted shared tasks to advance research on specific aspects of argument mining, including:

* Multimodal argumentative fallacy detection: https://nlp-unibo.github.io/mm-argfallacy/2025/
* Dialogical argument mining: http://dialam.arg.tech/

ArgMining 2026 Shared Tasks

Following the success of prior workshops, ArgMining 2026 plans to feature one or more shared tasks addressing unsolved problems for the community to investigate. In keeping with this year’s special theme, “Understanding and evaluating arguments in both human and machine reasoning”, we particularly encourage proposals aligned with this focus.
What to Include in a Proposal

Shared task proposals should include:

* Title and brief task description
* Description of the datasets to be used and their readiness
* Previous work on the datasets, including relevant publications (if any)
* A short description of the evaluation methodology for submitted systems
* Brief introduction of the task organizers
* Anticipated timeline, including dates for dataset releases and final evaluation

How to Submit

Submit your shared task proposal via email to: argmining.org [at] gmail.com

* Submission deadline: December 22, 2025
* Notification of acceptance: beginning/mid January 2026

Tentative Shared Task Schedule

* Mid January: Training data release
* Early March: Test data release; evaluation start
* Mid/late March: Evaluation end
* Early April: Results announcement
* Mid April: Paper submission deadline
* Mid May: Camera-ready deadline
* July: ArgMining 2026 workshop (at ACL)

Organizers

* Mohamed Elaraby (University of Pittsburgh)
* Annette Hautli-Janisz (University of Passau)
* John Lawrence (University of Dundee)
* Elena Musi (University of Liverpool)
* Julia Romberg (GESIS)
* Federico Ruggeri (University of Bologna)


ARHAHA 2026
by Sharefah A. Al Ghamdi 16 Dec '25


We’re excited to invite you to take part in ARHAHA 2026, a shared task on Arabic Humor Generation, hosted at OSACT7 and co-located with LREC 2026.

Description

Participants will develop systems that generate original, safe, and culturally appropriate humorous content in Arabic under a set of carefully designed constraints. The task aims to push models beyond memorization and towards genuine humorous creativity.

Humor Generation Task

This task focuses on building and evaluating systems that generate short humorous texts in Arabic given constrained prompts.

Task Summary

Input: A pair of Arabic words
Output: A short humorous Arabic text (maximum 100 characters)
Evaluation:
- Automated format and constraint validation
- Human evaluation of humor quality, originality, fluency, and cultural appropriateness

The website for the shared task is: https://sites.google.com/view/arhaha2026/home

How to Participate?

- Registration is required; please complete the registration form.
- Join the ARHAHA Slack workspace.

System Description Papers

All participating teams are encouraged to submit a short system description paper. Papers will be included in the workshop proceedings and do not require a high leaderboard ranking. We welcome creative approaches, analysis, and lessons learned.

Contact

For questions or clarifications, please contact the organizing team at arhaha2026(a)gmail.com

We look forward to your participation and contributions!

Best regards,
The ARHAHA 2026 Organizing Team
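Participants in ARHAHA 2026 may want to pre-check their generated texts against the one output constraint stated in the task summary, the 100-character maximum, before submitting. The sketch below is a hypothetical helper and not the organizers' validator. It makes two loudly labeled assumptions: characters are counted as Unicode code points, and an output should contain at least one character from the basic Arabic Unicode block. The official guidelines may define both differently.

```python
from typing import List

MAX_CHARS = 100  # maximum output length stated in the task summary


def check_output(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Return a list of problems; an empty list means the text passes.

    Hypothetical pre-submission check, not the official validation.
    """
    stripped = text.strip()
    if not stripped:
        return ["empty output"]

    problems = []
    # Assumption: length is counted in Unicode code points (Python's len).
    if len(stripped) > max_chars:
        problems.append(f"too long: {len(stripped)} > {max_chars} characters")
    # Assumption: at least one character from the Arabic block U+0600..U+06FF.
    if not any("\u0600" <= ch <= "\u06FF" for ch in stripped):
        problems.append("no Arabic-script characters found")
    return problems


# Placeholder strings for illustration only.
print(check_output("نكتة قصيرة"))  # []
print(len(check_output("x" * 101)))  # 2
```

Returning a list of problems rather than a single boolean makes it easier to log why a generated candidate was rejected when filtering many outputs.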


Call for Participation in Touché @ CLEF 2026: Shared Tasks on Argumentation Systems (Fallacies, Causality, Generalizability, Advertisements)
by Johannes.Kiesel＠gesis.org 15 Dec '25


Touché @ CLEF 2026: Shared Tasks on Argumentation Systems (Fallacies, Causality, Generalizability, Advertisements)

Call for Participation

We invite you to participate in the following shared tasks at Touché 2026, held in conjunction with the CLEF conference.

1. Fallacy Detection. Given an argument, determine whether it is fallacious and what type of fallacy it is.
   https://touche.webis.de/clef26/touche26-web/fallacy-detection.html

2. Causality Extraction. Given a text, determine whether it contains causal information, identify the information in the text, and classify the expressed relationship.
   https://touche.webis.de/clef26/touche26-web/causality-extraction.html

3. Generalizability of Argument Identification in Context. Given a sentence from some argumentation dataset, determine whether the sentence was annotated as argument (using annotator guidelines etc.).
   https://touche.webis.de/clef26/touche26-web/generalizable-argument-mining.h…

4. Advertisement in Retrieval-Augmented Generation (RAG). Given a query and response of an RAG system, determine whether the response contains an ad, identify the ad in the response, and remove the ad.
   https://touche.webis.de/clef25/touche25-web/advertisement-detection.html

Find out more at https://touche.webis.de/clef26/touche26-web/ and join our mailing list at https://groups.google.com/g/touche-lab to stay up to date.

Awards
--------------------------
The best submission for each task will receive an award. In addition, our partners at Methods Hub (https://methodshub.gesis.org/) have agreed to provide priority support to the award-winning teams in developing their submissions into fully reusable software packages to maximize their impact.
Important Dates -------------------------- 2026-05-07: Approaches submission deadline 2026-05-28: Participant paper submission 2026-06-30: Peer review notification 2026-07-06: Camera-ready participant papers submission 2026-09 21-24: CLEF Conference in Jena and Touché Workshop Links -------------------------- Touché: https://touche.webis.de Contact: touche(a)webis.de We are looking forward to your submission! The Touché team


Deadline extended: VarDial 2026 @ EACL – The Thirteenth Workshop on NLP for Similar Languages, Varieties and Dialects
by verena.blaschke＠cis.lmu.de 15 Dec '25


Dear all,

This is the last CfP for VarDial 2026 - The Thirteenth Workshop on NLP for Similar Languages, Varieties and Dialects. We have extended the submission deadlines (January 2 for direct submissions, January 10 for committing pre-reviewed submissions); see details below.

Apologies for cross-posting!

--

VarDial 2026: https://sites.google.com/view/vardial-2026/

VarDial 2026 will be colocated with EACL 2026 in Rabat, Morocco. We anticipate a discussion on computational methods and language resources for closely related languages, language varieties, and dialects.

We welcome papers dealing with one or more of the following topics:

- Language resources and tools for similar languages, varieties and dialects;
- Evaluation of language resources and tools applied to non-dominant language varieties;
- Cross-lingual transfer and adaptation of models to similar languages, varieties and dialects;
- Automatic identification of lexical variation;
- Automatic classification of language varieties;
- Machine translation between closely-related languages, language varieties and dialects;
- Corpus-driven studies in dialectology and language variation;
- Computational approaches to mutual intelligibility between dialects and similar languages;
- Text similarity and adaptation between language varieties;
- Linguistic issues in the adaptation of language resources and tools (e.g., cognate detection, semantic discrepancies, lexical gaps, false friends);
- Studies focusing on related creole languages and their lexifier languages;
- Studies focusing on diachronic language variation (e.g. phylogenetic methods, historical dialects).

Instructions for Authors

Submissions should be formatted according to the ACL Rolling Review template and submitted as a PDF. The review process will be double-blind. More information is on the website (https://sites.google.com/view/vardial-2026/).

Important Dates

- Direct Submission deadline: January 2, 2026 (updated!)
- Pre-reviewed (ARR) submission deadline: January 10, 2026 (updated!)
- Notification of acceptance: January 23, 2026
- Camera-ready paper due: February 3, 2026
- Workshop at EACL (hybrid): March 24-29, 2026 (exact date TBD)

Shared Task: Arabic Modeling In Your Accent (AMIYA)

VarDial 2026 will have a shared task on language modelling for dialectal Arabic (DA), where participants can contribute LLMs trained or adapted for DA. These will be evaluated using the AL-QASIDA benchmark (Robinson et al., 2025), an evaluation suite that comprehensively measures an LLM’s dialectal fidelity, understanding, generation quality, and MSA-DA diglossia in DA. More information: https://sites.google.com/view/vardial-2026/shared-tasks

- Training data release: November 30, 2025
- Registration deadline, eval data finalized: December 15, 2025
- System submission deadline: January 10, 2026
- System description paper deadline: January 20, 2026

Workshop Organizers

Yves Scherrer – University of Oslo (Norway)
Noëmi Aepli – University of Pennsylvania (USA)
Verena Blaschke – LMU Munich and Munich Center for Machine Learning (Germany)
Tommi Jauhiainen – University of Helsinki (Finland)
Nikola Ljubešić – Jožef Stefan Institute and University of Ljubljana (Slovenia)
Preslav Nakov – Mohamed bin Zayed University of Artificial Intelligence (UAE)
Jörg Tiedemann – University of Helsinki (Finland)
Marcos Zampieri – George Mason University (USA)

Contact: yves.scherrer(a)ifi.uio.no or verena.blaschke(a)cis.lmu.de


December 2025 Newsletter - LDC
by Penn LDC 15 Dec '25


In this newsletter:

LDC 2026 membership discounts now available
LDC's 1,000th corpus
Approaching deadline for Spring 2026 data scholarship applications
LDC closed for Winter Break December 25 - January 2

New publications:
2021 NIST Speaker Recognition Evaluation Development and Test Set<https://catalog.ldc.upenn.edu/LDC2025S11>
LORELEI Sinhala Incident Language Pack<https://catalog.ldc.upenn.edu/LDC2025T17>

________________________________

LDC 2026 membership discounts now available

Now through March 2, 2026, any organization that joins the Consortium or renews its membership will receive a 10% discount off the 2026 membership fee. Membership remains the most economical way to access current and past LDC releases. Consult Join LDC<https://www.ldc.upenn.edu/members/join-ldc> for details on membership options and benefits.

LDC's 1,000th corpus

LDC is delighted to announce the release of the 1,000th corpus into the Catalog! This milestone represents the commitment we made over thirty years ago to provide large quantities of diverse data, robust research program support, and exceptional member services. We are grateful for the continued support and collaboration of our members, friends, and the community.

Approaching deadline for Spring 2026 data scholarship applications

Attention students: don't miss out on the chance to receive no-cost access to LDC data for your research. Applications for Spring 2026 data scholarships are due January 15, 2026. For more information on requirements and program rules, see LDC Data Scholarships<https://www.ldc.upenn.edu/language-resources/data/data-scholarships>.

LDC closed for Winter Break December 25 - January 2

LDC will be closed from Thursday, December 25, 2025, through Friday, January 2, 2026, in accordance with the University of Pennsylvania Winter Break Policy. Our offices will reopen on Monday, January 5, 2026. Requests received by the Membership Office during Winter Break will be processed when the office reopens.

________________________________

New publications:

* 2021 NIST Speaker Recognition Evaluation Development and Test Set<https://catalog.ldc.upenn.edu/LDC2025S11> was developed by LDC and NIST (National Institute of Standards and Technology). It contains approximately 447 hours of Cantonese, Mandarin, and English conversational telephone speech, audio from video, and selfie image data for development and test, along with answer keys, enrollment, trial files, and documentation from the NIST-sponsored 2021 Speaker Recognition Evaluation (SRE)<https://www.nist.gov/itl/iad/mig/nist-2021-speaker-recognition-evaluation-s…>.

The SRE task is speaker detection, that is, to determine whether a specified target speaker was speaking during a segment of speech. SRE21 focused on telephone speech and audio from video and included close-up images of participants. The evaluation also featured cross-lingual trials, that is, trials in which the enrollment and test segments were spoken in different languages.

The data was drawn from the WeCanTalk corpus collected by LDC, in which speakers called friends or relatives who agreed to have their telephone conversations, each lasting 8 to 10 minutes, recorded. Subjects contributed multiple conversational telephone speech recordings and audio recordings in which they were talking, plus a single selfie image. Recordings were manually audited to verify speaker, language, and quality.

2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.

* LORELEI Sinhala Incident Language Pack<https://catalog.ldc.upenn.edu/LDC2025T17> was developed by LDC and comprises 8.1 million words of Sinhala monolingual text, 700,00 words of English monolingual text, 6.4 million words of parallel Sinhala-English text, and 50,000 words annotated for entity discovery and linking and situation frames. It constitutes all of the text data, annotations, supplemental resources, and related software tools for the Sinhala language used in the DARPA LORELEI / LoReHLT 2018 Evaluation<https://www.nist.gov/itl/iad/mig/lorehlt-evaluations>.

The LORELEI (Low Resource Languages for Emergent Incidents) program was concerned with building human language technology for low resource languages in the context of emergent situations. In the evaluation scenario, an unforeseen event triggered a need for humanitarian and logistical support in a region where the incident language had received little or no attention in NLP research. Evaluation participants provided NLP solutions, including information extraction and machine translation, with limited resources and limited development time.

Data was collected from news, social network, weblog, newsgroup, discussion forum, and reference material. Entity discovery and linking annotation identified entities to be detected by systems for scoring purposes. Situation frame analysis was designed to extract basic information about needs and relevant issues for planning a disaster response effort.

2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.

To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.

Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810 Philadelphia, PA 19104

