Call for Papers
**************************************************************
19th WORKSHOP ON BUILDING AND USING COMPARABLE CORPORA
Co-located with LREC 2026, Palma de Mallorca (in-person & online)
May 11, 2026
Paper submission deadline: February 28, 2026
Workshop website: https://comparable.lisn.upsaclay.fr/bucc2026/
Main conference website: https://lrec2026.info/
**************************************************************
MOTIVATION
In the language engineering and linguistics communities, research
on comparable corpora has two main motivations. In language
engineering, on the one hand, it is chiefly driven by
the need to use comparable corpora as training data for data-driven
NLP applications such as statistical and neural machine translation, or
cross-lingual retrieval. In linguistics, on the other hand, comparable
corpora are of interest because they enable cross-language discoveries
and comparisons. It is generally accepted in both communities that
comparable corpora consist of documents that are comparable in content
and form in various degrees and dimensions across several languages.
Parallel corpora lie at one end of this spectrum, and unrelated
corpora at the other. Increasingly, these resources are not only
collected, but also augmented or even created synthetically, which
raises new questions about how to define and measure comparability.
In recent years, pre-training Large Language Models (LLMs) on data
that includes comparable corpora has contributed to their impressive
multilingual and cross-lingual abilities, which are relevant to a range
of applications such as information retrieval, machine translation, and
cross-lingual text classification. Linguistic definitions of, and
observations about, comparable corpora are crucial for improving methods
to mine such corpora, for assessing and documenting synthetic data, and
for improving the cross-lingual transfer of LLMs. It is therefore of great
interest to bring together builders and users of such corpora.
PANEL DISCUSSION
The panel will discuss the impact of synthetic data on comparable corpora
research, addressing fundamental questions about how LLMs transform our
understanding and use of multilingual data.
TOPICS
We solicit contributions on all topics related to comparable (and parallel)
corpora, including but not limited to the following:
Building Comparable Corpora
- Automatic and semi-automatic methods, including generating
comparable corpora using LLMs
- Methods to mine parallel and non-parallel corpora from the web
- Tools and criteria to evaluate the comparability of corpora
- Parallel vs non-parallel corpora, monolingual corpora
- Rare and minority languages, within and across language families
- Multi-media/multi-modal comparable corpora
Synthetic Data for Comparable Corpora
- LLM generation of comparable/parallel data
- Improving comparability of synthetic data
- Incidental bilingualism & pre-training use of comparable data
- Comparability & cross-lingual consistency
- Detection & attribution of synthetic vs. human text
- English-centric effects & fairness across languages/scripts
- Evaluation & reproducibility for downstream tasks
Applications of Comparable Corpora
- Human translation
- Language learning
- Cross-language information retrieval & document categorization
- Bilingual and multilingual projections
- (Unsupervised) machine translation
- Writing assistance
- Machine learning techniques using comparable corpora
Mining from Comparable Corpora
- Cross-language distributional semantics, word embeddings and
pre-trained multilingual transformer models
- Extraction of parallel segments or paraphrases from comparable corpora
- Methods to derive parallel from non-parallel corpora (e.g. to provide
training data for low-resource languages in neural machine translation)
- Extraction of bilingual and multilingual translations of single words,
multi-word expressions, proper names, named entities, sentences,
paraphrases, etc. from comparable corpora
- Induction of morphological, grammatical, and translation rules from
comparable corpora
- Induction of multilingual word classes from comparable corpora
Comparable Corpora in the Humanities
- Comparing linguistic phenomena across languages in contrastive linguistics
- Analyzing properties of translated language in translation studies
- Studying language change over time in diachronic linguistics
- Assigning texts to authors via authors' corpora in forensic linguistics
- Comparing rhetorical features in discourse analysis
- Studying cultural differences in sociolinguistics
- Analyzing language universals in typological research
IMPORTANT DATES
28 Feb 2026: Paper submission deadline
22 Mar 2026: Notification of acceptance
29 Mar 2026: Camera-ready final papers
14 Apr 2026: Workshop Programme final version
11 May 2026: Workshop date
All deadlines are 11:59PM UTC-12:00 (“anywhere on earth”).
For schedule updates, please see the workshop website.
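Because "anywhere on earth" (AoE) is a fixed UTC-12:00 offset, a deadline of 11:59 PM AoE on a given day corresponds to 11:59 AM UTC on the following day; for example, the 28 February paper deadline ends at 11:59 UTC on 1 March 2026. A minimal Python sketch of this conversion using only the standard library; the helper name aoe_deadline_in_utc is illustrative and not part of any workshop tooling.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on earth" is the fixed offset UTC-12:00 (no daylight saving).
AOE = timezone(timedelta(hours=-12))


def aoe_deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return 11:59 PM AoE on the given date, expressed in UTC (hypothetical helper)."""
    deadline = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return deadline.astimezone(timezone.utc)


print(aoe_deadline_in_utc(2026, 2, 28))
```

Running it prints `2026-03-01 11:59:00+00:00`; the same call can be used for any of the dates listed above.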
PRACTICAL INFORMATION
The workshop is a hybrid event, both in-person and online. Workshop
registration is handled through the main conference registration site:
https://lrec2026.info/
The workshop proceedings will be published in the ACL Anthology
(https://aclanthology.org/).
SUBMISSION GUIDELINES
Please follow the style sheet and templates (for LaTeX, Overleaf and
MS-Word) provided for the main conference at
https://lrec2026.info/authors-kit/
Papers should be submitted as a PDF file using the START conference
manager at https://softconf.com/lrec2026/BUCC2026/
Submissions must describe original and unpublished work and range from 4
to 8 pages plus unlimited references. Reviewing will be double blind, so
the papers should not reveal the authors' identity. Accepted papers will
be published in the workshop proceedings.
Double submission policy: Parallel submission to other meetings or
publications is possible, but the workshop organizers must be notified
by e-mail immediately upon submission to another venue.
For further information and updates, please see the BUCC 2026 web page
at https://comparable.lisn.upsaclay.fr/bucc2026/.
WORKSHOP ORGANIZERS
- Reinhard Rapp (University of Mainz, Germany)
- Ayla Rigouts Terryn (Université de Montréal, Mila, Canada)
- Serge Sharoff (University of Leeds, United Kingdom)
- Pierre Zweigenbaum (Université Paris-Saclay, CNRS, France)
Contact: reinhardrapp (at) gmx (dot) de
PROGRAMME COMMITTEE
- Ebrahim Ansari (Institute for Advanced Studies in Basic Sciences, Iran)
- Eleftherios Avramidis (DFKI, Germany)
- Gabriel Bernier-Colborne (National Research Council, Canada)
- Kenneth Church (VecML.com, USA)
- Patrick Drouin (Université de Montréal, Canada)
- Alex Fraser (Technical University of Munich, Germany)
- Natalia Grabar (CNRS, University of Lille, France)
- Amal Haddad Haddad (Universidad de Granada, Spain)
- Kyo Kageura (University of Tokyo, Japan)
- Natalie Kübler (Université Paris Cité, France)
- Philippe Langlais (Université de Montréal, Canada)
- Yves Lepage (Waseda University, Japan)
- Shervin Malmasi (Amazon, USA)
- Michael Mohler (Language Computer Corporation, USA)
- Emmanuel Morin (Nantes Université, France)
- Dragos Stefan Munteanu (RWS, USA)
- Preslav Nakov (Mohamed bin Zayed University of AI, United Arab Emirates)
- Ted Pedersen (University of Minnesota, Duluth, USA)
- Reinhard Rapp (University of Mainz, Germany)
- Ayla Rigouts Terryn (Université de Montréal & Mila, Canada)
- Nasredine Semmar (CEA LIST, Paris, France)
- Serge Sharoff (University of Leeds, UK)
- Richard Sproat (Sakana.ai, Tokyo, Japan)
- Marko Tadić (University of Zagreb, Croatia)
- François Yvon (CNRS & Sorbonne Université, France)
- Pierre Zweigenbaum (Université Paris-Saclay, CNRS, France)
INFORMATION ABOUT THE LRE 2026 MAP AND THE "SHARE YOUR LRs!" INITIATIVE
When submitting a paper from the START page, authors will be asked to
provide essential information about resources (in a broad sense, i.e.
also technologies, standards, evaluation kits, etc.) that have been used
for the work described in the paper or are a new result of the research.
Moreover, ELRA encourages all LREC authors to share the described LRs
(data, tools, services, etc.) to enable their reuse and the replicability
of experiments (including evaluation experiments).
Call for Participation: NakbaVirality Shared Task
Multimodal and Textual Virality Prediction in High-Stakes Discourse
Organized within the second Nakba-NLP Workshop at LREC 2026
https://lrec2026.info/
11-16 May 2026
Palma de Mallorca, Spain
We invite you to participate in the NakbaVirality Shared Task, a new challenge focusing on predicting the reach and engagement of content surrounding the Nakba and the post-October 7th war on Gaza. This task operates at the intersection of NLP, Computer Vision, and Computational Social Science, aiming to understand information diffusion in highly polarized and emotionally charged contexts.
Website: https://ezzini.github.io/NakbaVirality/
Registration: https://forms.gle/ufj2gqRyMrrdDs5f9
Motivation
Understanding what makes a post "go viral" in conflict zones is critical for analyzing propaganda spread, public sentiment, and information maneuvering. This shared task challenges participants to model "virality" not just as a number, but as a function of nuanced text, graphic imagery, and deep historical context.
Tasks
We propose two distinct tasks:
Task 1: Multimodal Virality Classification
* Goal: Classify posts into Low, Medium, or High virality buckets.
* Input: Text + Image.
* Challenge: Aligning mismatched modalities (e.g., peaceful image vs. violent interaction) and handling "dog whistles."
* Metric: Macro-F1 Score.
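Macro-F1 computes an F1 score separately for each virality class and then takes the unweighted mean, so every class counts equally regardless of how many posts it contains. A minimal Python sketch, assuming gold and predicted labels are the strings "Low", "Medium", and "High"; the official scorer and file format may differ. On inputs where all three classes occur, it should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`.

```python
from typing import Sequence

# Assumed label strings; the official data files may encode classes differently.
LABELS = ("Low", "Medium", "High")


def macro_f1(gold: Sequence[str], pred: Sequence[str], labels: Sequence[str] = LABELS) -> float:
    """Unweighted mean of the per-class F1 scores over `labels`."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    per_class = []
    for label in labels:
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        # Undefined precision, recall, or F1 (division by zero) is scored as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append(f1)
    # Each class contributes equally to the final score.
    return sum(per_class) / len(per_class)


gold = ["Low", "Low", "Medium", "High"]
pred = ["Low", "Medium", "Medium", "High"]
print(round(macro_f1(gold, pred), 3))  # per-class F1 = 2/3, 2/3, 1 -> 0.778
```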
Task 2: Textual Virality and Interaction Prediction (Regression)
* Goal: Predict distinct Likability (agreement) and Interactivity (controversy/engagement) scores.
* Input: Text only.
* Challenge: Distinguishing between content that is "liked" versus content that provokes "debate."
* Metrics: Pearson Correlation (r) and MSE.
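The two Task 2 metrics are complementary: Pearson's r checks whether predicted scores rise and fall with the gold scores (it ignores differences in scale and offset), while mean squared error (MSE) penalizes the absolute size of the errors. A plain-Python sketch, assuming each dimension (Likability, Interactivity) is scored separately on real-valued scores; the score range and how the two dimensions are combined are not specified in this call. In Python 3.10+, `statistics.correlation` also computes Pearson's r.

```python
import math
from typing import Sequence


def pearson_r(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between gold and predicted scores."""
    n = len(gold)
    if n != len(pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_g = sum(gold) / n
    mean_p = sum(pred) / n
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gold, pred))
    var_g = sum((g - mean_g) ** 2 for g in gold)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_g == 0 or var_p == 0:
        raise ValueError("r is undefined when either sequence is constant")
    # The 1/n normalization factors cancel, so raw sums are sufficient.
    return cov / math.sqrt(var_g * var_p)


def mse(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Mean squared error between gold and predicted scores."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold)


gold = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 2.5, 4.0]
print(round(pearson_r(gold, pred), 3), mse(gold, pred))  # 0.956 0.125
```

A system can score a high r but a poor MSE if its predictions are well ordered but systematically shifted, which is presumably why both are reported.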
Data
* Sources: X (Twitter) and Reddit.
* Size: ~5,000 anonymized samples (post-Oct 2023).
* Content: Posts related to "Gaza," "Nakba," "Palestine," "Israel," etc.
Important Dates
* Jan 1: Release of Training Data (3,500 samples)
* Feb 1: Release of Development Data (500 samples)
* Feb 15: Evaluation Period Begins
* Feb 20: Evaluation Period Ends
* Mar 1: Paper Submission Deadline
Participation & Submission
* Participation is free.
* Teams must verify their results by submitting a system description paper (max 4 pages).
* Papers will be published in the proceedings (ACL Anthology).
* Detailed guidelines: https://ezzini.github.io/NakbaVirality/guidelines
Organizers
* Saad Ezzini, King Fahd University of Petroleum and Minerals
* Salima Lamsiyah, University of Luxembourg
* Shadi Abudalfa, King Fahd University of Petroleum & Minerals
* Samir El-Amrany, University of Luxembourg
* Walid Alsafadi, University College of Applied Sciences Gaza
For more information, please visit our website https://ezzini.github.io/NakbaVirality/ or contact us at saad.ezzini(a)kfupm.edu.sa
Knowledge Graphs and Large Language Models (KG–LLM 2026) @ LREC 2026
We are pleased to announce the Workshop on Knowledge Graphs and Large Language Models (KG–LLM 2026), to be held in conjunction with LREC 2026 in Palma de Mallorca, Spain, May 16th 2026.
We invite submissions of original research that leverages both Knowledge Graphs (KGs) and Large Language Models (LLMs) in any domain of Natural Language Processing or language resource development.
More information at https://kg-llm.github.io/
Workshop Overview
Large Language Models have become foundational in NLP, yet they continue to face challenges related to bias, hallucination, explainability, environmental impact, and the cost of training. Knowledge Graphs, in contrast, provide high-quality, interpretable, and reusable ontological and linguistic structures that support reasoning, fact checking, and knowledge preservation.
The goal of this workshop is to bring together researchers working at the intersection of these two paradigms, exploring how explicit knowledge and implicit statistical learning can enhance each other. We welcome contributions that investigate, demonstrate, or evaluate systems, methods, or resources integrating both KGs and LLMs.
Topics of Interest
We encourage submissions on (but not limited to):
1. LLMs for Knowledge Graph Engineering
- KG modelling, resource creation, and interlinking
- Relation extraction
- Corpus annotation
- Ontology localization
- Creation or expansion of linguistic or knowledge graphs
- KG querying and question answering
2. Knowledge Graphs for Large Language Models
- Using linguistic or knowledge graphs as training data
- Fine-tuning LLMs using linked linguistic (meta)data
- Knowledge/linguistic graph embeddings
- KGs for model explainability, provenance, and source attribution
- Neural models for under-resourced languages
- KG-augmented RAG (KG-RAG)
3. Joint Use of KGs and LLMs in Applications
- Combined KG–LLM use cases with structured linguistic data
- Digital humanities applications
- Question answering over graph data
- Fake news and misinformation detection
- Educational applications and assisted learning
- Visualizing academic writing with KGs and LLMs
- KG-enhanced chatbots for health and medical contexts
Application Domains
All application domains are welcome (Digital Humanities, FinTech, Linguistics, Education, Cybersecurity, etc.) as long as the work uses both Knowledge Graphs and Large Language Models.
Submission Guidelines
Submission Format: Papers up to 8 pages excluding references.
Style: All submissions must follow the LREC 2026 format and use the official LREC author kit (available at https://lrec2026.info/authors-kit/).
Review Process: Double-blind peer review. Submissions must be fully anonymized.
Submission System: Papers must be submitted via the START conference system at https://softconf.com/lrec2026/KGLLM/
Language Resources: In line with LREC policies, authors are encouraged to describe, document, and share language resources, datasets, models, evaluation tools, or annotation guidelines used or created in their work.
Accepted Papers: All accepted papers will be included in the LREC 2026 workshop proceedings.
Presentation: Accepted papers will be presented as oral or poster sessions during the workshop.
Important Dates
*All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”)*
Paper submission deadline: 26 February 2026
Notification to authors: 24 March 2026
Camera-ready due: 30 March 2026
Workshop date: 16 May 2026
Contact
For questions, please contact the workshop organizers at: kg-llm-26(a)googlegroups.com
Organizing Committee
Gilles Sérasset, Université Grenoble Alpes, France
Katerina Gkirtzou, Athena Research Center, Greece
Michael Cochez, Ellis Institute Finland & Åbo Akademi, Finland
Jan-Christoph Kalo, University of Amsterdam, Netherlands
Dear colleagues,
Please find below information about the upcoming GEM workshop!
Event Type: Call for Papers
Conference: GEM at ACL 2026
Date: July 2nd or July 3rd, 2026
Location: San Diego, California, USA
Website: https://gem-workshop.com/
Contact: gem-workshop-chairs(a)googlegroups.com
------------------------------
Overview
The fifth edition of the Natural Language Generation, Evaluation, and
Metrics (GEM) Workshop will be at ACL 2026 in San Diego!
Evaluation of language models has grown to be a central theme in NLP
research, while remaining far from solved. As LMs have become more
powerful, errors have become tougher to spot and systems harder to
distinguish. Evaluation practices are evolving rapidly—from living
benchmarks like Chatbot Arena to LMs being used as evaluators themselves
(e.g., LM as judge, autoraters). Further research is needed to understand
the interplay between metrics, benchmarks, and human-in-the-loop
evaluation, and their impact in real-world settings.
Topics of Interest
We welcome submissions related to, but not limited to, the following topics:
- Automatic evaluation of generation systems, including the use of LMs as
evaluators
- Creating evaluation corpora, challenge sets, and living benchmarks
- Critiques of benchmarking efforts, including contamination,
memorization, and validity
- Evaluation of cutting-edge topics in LM development, including
long-context understanding, agentic capabilities, reasoning, and more
- Evaluation as measurement beyond raw capability, including ideas such as
robustness, reliability, and more
- Multimodal evaluation across text, vision, and other modalities
- Cost-aware and efficient evaluation methods applicable across languages
and scenarios
- Human evaluation and its role in the era of powerful LMs
- Evaluation of sociotechnical systems employing large language models
- Surveys and meta-assessments of evaluation methods, metrics, and
benchmarks
- Best practices for dataset and benchmark documentation
- Industry applications of the above-mentioned topics, especially internal
benchmarking or navigating the gap between academic metrics and real-world
impact
Special Tracks
Opinion and Statement Papers Track (New!)
We are introducing a special track for opinion and statement papers. These
submissions will be presented in curated panel discussions, encouraging
open dialogue on emerging topics in evaluation research.
We welcome bold, thought-provoking position papers that challenge
conventional wisdom, propose new directions for the field, or offer
critical perspectives on current evaluation practices. This track is an
opportunity to spark discussion and debate—submissions need not present new
empirical results but should offer well-argued viewpoints supported by
scientific evidence (e.g. prior studies) that advance our collective
thinking about evaluation.
ReproNLP
The ReproNLP Shared Task on Reproducibility of Evaluations in NLP has been
run for six consecutive years (2021–2026). ReproNLP 2026 will be part of
the GEM Workshop at ACL 2026 in San Diego. It aims to (i) shed light on
the extent to which past NLP evaluations have been reproducible, and (ii)
draw conclusions regarding how NLP evaluations can be designed and reported
in order to increase reproducibility. Participants submit reports for their
reproductions of human evaluations from previous NLP literature where they
quantitatively assess the degree of reproducibility using methods described
in Belz (2025). More details can be found in the first call for
participation for ReproNLP 2026 at https://repronlp.github.io.
Workshop Format
We aim to organize the workshop in an inclusive, highly interactive, and
discussion-driven format. Paper presentations will focus on themed poster
sessions that allow presenters to interact with researchers from varied
backgrounds and similar interests. The workshop will feature panels on
emerging topics and multiple short keynotes by leading experts.
🎭 GEM Comic-Con Edition!
In the spirit of San Diego's famous Comic-Con (July 23-26), this year's GEM
will be a special Comic-Con edition! We encourage participants to embrace
creativity! Whether that’s through themed poster designs, comic-style
slides, or dressing up as your favorite evaluation metric personified, we
want this year's workshop to be memorable and fun!
Submission Types
Submissions can take any of the following forms:
- Archival Papers: Original and unpublished work, for all of the following
tracks: Main, ReproNLP, and Opinion/Statement.
- Non-Archival Extended Abstracts: Work already presented or under review
at a peer-reviewed venue. This is an excellent opportunity to share recent
or ongoing work with the GEM community without precluding future
publication.
- Findings Papers: We additionally welcome presentation of relevant papers
accepted to Findings, and will share more information at a later date.
All accepted papers will be given up to one additional page to address
reviewers' comments.
Submission Guidelines
- Papers to be reviewed should be submitted directly through OpenReview,
selecting the appropriate track, and should conform to the ACL 2026 style
guidelines.
- Review requirement: For each submitted paper, authors may be asked to
provide 2 reviews (either one author doing 2 reviews, or two authors each
doing one review).
- Length:
  - Archival papers should be within 4–8 pages, and opinion/statement
papers should be within 2–4 pages. We make no “Short” or “Long” paper
distinctions; we advise authors to tailor their submission length to
their contribution.
  - Extended abstracts should be within 1–2 pages.
- Opinion/Statement Papers: These should be titled with the “Position:”
prefix.
- Dual submission: Dual submission of archival papers is not allowed.
Authors interested in presenting work submitted to a different venue should
instead use the non-archival extended abstract track.
Important Dates
- March 19, 2026: Direct paper submission deadline
- April 9, 2026: Pre-reviewed ARR commitment deadline
- April 28, 2026: Notification of acceptance
- May 14, 2026: Camera-ready paper due
- June 4, 2026: Pre-recorded video due (hard deadline)
- July 2–3, 2026: Workshop at ACL in San Diego
Contact
For any questions, please check the workshop page or email the organisers:
gem-workshop-chairs(a)googlegroups.com
Dublin City University
Simon Mille | Postdoctoral Research Fellow
ADAPT Centre
School of Computing
Dublin City University
Dublin 9
Ireland
www.adaptcentre.ie
ADAPT Taighde Éireann – Research Ireland Centre for Digital Content Technology
Dear colleagues,
The December edition of the CLARIN Newsflash is out. Highlights include:
* A review of 2025 with key achievements and milestones
* Direct links to the Annual Conference recordings and an impression video, perfect inspiration if you are considering submitting an abstract to CLARIN 2026
* An overview of CLARIN’s presence at LREC including co-organised workshops
Additionally, the call for extended abstracts for CLARIN 2026 is now open. The submission deadline is 6 April 2026. Link to the call: https://www.clarin.eu/content/call-extended-abstracts-clarin-annual-confere…
Read the newsletter: https://www.clarin.eu/content/clarin-newsflash-december-2025
Subscribe to the newsletter here: http://eepurl.com/bOt3Qn
Kind regards,
CLARIN ERIC
---
Elisa Gorgaini
Communication Officer - CLARIN ERIC
Utrecht University | Drift 10, 3512 BS Utrecht, The Netherlands
e.gorgaini(a)uu.nl | elisa(a)clarin.eu
www.clarin.eu
We invite submissions of shared task proposals for ArgMining 2026, the 13th Workshop on Argument Mining and Reasoning, co-located with ACL 2026 (San Diego).
Background
Argument mining (also known as argumentation mining) is a well-established area in computational linguistics focusing on the automatic identification of argumentative structures such as premises, conclusions, and inference schemes. The field has historically emphasized the development of large-scale datasets and tasks including argument quality assessment, argument persuasiveness, and argumentative text synthesis across domains such as legal, social, medical, political, and scientific settings. In line with broader advances in CL and NLP, recent work has expanded toward explainable argumentation, multimodal settings, and modeling human label variation.
Previous editions of ArgMining have promoted shared tasks to advance research on specific aspects of argument mining, including:
* Multimodal argumentative fallacy detection: https://nlp-unibo.github.io/mm-argfallacy/2025/
* Dialogical argument mining: http://dialam.arg.tech/
ArgMining 2026 Shared Tasks
Following the success of prior workshops, ArgMining 2026 plans to feature one or more shared tasks addressing unsolved problems for the community to investigate. In keeping with this year’s special theme—“Understanding and evaluating arguments in both human and machine reasoning”—we particularly encourage proposals aligned with this focus.
What to Include in a Proposal
Shared task proposals should include:
* Title and brief task description
* Description of the datasets to be used and their readiness
* Previous work on the datasets, including relevant publications (if any)
* A short description of the evaluation methodology for submitted systems
* Brief introduction of the task organizers
* Anticipated timeline, including dates for dataset releases and final evaluation
How to Submit
Submit your shared task proposal via email to: argmining.org [at] gmail.com
* Submission deadline: December 22, 2025
* Notification of acceptance: beginning/mid January 2026
Tentative Shared Task Schedule
* Mid January: Training data release
* Early March: Test data release; evaluation start
* Mid/late March: Evaluation end
* Early April: Results announcement
* Mid April: Paper submission deadline
* Mid May: Camera-ready deadline
* July: ArgMining 2026 workshop (at ACL)
Organizers
Mohamed Elaraby (University of Pittsburgh)
Annette Hautli-Janisz (University of Passau)
John Lawrence (University of Dundee)
Elena Musi (University of Liverpool)
Julia Romberg (GESIS)
Federico Ruggeri (University of Bologna)
We’re excited to invite you to take part in ARHAHA 2026, a shared task on Arabic Humor Generation, hosted at OSACT7 and co-located with LREC 2026.
Description:
Participants will develop systems that generate original, safe, and culturally appropriate humorous content in Arabic under a set of carefully designed constraints. The task aims to push models beyond memorization and towards genuine humorous creativity.
Humor Generation Task
This task focuses on building and evaluating systems that generate short humorous texts in Arabic given constrained prompts.
Task Summary
Input: A pair of Arabic words
Output: A short humorous Arabic text (maximum 100 characters)
Evaluation:
Automated format and constraint validation
Human evaluation of humor quality, originality, fluency, and cultural appropriateness
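The automated format check can be pictured as a filter applied before human evaluation. A minimal Python sketch of the one constraint stated in this call, the 100-character limit, plus a non-empty check; the function name is hypothetical, and the organizers' validator may apply further constraints (for example on how the input word pair is used) that are not described here. One design question is what counts as a character: Python's `len()` counts Unicode code points, so each optional Arabic diacritic (haraka) adds to the count, and the text is NFC-normalized first so that canonically equivalent encodings are counted the same way.

```python
import unicodedata

MAX_CHARS = 100  # limit stated in the ARHAHA call


def passes_format_check(text: str, max_chars: int = MAX_CHARS) -> bool:
    """Hypothetical check: non-empty and at most max_chars code points.

    Assumes surrounding whitespace is ignored and the text is NFC-normalized
    before counting; the official validator may count differently.
    """
    normalized = unicodedata.normalize("NFC", text.strip())
    return 0 < len(normalized) <= max_chars


print(passes_format_check("ا" * 100), passes_format_check("ا" * 101))  # True False
```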
The website for the shared task is:
https://sites.google.com/view/arhaha2026/home
How to Participate?
Registration is required; please complete the registration form.
Join the ARHAHA Slack workspace.
System Description Papers
All participating teams are encouraged to submit a short system description paper. Papers will be included in the workshop proceedings, and a high leaderboard ranking is not required. We welcome creative approaches, analysis, and lessons learned.
Contact
For questions or clarifications, please contact the organizing team at arhaha2026(a)gmail.com
We look forward to your participation and contributions!
Best regards,
The ARHAHA 2026 Organizing Team
Touché @ CLEF 2026: Shared Tasks on Argumentation Systems (Fallacies, Causality, Generalizability, Advertisements)
Call for Participation
We invite you to participate in the following shared tasks at Touché 2026 held in conjunction with the CLEF conference.
1. Fallacy Detection.
Given an argument, determine whether it is fallacious and what type of fallacy it is.
https://touche.webis.de/clef26/touche26-web/fallacy-detection.html
2. Causality Extraction.
Given a text, determine whether it contains causal information, identify the information in the text, and classify the expressed relationship.
https://touche.webis.de/clef26/touche26-web/causality-extraction.html
3. Generalizability of Argument Identification in Context.
Given a sentence from some argumentation dataset, determine whether the sentence was annotated as an argument (using the annotator guidelines, etc.).
https://touche.webis.de/clef26/touche26-web/generalizable-argument-mining.h…
4. Advertisement in Retrieval-Augmented Generation (RAG).
Given a query and the response of a RAG system, determine whether the response contains an ad, identify the ad in the response, and remove the ad.
https://touche.webis.de/clef25/touche25-web/advertisement-detection.html
Find out more at https://touche.webis.de/clef26/touche26-web/
and join our mailing list at https://groups.google.com/g/touche-lab to stay up to date.
Awards
--------------------------
The best submission for each task will receive an award. In addition, our partners at Methods Hub (https://methodshub.gesis.org/) have agreed to provide priority support to the award-winning teams in developing their submissions into fully reusable software packages to maximize your impact.
Important Dates
--------------------------
2026-05-07: Approaches submission deadline
2026-05-28: Participant paper submission
2026-06-30: Peer review notification
2026-07-06: Camera-ready participant papers submission
2026-09-21 to 2026-09-24: CLEF Conference in Jena and Touché Workshop
Links
--------------------------
Touché: https://touche.webis.de
Contact: touche(a)webis.de
We are looking forward to your submission!
The Touché team
Dear all,
This is the last CfP for VarDial 2026 - The Thirteenth Workshop on NLP for Similar Languages, Varieties and Dialects. We have extended the submission deadlines (January 2 for direct submissions, January 10 for committing pre-reviewed submissions); see details below. Apologies for cross-posting!
--
VarDial 2026: https://sites.google.com/view/vardial-2026/
VarDial 2026 will be co-located with EACL 2026 in Rabat, Morocco. We anticipate discussions on computational methods and language resources for closely related languages, language varieties, and dialects.
We welcome papers dealing with one or more of the following topics:
- Language resources and tools for similar languages, varieties and dialects;
- Evaluation of language resources and tools applied to non-dominant language varieties;
- Cross-lingual transfer and adaptation of models to similar languages, varieties and dialects;
- Automatic identification of lexical variation;
- Automatic classification of language varieties;
- Machine translation between closely-related languages, language varieties and dialects;
- Corpus-driven studies in dialectology and language variation;
- Computational approaches to mutual intelligibility between dialects and similar languages;
- Text similarity and adaptation between language varieties;
- Linguistic issues in the adaptation of language resources and tools (e.g., cognate detection, semantic discrepancies, lexical gaps, false friends);
- Studies focusing on related creole languages and their lexifier languages;
- Studies focusing on diachronic language variation (e.g. phylogenetic methods, historical dialects).
Instructions for Authors
Submissions should be formatted according to the ACL Rolling Review template and submitted as a PDF. The review process will be double-blind. More information is on the website (https://sites.google.com/view/vardial-2026/).
Important Dates
- Direct submission deadline: January 2, 2026 (updated!)
- Pre-reviewed (ARR) submission deadline: January 10, 2026 (updated!)
- Notification of acceptance: January 23, 2026
- Camera-ready paper due: February 3, 2026
- Workshop at EACL (hybrid): March 24-29, 2026 (exact date TBD)
Shared Task: Arabic Modeling In Your Accent (AMIYA)
VarDial 2026 will have a shared task on language modelling for dialectal Arabic (DA), where participants can contribute LLMs trained or adapted for DA. These will be evaluated using the AL-QASIDA benchmark (Robinson et al., 2025), an evaluation suite that comprehensively measures an LLM’s dialectal fidelity, understanding, generation quality, and MSA-DA diglossia in DA. More information: https://sites.google.com/view/vardial-2026/shared-tasks
- Training data release: November 30, 2025
- Registration deadline, eval data finalized: December 15, 2025
- System submission deadline: January 10, 2026
- System description paper deadline: January 20, 2026
Workshop Organizers
Yves Scherrer – University of Oslo (Norway)
Noëmi Aepli – University of Pennsylvania (USA)
Verena Blaschke – LMU Munich and Munich Center for Machine Learning (Germany)
Tommi Jauhiainen – University of Helsinki (Finland)
Nikola Ljubešić – Jožef Stefan Institute and University of Ljubljana (Slovenia)
Preslav Nakov – Mohamed bin Zayed University of Artificial Intelligence (UAE)
Jörg Tiedemann – University of Helsinki (Finland)
Marcos Zampieri – George Mason University (USA)
Contact: yves.scherrer(a)ifi.uio.no or verena.blaschke(a)cis.lmu.de
In this newsletter:
LDC 2026 membership discounts now available
LDC's 1,000th corpus
Approaching deadline for Spring 2026 data scholarship applications
LDC closed for Winter Break December 25 - January 2
New publications:
2021 NIST Speaker Recognition Evaluation Development and Test Set<https://catalog.ldc.upenn.edu/LDC2025S11>
LORELEI Sinhala Incident Language Pack<https://catalog.ldc.upenn.edu/LDC2025T17>
________________________________
LDC 2026 membership discounts now available
Now through March 2, 2026, any organization that joins the Consortium or renews its membership will receive a 10% discount off the 2026 membership fee. Membership remains the most economical way to access current and past LDC releases. Consult Join LDC<https://www.ldc.upenn.edu/members/join-ldc> for details on membership options and benefits.
LDC's 1,000th corpus
LDC is delighted to announce the release of the 1,000th corpus into the Catalog! This milestone represents the commitment we made over thirty years ago to provide large quantities of diverse data, robust research program support, and exceptional member services. We are grateful for the continued support and collaboration of our members, friends, and the community.
Approaching deadline for Spring 2026 data scholarship applications
Attention students: don't miss out on the chance to receive no-cost access to LDC data for your research. Applications for Spring 2026 data scholarships are due January 15, 2026. For more information on requirements and program rules, see LDC Data Scholarships<https://www.ldc.upenn.edu/language-resources/data/data-scholarships>.
LDC closed for Winter Break December 25-January 2
LDC will be closed from Thursday, December 25, 2025, through Friday, January 2, 2026, in accordance with the University of Pennsylvania Winter Break Policy. Our offices will reopen on Monday, January 5, 2026. Requests received by the Membership Office during Winter Break will be processed when the office reopens.
________________________________
New publications:
2021 NIST Speaker Recognition Evaluation Development and Test Set<https://catalog.ldc.upenn.edu/LDC2025S11> was developed by LDC and NIST (National Institute of Standards and Technology). It contains approximately 447 hours of Cantonese, Mandarin, and English conversational telephone speech, audio from video, and selfie image data for development and test, along with answer keys, enrollment, trial files, and documentation from the NIST-sponsored 2021 Speaker Recognition Evaluation (SRE)<https://www.nist.gov/itl/iad/mig/nist-2021-speaker-recognition-evaluation-s…>.
The SRE task is speaker detection, that is, to determine whether a specified target speaker was speaking during a segment of speech. SRE21 focused on telephone speech and audio from video and included close-up images of participants. The evaluation also featured cross-lingual trials, that is, enrollment and test segments spoken in different languages.
The data was drawn from the WeCanTalk corpus collected by LDC, in which speakers called friends or relatives who agreed to record their telephone conversations, each lasting between 8 and 10 minutes. Subjects contributed multiple conversational telephone speech recordings and audio recordings in which they were talking, plus a single selfie image. Recordings were manually audited to verify speaker, language, and quality.
2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
LORELEI Sinhala Incident Language Pack<https://catalog.ldc.upenn.edu/LDC2025T17> was developed by LDC and comprises 8.1 million words of Sinhala monolingual text, 700,000 words of English monolingual text, 6.4 million words of parallel Sinhala-English text, and 50,000 words annotated for entity discovery and linking and situation frames. It constitutes all of the text data, annotations, supplemental resources, and related software tools for the Sinhala language used in the DARPA LORELEI / LoReHLT 2018 Evaluation<https://www.nist.gov/itl/iad/mig/lorehlt-evaluations>.
The LORELEI (Low Resource Languages for Emergent Incidents) program was concerned with building human language technology for low-resource languages in the context of emergent situations. In the evaluation scenario, an unforeseen event triggered a need for humanitarian and logistical support in a region where the incident language had received little or no attention in NLP research. Evaluation participants provided NLP solutions, including information extraction and machine translation, with limited resources and limited development time.
Data was collected from news, social network, weblog, newsgroup, discussion forum, and reference material. Entity discovery and linking annotation identified entities to be detected by systems for scoring purposes. Situation frame analysis was designed to extract basic information about needs and relevant issues for planning a disaster response effort.
2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104