* TAL@Santé 2026 Workshop * @ CORIA-TALN 2026 -- 29 June 2026, Nantes
Website: https://atelier-tal-sante.github.io/
CALL FOR PAPERS
As part of the joint CORIA-TALN 2026 conferences, the TAL@Santé 2026 workshop aims to bring together the French-speaking community working on Natural Language Processing (NLP) applied to health. It seeks to share tools, methods, resources, experience reports and perspectives around clinical and biomedical texts.
IMPORTANT DATES
* Paper submission: 29 April 2026
* Notification to authors: 13 May 2026
* Camera-ready version: 21 May 2026
* Workshop: 29 June 2026, 9:00-17:30
SUBMISSION TYPES AND FORMAT
The accepted paper types are:
* Short papers (3 pages max + references):
- Preliminary work in progress
- Description of a research project
- Translation of a paper recently accepted at (or currently under submission to) an international conference
* Regular papers (6 to 10 pages + references):
- Novel contribution
- State of the art
- Negative result offering a new perspective on a scientific problem
- Position paper presenting a viewpoint on the state of research in NLP and health
Accepted papers will be presented during the day as an oral presentation or a poster.
They will also be published in the CORIA-TALN 2026 conference proceedings.
The official language of the conference is French. If all authors are French speakers, papers must be written in French. If one of the authors is not a French speaker, papers may be written in English.
Submission site: https://openreview.net/group?id=ls2n.fr/CORIA-TALN/2026/Workshop/TAL-Sante
WORKSHOP TOPICS
Topics of interest include, but are not limited to:
- Extraction of entities, relations and complex events
- Information extraction and classification of clinical or biomedical texts
- Accessibility: simplification of medical texts, health literacy, patient-caregiver communication
- Misinformation and quality of health information
- Bias detection and mitigation
- Ethical issues of NLP for health
- Frugal approaches
- Evaluation frameworks, reproducibility and usage-oriented metrics
- Generative approaches: factuality, traceability (RAG, citations, verification), hallucination detection
- Annotation schemes and methodologies for building annotated resources
- LLM-assisted annotation
- Building specialised language models, domain adaptation, transfer learning, federated learning, weakly supervised learning
- Automatic analysis of the scientific literature for health
CONTACT: atelier-tal-sante(a)univ-nantes.fr
ORGANISERS
Richard Dufour (LS2N, Nantes Université)
Yanis Labrak (IDIAP)
Emmanuel Morin (LS2N, Nantes Université)
Aurélie Névéol (LISN, Université Paris-Saclay)
Aman Sinha (ATILF, Université de Lorraine)
Laura Zanella (Doctolib)
Pierre Zweigenbaum (LISN, Université Paris-Saclay)
We invite you to participate in our survey “Investigating the Use of
Large Language Models in Academic Research and Coding (LLM-ARCo)”🤖📚
The goal of this study is to better understand how researchers use large
language models in their academic work. The survey takes approximately
15 minutes⏱️. All responses are anonymous🔒, and the data will be used
only for research purposes.
As a small thank-you, 10 participants will be randomly selected🎁 to
receive a €50 Amazon voucher💶.
You can participate here: 👉 https://www.surveymonkey.com/r/RYBRXQL
Please feel free to share this invitation with colleagues who might be
interested 🔁. Thank you very much for supporting our research! ❤️
On behalf of Dr. Younes,
Ansgar
Language Technologies and Digital Humanities: Resources and Applications (LTaDH-RA)
CLaDA-BG 2026 Conference
Sofia, Bulgaria
Venue: TBA
25-26 June 2026
CLaDA-BG is the Bulgarian national research infrastructure for resources and technologies for linguistic, cultural and historical heritage, integrated within CLARIN EU and DARIAH EU. Its mission is to provide access to the resources and technologies needed to support research in the Social Sciences and Humanities (SS&H). Modeling and linking various types of knowledge and their contexts is crucial to successful research in the interdisciplinary field of resources and technologies related to language, culture and history.
This is the fifth edition of the CLaDA-BG conference. It aims to bring together NLP developers, linguists, digital humanities scholars, and all parties interested in knowledge modeling and linking data for research.
Topics of Interest
The topics include, but are not limited to, the following ones:
• Problems in SS&H – research methods, technological support, applications
• Language technologies for sentiment analysis, semantic technologies, trustworthiness of knowledge graphs, ethical challenges in digital SS&H
• Knowledge Modeling and Elicitation for digital SS&H
• Specific Language Resources and Technologies for historical texts, parliamentary records, speech and multimodal corpora, social media data, etc.
• The role of digital libraries, archives and museums in digital SS&H research
• Language Interface to Knowledge Graphs in SS&H
• Knowledge-modeled and linked applications in SS&H
• Large Language Models for DH
• Best practices and new trends in Knowledge Modeling and Linking for language, culture and history
Invited Speakers
The invited speakers will be announced soon
Important Dates
Submission deadline: 19.04.2026
Notification of acceptance: 24.05.2026
Final Submission: 20.06.2026
Conference: 25-26.06.2026
Submissions
We welcome oral presentations or posters (optionally with a demo). We follow the CEUR-WS.org proceedings format, but the proceedings will not be published there. The instructions for preparing submissions are here: https://ceur-ws.org/HOWTOSUBMIT.html#CEURART
We invite two types of papers: regular papers (10-12 "standard" pages) and short papers (5-9 "standard" pages), in accordance with the CEURART 2-column style. A "standard" page is 2,500 characters.
We also accept extended abstract submissions (3-5 "standard" pages) in accordance with CEURART, 2-column style. They will be presented at the conference and will be published in a Book of Abstracts in electronic form.
Please submit your full paper or extended abstract in PDF to the following email: ltadh-ra(a)bultreebank.org
For contacting organizers, please use the following email: ltadh-ra(a)bultreebank.org
The CLaDA-BG Organizers
*<Lexicom/>*
a workshop in digital lexicography and lexical computing
*Registration open*
*Bari, Italy* 15 – 19 September 2025
Your 5 days to get up-to-date with the latest developments in
*corpus-driven lexicography* and to practice your
*corpus building and corpus query skills* with some of the top experts in
the field.
For the programme, lecturers, invited speakers, fees and registration,
visit this website
*lexicom.courses <https://lexicom.courses/upcoming-lexicom/>*
I hope to meet you in Bari in September!
Ondřej
*Ondřej Matuška*
sketchengine.eu <http://www.sketchengine.eu/> | Facebook
<https://www.facebook.com/SketchEngine/> | LinkedIn
<https://www.linkedin.com/in/ondrejmatuska> | Twitter
<https://twitter.com/SketchEngine>
International Conference 'New Trends in Translation and Interpreting
Technology' (NeTTIT'2026)
Dubrovnik, Croatia, 24-27 June 2026
https://nettt-conference.com
Extended Deadline Call for Papers
*** Extended submission deadline 27 April 2026 ***
# The conference
The third edition of the International Conference 'New Trends in
Translation and Interpreting Technology' (NeTTIT'2026) will take place
in Dubrovnik, Croatia from 24 to 27 June 2026.
The objective of the conference is (i) to bridge the gap between
academia and industry in the field of translation and interpreting by
bringing together academics in linguistics, translation and interpreting
studies, machine translation and natural language processing,
developers, practitioners, language service providers and vendors who
work on or are interested in different aspects of technology for
translation and interpreting, and (ii) to be a distinctive event for
discussing the latest developments and practices. NeTTIT'2026 invites
all professionals who would like to learn about the new trends, present
the latest work or/and share their experience in the field, and who
would like to establish business and research contacts, collaborations
and new ventures.
The conference will include plenary presentations (research and user
presentations, keynote speeches), poster sessions and panel discussions.
All submitted papers will be peer-reviewed by experts, and the accepted
papers will be published as open-access conference e-proceedings, which
will be available at the time of the conference.
# Conference topics
Contributions are invited on any topic related to the latest technology and
practices in translation, subtitling, localisation, interpreting,
machine translation and Large Language Models used in translation and
interpreting.
NeTTIT'2026 will feature a Special Theme Track "Future of Translation
and Interpreting Technologies in the Era of LLMs and Generative AI".
The conference topics include but are not limited to (see also the
special conference theme below):
## CAT tools
- Translation Memory (TM) systems
- NLP and MT for translation memory systems
- Terminology extraction tools
- Localisation tools
## Machine Translation
- Latest developments in Neural Machine Translation
- MT for under-resourced languages
- MT with low computing resources
- Multimodal MT
- Integration of MT in TM systems
- Resources for MT
## Technologies for MT deployment
- MT evaluation techniques, metrics and evaluation results
- Human evaluations of MT output
- Evaluating MT in a real-world setting
- Quality estimation for MT
- Domain adaptation
## Translation Studies
- Corpus-based studies applied to translation
- Corpora and resources for translation
- Translationese
- Cognitive effort and eye-tracking experiments in translation
## Interpreting studies
- Corpus-based studies applied to interpreting
- Corpora and resources for interpreting
- Interpretese
- Resources for interpreting and interpreting technology applications
- Cognitive effort and eye-tracking experiments in interpreting
## Interpreting technology
- Machine interpreting
- Computer-aided interpreting
- NLP for dialogue interpreting
- Development of NLP based applications for communication in public
service settings (healthcare, education, law, emergency services)
## Emerging Areas in Translation and Interpreting
- MT and translation tools for literary texts and creative texts
- MT for social media and real-time conversations
- Sign language recognition and translation
## Subtitling
- NLP and MT for subtitling
- Latest technology for subtitling
## User needs
- Analysis of translators' and interpreters' needs in terms of
translation and interpreting technology
- User requirements for interpreting and translation tools
- Incorporating human knowledge into translation and interpreting
technology
- What existing translators' (including subtitlers') and interpreters'
tools do not offer
- User requirements for electronic resources for translators and
interpreters
- Translation and interpreting workflows in larger organisations and the
tools for translation and interpreting employed
## The business of translation and interpreting
- Translation workflow and management
- Technology adoption by translators and industry
- Setting up a translation / interpreting / language provider company
## Teaching translation and interpreting
- Teaching Machine Translation
- Teaching translation technology
- Teaching interpreting technology
- Latest AI developments in the syllabi of translation and interpreting
curricula
## Ethical issues in translation and technology
- Bias and fairness in MT
- Privacy and security in cloud MT systems
- Transparency and explainability of MT systems
- Environmental impact of MT systems
# Special Theme Track - Future of Translation and Interpreting
Technologies in the Era of LLMs and Generative AI
We are excited to share that NeTTIT'2026 will have a special theme with
the goal of stimulating discussion around Large Language Models,
Generative AI and the Future of Translation and Interpreting
Technologies. While the new generation of Large Language Models such as
ChatGPT, Gemini, Claude, DeepSeek and LLaMA showcase remarkable
advancements in language generation and understanding, we find ourselves
in uncharted territory when it comes to their performance on various
Translation and Interpreting Technology tasks with regards to fairness,
interpretability, ethics and transparency.
The theme track invites studies on how LLMs perform on Translation and
Interpreting Technology tasks and applications, and what this means for
the future of the field. The possible topics of discussion include (but
are not limited to) the following:
- Changes in (and the impact on) the translators' and interpreters'
professions in the new AI era, especially as a result of the latest
developments in LLMs and Generative AI
- Generative AI and translation
- Generative AI and interpreting
- Augmenting machine translation systems with generative AI
- Domain and terminology adaptation with Large Language Models
- Literary translation with Large Language Models
- Translation for low-resourced and minority languages with LLMs
- Improving Machine Translation Quality with Contextual Prompts in Large
Language Models
- Prompt engineering for translation
- Generative AI for professional translation
- Generative AI for professional interpreting
# Invited speakers
Yves Champollion, Wordfast LLC
Marko Grobelnik, Jožef Stefan Institute
# Submissions and publication
NeTTIT'2026 invites the following types of submissions in English:
## Academic papers
- Regular long papers: These can be up to eight (8) pages long,
presenting substantial, original, completed, and unpublished work.
- Short papers: These can be up to four (4) pages long and are suitable
for describing small, focused contributions, work-in-progress, negative
results, system demonstrations, etc.
## User papers - for industry and practitioners. References to related
work are optional. Allowed paper length: between 2 and 4 pages.
Papers should be submitted through Softconf/START using the following
link: https://softconf.com/p/nettit2026/user/
For submitting papers, authors are invited to comply with the ACL
format, using the templates available on the conference website. The
conference will not consider abstract-only submissions.
Further details on the submission procedure are available on the
conference website:
https://nettt-conference.com/2026/submissions-and-publication/
The accepted papers will be published in the conference e-proceedings
with assigned ISBN and DOI and made available online on the conference
website at the time of the conference. The conference organisers will
seek the inclusion of the conference proceedings in the ACL anthology.
# Important dates
- Extended submissions deadline: 27 April 2026
- Reviewing process: 28 April - 18 May 2026
- Notification of acceptance: 20 May 2026
- Camera-ready due: 5 June 2026
- Conference camera-ready proceedings ready: 19 June 2026
- Conference: 24-27 June 2026
Papers submitted before the submission deadline will be reviewed on a
rolling basis so that authors requiring visas can be notified earlier
and have sufficient time to obtain them.
# Pre-conference Tutorials
The pre-conference tutorials will include:
Post-editing and AI-augmented translation -
Marie Escribe (LanguageWire and Polytechnic University of Valencia)
Machine Translation Quality Evaluation -
Tharindu Ranasinghe (Lancaster University)
Automatic Speech Recognition as a supporting tool for interpreters -
Constantin Orasan (University of Surrey)
# Conference Chairs
- Gloria Corpas Pastor (University of Malaga)
- Ruslan Mitkov (Lancaster University and University of Alicante)
- Marko Tadic (University of Zagreb)
# Programme Committee Chairs
- Constantin Orasan (University of Surrey)
- Tharindu Ranasinghe (Lancaster University)
# Publication Chairs
- Marie Escribe (LanguageWire and Polytechnic University of Valencia)
- Alicia Picazo Izquierdo (University of Alicante)
# Organising Committee and Programme Committee coordination
- Marie Escribe (LanguageWire and Polytechnic University of Valencia)
- Alicia Picazo Izquierdo (University of Alicante)
- Xiaojing Zhao (Hong Kong Polytechnic University)
# Publicity and Sponsorship Chair
- Vilelmini Sosoni (Ionian University)
# Programme committee
For a list of the programme committee members visit:
https://nettt-conference.com/2026/programme-committee/
# Venue
The conference will take place at the Centre for Advanced Academic
Studies (CAAS) of the University of Zagreb (http://www.caas.unizg.hr/)
in Dubrovnik.
# Sponsor
Juremy.com
# Sponsorship opportunities
Companies working in the fields of translation technology, interpreting
technology and/or related fields, are welcome to familiarise themselves
with the sponsorship opportunities that the conference offers. Please
visit https://nettt-conference.com/2026/sponsors/ for more details.
# Further information and contact details
The conference website https://nettt-conference.com/ is updated on a
regular basis. For further information, please email
nettit2026(a)nettt-conference.com.
You can also follow us on social media for updates and announcements.
LinkedIn - https://www.linkedin.com/company/nettit2026/
Twitter/X - https://x.com/NeTTIT2026
--
Amal Haddad Haddad (She/her)
Facultad de Traducción e Interpretación
Universidad de Granada | https://www.ugr.es/personal/amal-haddad-haddad
Lexicon Research Group | http://lexicon.ugr.es/haddad
Co-Convenor, BAAL SIG 'Humans, Machines, Language' | https://r.jyu.fi/humala
Event Coordinator, BAAL SIG 'Language, Learning and Teaching'
Dear Colleagues,
The Call for the Evaluation and Benchmarking Track at FIRE 2026 is out now.
----------------------------------------------------------------------------------------------------
We invite proposals for the Evaluation and Benchmarking Track at FIRE 2026.
FIRE 2026 is the 18th edition of the annual meeting of the Forum for
Information Retrieval Evaluation (https://fire.irsi.org.in). Since its
inception in 2008, FIRE has had a strong focus on shared tasks similar
to those offered at evaluation forums such as TREC, CLEF, and NTCIR. The
shared tasks should focus on solving specific problems in the area of
information access and, more importantly, on generating high-quality
evaluation datasets for the research community. In line with this
objective, this year the FIRE tracks emphasize selecting tasks that
either introduce a new paradigm in the field or generate a substantial
amount of valuable benchmark data that can support future research and
experimentation.
It is not required for the tasks to focus on a specific language, and
they can broadly cover any problem in the fields related (but not
limited) to IR, NLP, multi-modal information access, and ML. However,
the organizers especially encourage proposals for tracks related to
South Asian, African, and Middle Eastern languages. In the past, FIRE
has hosted tracks from Arabic, Persian, German, Russian, and Urdu
languages besides several Indian languages. We aim to continue these
efforts and include more language groups from these regions. To learn
more about tracks at past FIRE meetings, visit https://fire.irsi.org.in.
Informal inquiries can also be sent to the track chairs.
Please include the following details in your proposal:
1. Track name
2. Track description
3. Use case/s
4. Target Audience and number of expected submissions
5. Data(*) (FAIR details)
6. Evaluation plan
7. Timeline: Please try to align with the FIRE conference dates as given
below
8. Organizer/s Details
9. Prior experience in organizing shared task/workshop at relevant venues
*Tentative Timeline*
*5th April, 2026* Track proposals due
*24th April, 2026* Track acceptance notification
*15th May, 2026* Open track websites and release of training data
*15th June, 2026* Test data release
*30th June, 2026* Run submission deadline
*15th July, 2026* Track results declaration
*30th August, 2026* Working notes due
*30th September, 2026* Camera-ready copies of working notes and
overview paper due
*December, 2026 - Dates TBD* FIRE 2026 Conference
Please send these details in PDF format to clia(a)isical.ac.in with a
copy to majumdar.srijoni(a)gmail.com, kripa.ghosh(a)gmail.com and
mandl(a)uni-hildesheim.de.
(*) We require that after FIRE, the data be made publicly available
through the Information Retrieval Society of India. If the data cannot
be distributed publicly (e.g., Twitter data), a unique identifier that
can be used to recreate the original corpus can be provided instead
(e.g., tweet IDs in the case of Twitter data). This distribution will be
governed by a copyright form, which users have to sign before getting
the data. A sample form is available at
https://fire.irsi.org.in/fire/static/data.
If it is not possible or preferable for the track organizers to share
the data, please mention this in the proposal together with the specific
concerns. Exceptions can be made for tracks using industry data or in
cases of other serious legal or ethical concerns.
The aim of organizing these tracks at FIRE is to have debates and
discussions on focused topics and give feedback to participants. As a
result, at least one of the track organizers from each track is expected
to attend FIRE and present the track overview in person. If none of the
organizers attends, the team will not be allowed to offer a track the
following year.
We will try to provide student volunteers to support the proposed
tracks. These will be undergraduate students interested in IR and
related fields who can help with corpus creation, evaluation,
correspondence with participants, etc. If you require such support,
kindly mention it in the track proposal along with the number of
students required.
We look forward to an enthusiastic response from you.
*Overall Track Coordinators, FIRE 2026*
Thomas Mandl (Universität Hildesheim, Germany)
Kripabandhu Ghosh (IISER Kolkata, India)
Srijoni Majumdar (University of Leeds, UK)
Dear corpora list members,
*SEM 2026 (The 15th Joint Conference on Lexical and Computational Semantics), co-located with ACL 2026, welcomes direct commitments of pre-reviewed papers from ARR.
If your paper has already been reviewed through ARR and you would like it to be considered for *SEM 2026, you can submit it through the direct commitment process.
Deadline: April 10, 2026
Commitment link: https://openreview.net/group?id=aclweb.org/StarSEM/2026/Conference
Important Dates
(All deadlines are 11:59 PM UTC-12h, Anywhere on Earth)
* Notification of acceptance: May 5, 2026
* Camera-ready deadline: May 26, 2026
* Conference date: July 3, 2026 (co-located with ACL 2026)
Following ACL and ARR policies, there is no anonymity period requirement.
More information:
Website: https://starsem2026.github.io/
Call for Papers: https://starsem2026.github.io/calls/
Blog post: https://starsem2026.github.io/blog/
We look forward to your submissions.
Best regards,
*SEM Program Chairs.
Introduction
We invite proposals for tasks to be run as part of SemEval-2027. SemEval (the International Workshop on Semantic Evaluation) is an ongoing series of evaluations of computational semantics systems, organized under the umbrella of SIGLEX, the Special Interest Group on the Lexicon of the Association for Computational Linguistics.
SemEval tasks investigate the nature of meaning in natural languages, exploring how to characterize and compute meaning. This is done in practical terms, using shared datasets and standardized evaluation metrics to quantify the strengths and weaknesses of possible solutions. SemEval tasks encompass a broad range of semantic topics from the lexical level to the discourse level, including word sense identification, semantic parsing, coreference resolution, and sentiment analysis, among others.
For SemEval-2027, we welcome tasks that can test an automatic system for semantic analysis of text (e.g., an intrinsic semantic evaluation or an application-oriented evaluation). We especially encourage tasks for languages other than English, cross-lingual tasks, and tasks that develop novel applications of computational semantics. See the websites of previous editions of SemEval to get an idea of the range of tasks explored, e.g., SemEval-2020 (http://alt.qcri.org/semeval2020/) and SemEval-2021 to 2026 (https://semeval.github.io/).
We strongly encourage proposals based on pilot studies that have already generated initial data, evaluation measures, and baselines. In this way, we can avoid unforeseen challenges down the road that may delay the task. We suggest providing a reasonable baseline (e.g., providing a Transformer / LLM baseline for a classification task) apart from the majority vote / random guess.
If you are not sure whether a task is suitable for SemEval, please feel free to get in touch with the SemEval organizers at semevalorganizers(a)gmail.com to discuss your idea.
The submission webpage is: https://softconf.com/acl2026/semevaltasks2027/
Task Selection
Task proposals will be reviewed by experts, and reviews will serve as the basis for acceptance decisions. Everything else being equal, more innovative new tasks will be given preference over task reruns. Task proposals will be evaluated on:
Novelty: Is the task on a compelling new problem that has not been explored much in the community? Is the task a rerun, but covering substantially new ground (new subtasks, new types of data, new languages, etc. - one addition is not sufficient)?
Interest: Is the proposed task likely to attract a sufficient number of participants?
Data: Are the plans for collecting data convincing? Will the resulting data be of high quality? Will annotations have meaningfully high inter-annotator agreements? Have all appropriate licenses for use and re-use of the data after the evaluation been secured? Have all international privacy concerns been addressed? Will the data annotation be ready on time?
Evaluation: Is the methodology for evaluation sound? Is the necessary infrastructure available, or can it be built in time for the shared task? Will research inspired by this task be able to evaluate in the same manner and on the same data after the initial task? Is the task significantly challenging (e.g., room for improvement over the baselines)?
Impact: What is the expected impact of the data in this task on future research beyond the SemEval Workshop?
Ethics: The data must comply with privacy policies, e.g., avoid personally identifiable information (PII). Tasks aimed at identifying specific people will not be accepted. Avoid medical decision making (comply with HIPAA; do not try to replace medical professionals, especially for anything related to mental health). These criteria are representative, not exhaustive.
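The inter-annotator agreement mentioned in the Data criterion is commonly reported with a chance-corrected coefficient such as Cohen's kappa. A minimal sketch, assuming two annotators labelling the same items; the label sequences below are purely illustrative, not part of any SemEval requirement:

```python
from collections import Counter

def cohens_kappa(ann_a, ann_b):
    """Chance-corrected agreement between two annotators' label sequences."""
    n = len(ann_a)
    # Observed agreement: fraction of items both annotators label identically.
    p_o = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    # Expected chance agreement, from each annotator's label distribution.
    counts_a, counts_b = Counter(ann_a), Counter(ann_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

ann_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
ann_2 = ["pos", "neg", "neg", "neg", "pos", "neg"]
print(round(cohens_kappa(ann_1, ann_2), 3))  # 0.667
```

What counts as "meaningfully high" agreement is task-dependent; kappa corrects raw agreement for the agreement expected by chance, which is why proposals usually report it rather than plain percentage agreement.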
Roles:
Lead Organizer - main point of contact, expected to ensure deliverables are met on time and participate in contributing to task duties (see below).
Co-Organizers - provide significant contributions to ensuring the task runs smoothly. Some examples include maintaining communication with task participants, preparing data, creating and running evaluation scripts, leading paper reviewing, and acceptance.
Advisory Organizers - more of a supervisor role, may not contribute to detailed tasks, but will provide guidance and support.
New Tasks vs. Task Reruns
We welcome both new tasks and task reruns. For a new task, the proposal should address whether the task would be able to attract participants. Preference will be given to novel tasks that have not received much attention yet.
For reruns of previous shared tasks (whether or not the previous task was part of SemEval), the proposal should address the need for another iteration of the task. Valid reasons include: a new form of evaluation (e.g., a new evaluation metric, a new application-oriented scenario), new genres or domains (e.g., social media, domain-specific corpora), or a significant expansion in scale. We further discourage carrying over a previous task and just adding new subtasks, as this can lead to the accumulation of too many subtasks. Evaluating on a different dataset with the same task formulation, or evaluating on the same dataset with a different evaluation metric, typically should not be considered a separate subtask.
Task Organization
We welcome people who have never organized a SemEval task before, as well as those who have. Apart from providing a dataset, task organizers are expected to:
- Verify the data annotations have sufficient inter-annotator agreement.
- Verify licenses for the data allow its use in the competition and afterwards. In particular, text that is publicly available online is not necessarily in the public domain; unless a license has been provided, the author retains all rights associated with their work, including copying, sharing and publishing. For more information, see: https://creativecommons.org/faq/#what-is-copyright-and-why-does-it-matter
- Resolve any potential security, privacy, or ethical concerns about the data.
- Commit to make the data available also after the task in a long-term repository under an appropriate license, preferably using Zenodo: https://zenodo.org/communities/semeval/
- Provide task participants with format checkers and standard scorers.
- Provide task participants with baseline systems to use as a starting point (in order to lower the obstacles to participation). A baseline system typically contains code that reads the data, creates a baseline response (e.g., random guessing, majority class prediction), and outputs the evaluation results. Whenever possible, baseline systems should be written in widely used programming languages and/or should be implemented as a component for standard NLP pipelines.
- Create a mailing list and website for the task and post all relevant information there.
- Create a CodaLab or other similar competition for the task and upload the evaluation script.
- Manage submissions on CodaLab or a similar competition site.
- Write a task description paper to be included in SemEval proceedings, and present it at the workshop.
- Manage participants’ submissions of system description papers, manage participants’ peer review of each other’s papers, and possibly shepherd papers that need additional help in improving the writing.
- Review other task description papers.
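As an illustration of the baseline-system duty above, here is a minimal sketch of a majority-class baseline paired with an accuracy scorer; the labels and data format are invented for the example, not prescribed by SemEval:

```python
from collections import Counter

def majority_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test item."""
    majority_label = Counter(train_labels).most_common(1)[0][0]
    return [majority_label] * n_test

def accuracy(gold, predicted):
    """Fraction of test items predicted correctly."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

train = ["pos", "pos", "pos", "neg"]  # toy training labels
gold = ["pos", "neg", "pos", "pos"]   # toy gold test labels
predictions = majority_baseline(train, len(gold))
print(accuracy(gold, predictions))  # 0.75
```

A real baseline release would also read the task's data format and write predictions in the submission format, so participants can swap in their own model with minimal changes.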
Desk Rejects
- To ensure tasks have sufficient support, we require a minimum of two organizers at the time of proposal submission. A task proposal with only one organizer will be desk-rejected. Running a SemEval task is a significant time commitment; therefore, we highly recommend that a task have at least three to four organizers.
- A person can be a lead organizer on only one task. The second mandatory organizer on the task must be committed to the task as a key co-organizer. Any other organizers (beyond the lead and co-organizer) can participate in other tasks.
- All data should have a research-friendly license. The licensing must be provided in the proposal.
- Task organizers must commit to keeping the data available after the task, either by keeping the task alive or by uploading the data to Zenodo or another permanent public data repository and sharing the link with the organizers.
=== Important dates ===
- Task proposals due 13 April 2026 (Anywhere on Earth)
- Task selection notification 25 May 2026
=== Preliminary timetable ===
- Sample data ready 15 July 2026
- Training data ready 1 September 2026
- Evaluation data ready 1 December 2026 (internal deadline; not for public release)
- Evaluation start 10 January 2027
- Evaluation end by 31 January 2027 (latest date; task organizers may choose an earlier date)
- Paper submission due February 2027
- Notification to authors March 2027
- Camera ready due April 2027
- SemEval workshop Summer 2027 (co-located with a major NLP conference)
Tasks that fail to keep up with crucial deadlines (such as the dates for having the task and CodaLab website up and dates for uploading sample, training, and evaluation data) may be cancelled at the discretion of SemEval organizers. While consideration will be given to extenuating circumstances, our goal is to provide sufficient time for the participants to develop strong and well-thought-out systems. Cancelled tasks will be encouraged to submit proposals for the subsequent year’s SemEval. To reduce the risk of tasks failing to meet the deadlines, we are unlikely to accept multiple tasks with overlap in the task organizers.
Submission Details
The task proposal should be a self-contained document of no longer than 3 pages (plus additional pages for references). All submissions must be in PDF format, following the ACL template: https://github.com/acl-org/acl-style-files
Each proposal should contain the following:
- Overview
- Summary of the task
- Why this task is needed and which communities would be interested in participating
- Expected impact of the task
- Data & Resources
- How the training/testing data will be produced. Please discuss whether existing corpora will be reused.
- Details of copyright and license, so that the data can be used by the research community both during the SemEval evaluation and afterwards
- How much data will be produced
- How data quality will be ensured and evaluated
- An example of what the data would look like
- Resources required to produce the data and prepare the task for participants (annotation cost, annotation time, computation time, etc.)
- Assessment of any concerns with respect to ethics, privacy, or security (e.g., personally identifiable information of private individuals; potential for systems to cause harm)
- Pilot Task (strongly recommended)
- Details of the pilot task
- What lessons were learned, and how these will impact the task design
- Evaluation
- The evaluation methodology to be used, including clear evaluation criteria
- For Task Reruns
- Justification for why a new iteration of the task is needed (see criteria above)
- What will differ from the previous iteration
- Expected impact of the rerun compared with the previous iteration
- Task organizers
- Names, affiliations, email addresses
- (optional) brief description of relevant experience or expertise
- (if applicable) years and task numbers of any SemEval tasks you have run in the past
Proposals will be reviewed by an independent group of area experts who may not be familiar with recent SemEval tasks; therefore, all proposals should be written in a self-explanatory manner and contain sufficient examples.
The submission webpage is: https://softconf.com/acl2026/semevaltasks2027/
=== Chairs ===
Debanjan Ghosh, Analog Devices, USA
Kai North, Cambium Assessment, USA
Ekaterina Kochmar, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), UAE
Mamoru Komachi, Hitotsubashi University, Japan
Marcos Zampieri, George Mason University, USA
Contact: semevalorganizers@gmail.com
Call for contributions: Study day "IA et découvrabilité scientifique : enjeux pour la francophonie" (AI and scientific discoverability: challenges for the French-speaking world)
Thursday 30 April 2026, Montréal, Canada
**Response deadline: 4 April 2026**
https://dcsf.cirst.ca/journees-ia-decouvrabilite-appel/
Created in 2024 by the Fonds de recherche du Québec, the Chaire de recherche du Québec sur la découvrabilité des contenus scientifiques en français (DCSF) focuses on the conditions of access, dissemination, and use of scientific knowledge in French. It studies the publishing practices and technological tools that influence content discoverability. It develops solutions to counter the decline of French in research and to strengthen the discovery capabilities of the main French-language scientific dissemination platforms used in Québec.
By supporting suitable strategies and tools, the Chair aims to durably strengthen the presence of French in research communities.
The study day "IA et découvrabilité scientifique : enjeux pour la francophonie", held on 30 April 2026, proposes a collective reflection on the effects of artificial intelligence tools on the circulation of scientific knowledge in French.
Round tables, presentations, a poster session, and a hands-on workshop will examine the challenges AI raises for French-language research: the promises and limits of generative AI, the linguistic and social biases of multilingual models, data sovereignty and intellectual property, and the standardization of knowledge.
This study day aims to open a space for debate on the future of scientific publishing in French and to identify levers for strengthening the discoverability of French-language content in contemporary digital environments. Beyond consolidating an interdisciplinary research network, it also seeks to identify research directions on the transformations AI is bringing to the circulation of scientific knowledge in French.
Call for presentations
Target audience
Practitioners (journalism, science communication, community organizations, etc.) and the research community (AI, communication, information science, humanities and social sciences, linguistics, etc.).
Format
15-minute talk + 5 minutes for questions.
Hybrid participation (in person or remote).
Themes
* Linguistic and social biases of multilingual models
* Standardization of knowledge and indexing biases
* Private sector, intellectual property, and data sovereignty
* Place of the French-speaking world (Africa, Europe, Québec, etc.) in AI models
* Information retrieval: LLMs versus search engines
* Scientific discoverability and science popularization
* Scientific publishing in French (global challenges and local responsibilities)
Proposals outside these themes but consistent with the general theme of the event are welcome.
Submission
A single PDF document containing:
* Title
* Author(s)
* Affiliation(s)
* Abstract (250 words)
* Short biography
Call for posters
Target audience
Practitioners (journalism, science communication, community organizations, etc.), graduate students, and postdoctoral researchers.
Format
Poster session of 45 minutes to 1 hour.
On-site presentation. If you wish to present a poster remotely, please contact us.
A0-format poster.
Themes
Posters may address any topic related to AI, information retrieval, translation, scientific publishing, discoverability, or the languages of science dissemination.
Submission
A single PDF document containing:
* Title
* Author(s)
* Affiliation(s)
* Abstract (250 words)
* 3 to 5 keywords
For students: degree level and research stage (exploratory, preliminary results, completed).
General information
* Language of submissions and presentations: French and English
* Send proposals to: chaire.dcsf(a)proton.me
* Deadline: 4 April 2026
Proposals will be reviewed on a rolling basis as they are received. No contribution will be reviewed after 4 April.
Presentations will be video-recorded and promoted on the Chair's website.
Dear all,
We are organising the 5th Cardiff NLP Summer Workshop, which will take place on 22–23 June 2026 in the Abacws Building in Cardiff (Wales, UK).
The workshop is especially aimed at PhD students and early-career researchers (and anyone interested in NLP). Registration is free for all participants. Please fill in the expression of interest form (https://forms.gle/ypUBEpVhfoUhSgY16) by 11 April if you are interested in joining the workshop.
Workshop activities include:
* Invited speakers from academia and industry.
* Tutorials.
* Poster session and networking.
* Panel discussion.
Important dates:
* Application period: 28 January – 11 April 2026.
* Notification of acceptance: Late April 2026.
* Workshop: 22–23 June 2026, Cardiff.
For more details, please visit the workshop website: https://www.cardiffnlpworkshop.org/.
Best regards,
The Cardiff NLP Organising Team.