- Corpora - ELRA lists

The Paradigm Shift: From Rules to Models in Natural Language Processing
by Amal Haddad 29 Apr '26

The Paradigm Shift: From Rules to Models in Natural Language Processing
International Summer School
Alicante, Spain, 15, 16 and 17 June 2026
https://summer-school.gplsi.es

Third Call for Participation

Natural Language Processing (NLP) has witnessed a clear paradigm shift: the transition from rule-based approaches to data-driven language models. While rule-based approaches dominated NLP for many years, during the 1990s and early 2000s they gradually gave way to statistical and machine-learning methods. It would be fair to say that data-driven models, and most prominently Deep Learning (DL), including more recently Large Language Models (LLMs), have taken the world by storm. Deep Learning models are now used almost everywhere, across nearly every discipline, and Natural Language Processing is no exception. DL has proved highly promising so far, delivering improvements for almost every NLP task and application. However, as observed on numerous occasions, the outputs of DL models are not always ideal, with some studies reporting cases in which machine-learning approaches do not necessarily outperform the 'old-fashioned' rule-based ones.

The overarching theme of the summer school will be this paradigm shift, with lectures and practical sessions reflecting the latest trends at both theoretical and practical levels. More specifically, the programme will combine lectures focusing on theoretical foundations with hands-on practical sessions. See the confirmed lectures below.

The summer school will be ideal for both newcomers and experienced professionals in NLP, computer science, data science, cybersecurity, corpus linguistics, language technologies, and related disciplines, offering a unique opportunity to deepen expertise and engage with the rapidly evolving world of LLMs.

Keynote speech: Roberto Navigli, 'Is Lexical Semantics Dead in the LLM Era?'

We are delighted to announce that Roberto Navigli (Sapienza University of Rome) will deliver the keynote speech, 'Is Lexical Semantics Dead in the LLM Era?'

Summer school programme

The summer school programme will feature the following lectures:

Invited lecture: 'Quantum Natural Language Processing: Foundations, Challenges, and Insights', Elena Lloret (University of Alicante)
'Explainable AI in Natural Language Processing', Salima Lamsiyah (University of Luxembourg)
'Quality Estimation for Machine Translation', Tharindu Ranasinghe (Lancaster University)
'Understanding Language Models', Hansi Hettiarachchi (Lancaster University)
'LLMs for low-resource languages', Robiert Sepúlveda Torres and Iván Martínez (University of Alicante)
'Fairness in Machine Learning: Evaluating Gender Bias in LLMs', Juan Pablo Consuegra-Ayala (University of Alicante)
'Gaze data for NLP research: recording methods and analysis', Cengiz Acarturk (Jagiellonian University)
'Beyond the Single Text: NLP Reading in Digital Humanities', Isuri Anuradha (Lancaster University)
'Automatic hyperparameter optimisation and model selection for NLP pipelines', Ernesto Luis Estevanell (University of Alicante)
'Legal NLP in the LLM era', Damith Premasiri (Lancaster University)
'Machine Translation for Low-Resource Languages', Alicia Picazo-Izquierdo (University of Alicante)
'Sentiment analysis: from rule-based methods to Large Language Models', Maram Alharbi (Lancaster University)

Panel discussion

A panel discussion, 'The future of NLP methods and language models', is scheduled as part of the summer school (https://summer-school.gplsi.es/panel/). The panel will be moderated by Ruslan Mitkov (Lancaster University and University of Alicante) and will include contributions from:

Roberto Navigli (Sapienza University of Rome)
Elena Lloret (University of Alicante)
Tharindu Ranasinghe (Lancaster University)
Salima Lamsiyah (University of Luxembourg)
Nasredine Semar (CEA)
Yoan Gutiérrez Vázquez (University of Alicante)
Gražina Korvel (Vilnius University)

Venue, dates and accommodation

The summer school will take place at the Research Institute of Informatics of the University of Alicante on 15, 16 and 17 June 2026. See the summer school website for recommended accommodation options and further details. Prospective participants are advised to book accommodation at their earliest convenience, as availability is limited.

Summer School Directors
Tharindu Ranasinghe (Lancaster University)
Salima Lamsiyah (University of Luxembourg)

Summer School Chair
Ruslan Mitkov (University of Alicante)

Advisory Committee
Manuel Palomar Sanz (University of Alicante)
Rafael Muñoz Guillena (University of Alicante)
Andrés Montoyo Guijarro (University of Alicante)

Organising Committee
Raúl García Cerdá (University of Alicante)
Alicia Picazo Izquierdo (University of Alicante)
Ernesto Luis Estevanell (University of Alicante)
Maram Alharbi (Lancaster University)

Registration

Registration can be completed at https://summer-school.gplsi.es/registration/. Kindly note that early-bird registration closes on 25 May 2026.

Related events

The summer school will follow the second international conference _Natural Language Processing and Artificial Intelligence_ (NLPAICS'2026), which will take place in Alicante on 11 and 12 June 2026 (https://nlpaics2026.gplsi.es). Those who register for both events will benefit from a discounted registration fee.

Further information

The summer school website is updated on a regular basis. Alternatively, interested parties can email summer-school(a)dlsi.ua.es for more information.
--
Amal Haddad Haddad (She/her)
Facultad de Traducción e Interpretación
Universidad de Granada | https://www.ugr.es/personal/amal-haddad-haddad
Lexicon Research Group | http://lexicon.ugr.es/haddad
Co-Convenor, BAAL SIG 'Humans, Machines, Language' | https://r.jyu.fi/humala
Event Coordinator, BAAL SIG 'Language, Learning and Teaching'

WoLaLa 2026: Second call for papers
by Héja Enikő 29 Apr '26

2nd Call for Papers

2nd International Workshop on Language and Language Models (WoLaLa)
Dubrovnik, Croatia | October 12-13

The ELTE Research Centre for Linguistics, the University of Zagreb, Faculty of Humanities and Social Sciences, and the Croatian Language Technologies Society invite submissions to the 2nd International Workshop on Language and Language Models. This workshop is designed as a dedicated forum for scholars and practitioners in the social sciences and humanities (SSH) to discuss and evaluate large language models from an SSH perspective, and to share best practices that can advance research and applications within these fields.

Relevant topics include, but are not limited to, the following areas:

* General language models: Critical and comparative analyses of state-of-the-art language models, including their linguistic competence, performance, and limitations.
* Cultural and linguistic perspectives: Investigations into the cultural, cognitive, and scientific aspects of language processing, including the unexplored territories of model behavior and linguistic capability.
* Applications and best practices: Case studies and best practices in applying AI to language research, highlighting the potential for cross-disciplinary innovation within SSH.
* Bridging disciplines: Contributions that examine the role of language models in reshaping traditional SSH methodologies, and proposals on integrating AI insights into linguistic inquiry.

IMPORTANT DATES

20 May 2026: Submission deadline
08 August 2026: Notification of acceptance
12-13 October 2026: Workshop in Dubrovnik
15 December 2026: Full paper submission deadline

Submissions

We expect submissions in the form of extended abstracts (length: 3 to 4 pages including references) in PDF format, in accordance with the template (https://www.overleaf.com/read/sbmczvkpxpzz#4a94e3). Please ensure your submission clearly outlines your research question, methodology, and preliminary findings.

Extended abstracts must be submitted through the EasyChair submission system (https://easychair.org/conferences/?conf=wolala2026) and will be reviewed by the Programme Committee. All proposals will be reviewed on the basis of the following criteria:

* Appropriateness: The contribution must pertain to the topics listed above.
* Soundness and correctness: The content must be technically and factually correct; methods must be scientifically sound, according to best practice, and preferably evaluated.
* Meaningful comparison: The abstract must indicate that the author is aware of alternative approaches, if any, and highlight relevant differences.
* Substance: Concrete work and experiences will be given preference over ideas and plans.
* Impact: Contributions with a higher impact on the research community and society more broadly will be given preference over papers with lower impact.
* Clarity: The abstract should be clearly written and well structured.
* Timeliness and novelty: The work must convey relevant new knowledge to the audience at this event.

Programme Committee

The Programme Committee for the conference consists of the following members:

Marko Tadić, University of Zagreb, Croatia (chair)
António Branco, University of Lisbon, Portugal
Eva Hajičová, Charles University Prague, Czech Republic
Erhard Hinrichs, University of Tübingen, Germany
András Kornai, HUN-REN Institute for Computer Science and Control, Hungary
Alessandro Lenci, University of Pisa
Csaba Pléh, Central European University, Austria
Gábor Prószéky, ELTE Research Centre for Linguistics & Pázmány Péter Catholic University
Paul Rayson, Lancaster University, United Kingdom
Frédérique Segond, National Institute for Research in Digital Science and Technology, France
Dan Tufiș, Romanian Academy, Romania
Hans Uszkoreit, German Research Center for Artificial Intelligence, Germany
Tamás Váradi, HUN-REN Hungarian Research Centre for Linguistics, Hungary
Martin Wynne, University of Oxford, United Kingdom

LINKS

2nd International Workshop on Language and Language Models website: https://wolala.nytud.hu
EasyChair submission: https://easychair.org/conferences/?conf=wolala2026
Template for submissions:
ZIP-archive: https://wolala.nytud.hu/templates/WoLaLa2026.zip <https://wolala.nytud.hu/templates/WoLaLa2025.zip>
Overleaf template: <https://www.overleaf.com/read/xsvjrhvjyfmj#f3362f>https://www.overleaf.com/read/prvhqbxdgmxq#374f7b

Contact for any questions regarding the conference: info(a)wolala.nytud.hu

Data in Historical Linguistics Seminar Series – Seminar 8
by Andrea Farina 29 Apr '26

The eighth talk of the Data in Historical Linguistics Seminar Series will take place remotely on Monday 11th May 2026 at 5pm BST. Federico Viglino (Guglielmo Marconi University, Italy) will be presenting on "Middle voice in the diachrony of Ancient Greek: a quantitative (and qualitative!) approach".

Registration for this talk will close at midnight on Friday 8th May. The registration link is: https://forms.gle/ioQ7qbspf9ebc19J7
Participants will receive a Microsoft Teams link via email on the morning of the talk.

The abstract for this talk can be found on this page: https://datainhistoricallinguistics.wordpress.com/2025/12/19/monday-11-may-…
The programme and registration links for all talks in the series can be found on our website: https://datainhistoricallinguistics.wordpress.com/2026-programme/

This seminar series is run by Andrea Farina (King’s College London) and Dr Mathilde Bru and is aimed at PhD students and early career researchers. Its purpose is to bring together researchers working on historical linguistics with a quantitative approach, and to discuss current avenues of research in this area. We hope that these seminars will nurture international collaboration and establish academic ties among researchers working on similar topics in this field.

Join our mailing list: https://datainhistoricallinguistics.wordpress.com/join-us/

GRACE@IberLEF2026 (Deadline Extension): Clinical Argument Mining shared task in Spanish connecting Explainable AI and Evidence-Based Medicine
by aitziber.atucha＠ehu.eus 29 Apr '26

Registration open!!

GRACE@IberLEF2026: https://www.codabench.org/competitions/13280/

(We apologize for multiple postings of this e-mail.)

GRACE@IberLEF2026 announces the first edition of a novel shared task on Argument Mining in Spanish connecting Explainable AI and Evidence-Based Medicine across clinical trials and medical licensing examinations.

⚗️ Argument Mining
Argument Mining automatically extracts claims and evidence from clinical text and reveals how they support or challenge each other, enabling transparent, traceable clinical reasoning.

🌍 Spanish, First
GRACE is the first Argument Mining shared task in Spanish for the clinical domain, filling a key gap in shared tasks for multilingual biomedical NLP with fine-grained, entity-level annotations.

Track 01 🔬 Clinical Trial Evidence & Argumentation
This track focuses on abstracts of Randomized Controlled Trials (RCTs). Their standardized design, contrasting an intervention with a control group, provides a transparent path from data to conclusions, making argumentative components more accessible to automated systems.
Goal: Identify argumentative components (claims and premises) and detect support/attack relations at the sentence level.

Track 02 🩺 Clinical Case Reasoning (MIR)
This track uses cases from the MIR (Médico Interno Residente) exam, Spain's national medical specialization test. Each instance pairs a dense clinical narrative with five competing diagnostic or treatment options, only one of which is correct.
Goal: Extract fine-grained evidence spans that justify the correct option while refuting the incorrect alternatives.

📅 Important Dates
📂 Release of Training & Dev Sets: March 18
🚀 Official Test Set Release: April 22
⏰ Deadline for Result Submission: May 15
📊 Publication of Results: May 20
📄 System Paper Submission: June 6
✅ Notification of Acceptance: June 17
🎤 IberLEF Workshop (at SEPLN): September 22
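
As a purely illustrative aid for Track 01, the sketch below shows one way sentence-level argumentative components and support/attack relations could be represented in Python. It is not the official GRACE annotation format: the class names, field names, the helper `relations_for`, and the toy English sentences are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One sentence labelled as an argumentative component (hypothetical schema)."""
    sentence_id: int
    label: str  # "claim" or "premise"
    text: str


@dataclass(frozen=True)
class Relation:
    """A directed link from one component to another (hypothetical schema)."""
    source: int  # sentence_id of the supporting or attacking component
    target: int  # sentence_id of the component being supported or attacked
    kind: str    # "support" or "attack"


def relations_for(target_id, relations):
    """Group the sources of all relations pointing at target_id by relation kind."""
    grouped = {"support": [], "attack": []}
    for rel in relations:
        if rel.target == target_id:
            grouped[rel.kind].append(rel.source)
    return grouped


# Invented toy abstract, not taken from the GRACE data.
components = [
    Component(0, "premise", "Blood pressure fell more in the treatment group than in the control group."),
    Component(1, "premise", "Adverse events were more frequent in the treatment group."),
    Component(2, "claim", "The intervention is effective and safe."),
]
relations = [
    Relation(source=0, target=2, kind="support"),
    Relation(source=1, target=2, kind="attack"),
]

print(relations_for(2, relations))
```

A system output for the track would then amount to predicting the `label` of each sentence and the list of `Relation` objects; how the organisers actually encode and score these is defined on the task page, not here.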

NTCIR-19 Tip-of-the-Tongue (ToT) Retrieval/QA Shared Task.
by diazf＠acm.org 28 Apr '26

We are excited to announce the Call for Participation for the NTCIR-19 Tip-of-the-Tongue (ToT) Shared Task.

ToT known-item retrieval is defined as “an item identification task in which the searcher has previously experienced an item but cannot recall a reliable identifier”, i.e., “It’s on the tip of my tongue…”. After 3 successful years as a TREC Track, the ToT shared task is expanding to NTCIR for 2026.

The NTCIR-19 ToT Shared Task will focus on open-domain ToT information needs in multiple languages (English, Chinese, Japanese, and Korean). You can participate in the shared task in any subset of these languages, and you are also welcome to present your work remotely at the NTCIR conference in Tokyo in December 2026.

Please visit the following websites for further information.
Task guidelines: https://ntcir-tot.github.io/guidelines
Registration: https://research.nii.ac.jp/ntcir/ntcir-19/howto.html (Deadline: June 1)

Important dates
March 27: Release corpus and training queries
May: Release test queries
June 1st: Deadline for registration
July (tentative): Deadline for submitting runs

Please consider participating and help us spread the word!

Best regards,
Fernando Diaz
On behalf of the NTCIR-19 ToT Shared Task organizers

10th Workshop on Online Abuse and Harms (WOAH) @EMNLP: 2nd CFP
by Agostina Calabrese 28 Apr '26

*** Second Call for Papers ***

We invite paper submissions to the 10th Workshop on Online Abuse and Harms (WOAH), which will take place on 24-29 October at EMNLP 2026.

Website: https://www.workshopononlineabuse.com/cfp.html

Important Dates
* Registration deadline for mentorship programme: April 10, 2026
* Notification of mentor/mentee match: April 25, 2026
* Submission due: June 26, 2026
* ARR reviewed submission due: August 3, 2026
* Notification of acceptance: August 15, 2026
* Camera-ready papers due: September 10, 2026
* Workshop: 24-29 October 2026

Overview

Digital technologies have brought significant benefits to society, transforming how people connect, communicate, and interact. However, these same technologies have also enabled the widespread dissemination and amplification of abusive and harmful content, such as hate speech, harassment, and misinformation. Given the sheer volume of content shared online, addressing abuse and harm at scale requires the use of computational tools. Yet, detecting and moderating online abuse remains a complex task, fraught with technical, social, legal, and ethical challenges.

The 10th Workshop on Online Abuse and Harms (WOAH) invites paper submissions from a diverse range of fields, including but not limited to natural language processing, machine learning, computational social science, law, political science, psychology, sociology, and cultural studies. We explicitly encourage interdisciplinary research, technical and non-technical contributions, and submissions that focus on under-resourced languages. Non-archival papers and civil society reports are also welcome.

Topics covered by WOAH include, but are not limited to:
* New models or methods for detecting abusive and harmful online content, including misinformation;
* Biases and limitations in existing detection models or datasets for abusive and harmful content, especially those in commercial use;
* Development of new datasets and taxonomies for online abuse and harms;
* Novel evaluation metrics and procedures for detecting harmful content;
* Analyses of the dynamics of online abuse, its propagation, and its impact on different communities;
* Social, legal, and ethical considerations in detecting, monitoring, and moderating online abuse.

Special Theme: “Ten Years of WOAH: Reflecting on Progress and New Frontiers”

In its 10th edition, WOAH highlights the theme “Ten Years of WOAH: Reflecting on Progress and New Frontiers”. Over the past decade, WOAH has become a central interdisciplinary venue for online harms research. As harms and enabling technologies have evolved, the field has moved beyond an early focus on textual hate speech and harassment to address more complex phenomena.

Advances in AI and online ecosystems have expanded the scale and diversity of harms. Transformer models, multimodal platforms, and recommendation systems have contributed to the escalation of issues like misinformation, radicalisation, child sexual exploitation, identity-based abuse, algorithmic bias, privacy violations, and AI-mediated harms. Methods for tackling these harms have evolved from monolingual lexicon-based approaches to deep learning, multilinguality, multimodality, interpretability, and interdisciplinarity.

Despite this progress, fundamental challenges remain. There is limited consensus on what constitutes “harm”, how context and thresholds should be defined, or how harms vary across cultures and modalities. These ambiguities affect datasets and models, constrain comparability, and often marginalise affected communities.

The past decade also calls for critical self-reflection. Research has frequently prioritised detection, high-resource languages, and narrowly defined phenomena over intervention, global perspectives, and systemic or structural harms, with insufficient attention to user agency, platform incentives, lived experience, and participatory approaches.

Finally, ten years of work have underscored that interdisciplinarity is essential for addressing the sociotechnical nature of the phenomenon. Addressing future online harms will require deeper integration across NLP, ML, social sciences, law, policy, and HCI.

WOAH 10 seeks to consolidate lessons from the past decade, identify enduring gaps, and connect research, practice, and policy to guide the next generation of work on online harms.

Submission

Submission is electronic, using the Softconf START conference management system.
Submission link: TBA

The workshop will accept three types of papers.
1) Academic Papers (long and short): Long papers of up to 8 pages, excluding references, and short papers of up to 4 pages, excluding references. Unlimited pages for references and appendices. Accepted papers will be given an additional page of content to address reviewer comments. Previously published papers cannot be accepted.
2) Non-Archival Submissions: Up to 2 pages, excluding references, to summarise and showcase in-progress work and work published elsewhere.
3) Civil Society Reports: Non-archival submissions, with a minimum of 2 pages and no upper limit. Can include work published elsewhere.

All submissions must use the official ACL style files (https://github.com/acl-org/acl-style-files). Submissions that do not conform to the required styles, including paper size, margin width, and font size restrictions, will be rejected without review. All submissions should adhere to the workshop policies: https://www.workshopononlineabuse.com/policies.html

WOAH Community

We are excited to share the WOAH community Slack channel, a workspace for researchers interested in or working on understanding and addressing online abuse and harms! Join us here: https://join.slack.com/t/hatespeechdet-47d7560/shared_invite/zt-2a8d96j4z-g…

Contact Info

Please send any questions about the workshop to organizers(a)workshopononlineabuse.com

Organisers
Agostina Calabrese, Cohere
Thomas Davidson, Rutgers University-New Brunswick
Christine de Kock, University of Melbourne
Urja Khurana, Delft University of Technology
Marta Marchiori Manerba, University of Turin
Paloma Piot, Universidade da Coruña
Zeerak Talat, University of Edinburgh

Second Call for Participation for HAHA at IberLEF 2026
by Luis Chiruzzo - Inco 27 Apr '26

*** Second Call for Participation for HAHA at IberLEF 2026 (https://sites.google.com/view/iberlef-2026) ***

Humor Analysis based on Human Annotation and Automatic Humor Generation
https://www.fing.edu.uy/inco/grupos/pln/haha/
Codabench page: https://www.codabench.org/competitions/14700/

NEWS: The trial and development data have been released. You can now submit your systems for the development phase!

Can computers be funny? Can humans identify computer-generated humor?

While humor has been studied historically from psychological, cognitive, and linguistic perspectives, its computational study is an active area of research in Machine Learning and Computational Linguistics that has gained traction in recent years. There has been significant development mainly in the field of automatic humor detection and classification, but a characterization of humor that enables its automatic recognition and generation is far from being solved.

This task aims to gain better insight into what is humorous and what causes laughter, and to take some steps forward by assessing the capabilities of current LLMs to generate actual humorous content in Spanish and attempting to see whether it’s possible to automatically distinguish between computer-generated humor and humor written by humans.

The target audience is NLP researchers interested in advancing the understanding of highly subjective and creative tasks, though anyone is welcome to participate.

Task description

This year, the HAHA evaluation campaign proposes three different subtasks related to automatic humor detection and generation, with the aim of deepening our understanding of computational humor.

Subtask 1 - Humor Detection: determining if a news headline is satirical or real. The main performance metric for this subtask will be the F1 score of the 'humorous' class. This subtask is similar to the first subtask proposed in previous editions of the HAHA shared task, but this time it's applied to a particular domain where humorous and non-humorous content might sometimes be difficult to tell apart.

Subtask 2 - LLM-generated humor detection: determining if a joke inspired by a news headline was generated by an LLM or written by a human. The main performance metric for this subtask will be the F1 score of the 'automatic' class.

Subtask 3 - Humor Generation: generating jokes from a news headline using computational methods. This subtask will be evaluated through human preference judgments, employing LLM arena-style battles between pairs of generated jokes, and ranking the systems using an Elo-based leaderboard.

How to Participate

The CodaBench page for the competition is available: https://www.codabench.org/competitions/14700/

Important Dates

March 18th, 2026: team registration page.
April 8th, 2026: development sets released and open for dev submissions.
May 27th, 2026: test sets released and open for test submissions.
June 3rd, 2026: end of test submissions, publication of results of subtasks 1 and 2.
June 10th, 2026: publication of results of subtask 3.
June 12th, 2026: paper submission.
June 23rd, 2026: notification of acceptance.
July 1st, 2026: camera-ready paper submission.
September 2026: IberLEF 2026 Workshop.
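
To make the Elo-based leaderboard mentioned for Subtask 3 more concrete, here is a minimal Python sketch of how pairwise preference judgments can be turned into a ranking with the standard Elo update. The announcement does not specify the organisers' exact procedure, so the starting rating of 1000, the K-factor of 32, the absence of ties, and the system names `sys_a`, `sys_b`, `sys_c` are all assumptions for illustration only.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expectation that A beats B, given their current ratings."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, winner, loser, k=32.0):
    """Apply one Elo update in place after `winner` was preferred over `loser`."""
    gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain


def leaderboard(systems, battles, initial=1000.0, k=32.0):
    """Replay all (winner, loser) battles in order and return systems sorted by rating."""
    ratings = {name: initial for name in systems}
    for winner, loser in battles:
        update_elo(ratings, winner, loser, k)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical human judgments: each pair is (preferred system, other system).
battles = [("sys_a", "sys_b"), ("sys_a", "sys_c"), ("sys_b", "sys_c")]
board = leaderboard(["sys_a", "sys_b", "sys_c"], battles)

for name, rating in board:
    print(f"{name}: {rating:.1f}")
```

Because each update adds to the winner exactly what it subtracts from the loser, the total rating mass stays constant; note that sequential Elo depends on the order in which battles are replayed, which is one reason real arena leaderboards often use additional smoothing or resampling.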

Call for Participation: PROFE @ IberLEF 2026 (Test data now available!)
by ALVARO RODRIGO YUSTE 27 Apr '26

<Apologies for cross-postings>

Release of test data and registration still open!!

PROFE 2026: Language Proficiency Evaluation
IberLEF 2026 Shared Task
Website URL: https://sites.google.com/view/profe2026
Codabench site: https://www.codabench.org/competitions/15902/

PROFE 2026 reuses the exams for Spanish proficiency evaluation developed by Instituto Cervantes over many years to evaluate human students. Automatic systems will therefore be evaluated under the same conditions as humans were. Systems will receive a set of exercises with their corresponding instructions, without specific training material. We therefore expect approaches based on Transfer Learning or Generative Large Language Models.

The previous edition proposed exams based only on text. In this new edition, we will include exams with images, which sometimes require interpretation to answer the exercise correctly. We propose evaluating systems on their ability to perform multimodal reasoning, moving beyond text-only comprehension. We will provide a limited set of new image-based exercises while retaining the dataset from the previous edition. This setup encourages participants to develop strategies for handling the scarcity of specific training data.

Subtasks

PROFE 2026 has three subtasks, one per exercise type. Teams can participate in any combination of them. Each subtask contains several exercises of the same type. The subtasks are:

1. Multiple choice subtask: each exercise includes a text and a set of multiple-choice questions about the text where only one answer is correct. Given a multiple-choice question, systems must select the correct answer among the candidates.
2. Matching subtask: each exercise contains two sets of texts. Systems must find the text in the second set that best matches the first set. There is only one possible matching per text, but the first set can contain extra unnecessary texts.
3. Filling the gap subtask: each exercise contains a text with several gaps corresponding to textual fragments that have been removed and presented in shuffled order as options. Systems must determine the correct position for each fragment. There is only one correct text per gap, but there could be more candidates than gaps.

The different exercise types invite research on how best to approach each one, for example by adapting prompts when using generative models.

As the main novelty in this edition, some exercises will contain images. While some of these images will be the candidate answers (rather than text excerpts), others might provide visual information needed to answer the exercise correctly. Conversely, some images will not provide essential information. Consequently, systems participating in this edition must adopt a multimodal approach, capable of discerning when to integrate visual cues and when to disregard them. This need to filter visual relevance introduces significant new challenges compared to the previous edition.

Dataset

We will use the IC-UNED-RC-ES dataset created from real examinations at Instituto Cervantes. These exams were created by human experts to assess language proficiency in Spanish. We have already collected the exams and converted them to a digital format, which is ready to be used in the task. The dataset contains exams at different levels (from A1 to C2). The description of the full dataset was published in the following paper:

* Anselmo Peñas, Álvaro Rodrigo, Javier Fruns-Jiménez, Inés Soria-Pastor, Sergio Moreno-Álvarez, Alberto Pérez García-Plaza, and Julio Reyes-Montesinos. A Spanish Language Proficiency Dataset for AI Evaluation (https://www.mdpi.com/2078-2489/17/2/159). Information 17, no. 2: 159. DOI: 10.3390/info17020159 (https://doi.org/10.3390/info17020159). 2026.

The complete dataset contains 282 exams with 855 exercises. The total number of evaluation points is 6146 (among 16570 options), distributed by exercise type as follows:

* multiple-choice: 3544 responses
* matching: 2309 responses
* fill-the-gap: 293 responses

In PROFE 2026, we plan to use around 50% of the exams; the other 50% was already used for the PROFE 2025 edition. We do not intend to distribute the gold standard, in order to prevent overfitting in post-campaign experiments and data contamination in LLMs.

Evaluation measures and baseline

We will use traditional accuracy (proportion of correct answers) as the main evaluation measure. Systems will receive evaluation scores from two different perspectives:

* At the question level, where correct answers are counted individually without grouping them.
* At the exam level, where scores for each exam are considered. Each exam contains several exercises of different types. An exam is considered passed if its accuracy (the proportion of correct answers) is above 0.5. The proportion of passed exams is then given as a global score. This perspective will only apply to teams participating in all three subtasks.

In more detail, the exact evaluation per subtask is as follows:

* Multiple choice subtask: we will measure accuracy as the proportion of questions correctly answered.
* Matching subtask: we will measure accuracy as the proportion of correct texts matched.
* Fill in the gap subtask: we will measure accuracy as the proportion of correctly filled gaps.

We will use accuracy as the evaluation measure because there is only one correct option among candidates and because it is the measure applied to humans doing the same exams.
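
The two scoring perspectives described above (question level and exam level) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the organisers' scorer: the record layout with `exam_id`, `predicted` and `gold` keys, the function names, and the toy exam identifiers are all hypothetical; the only details taken from the announcement are that accuracy is the proportion of correct answers and that an exam counts as passed when its accuracy is above 0.5.

```python
from collections import defaultdict


def question_level_accuracy(records):
    """Proportion of evaluation points answered correctly, ignoring exam boundaries."""
    if not records:
        return 0.0
    correct = sum(1 for r in records if r["predicted"] == r["gold"])
    return correct / len(records)


def exam_level_pass_rate(records, threshold=0.5):
    """Proportion of exams whose own accuracy is strictly above the threshold."""
    hits_per_exam = defaultdict(list)
    for r in records:
        hits_per_exam[r["exam_id"]].append(r["predicted"] == r["gold"])
    if not hits_per_exam:
        return 0.0
    passed = sum(
        1 for hits in hits_per_exam.values() if sum(hits) / len(hits) > threshold
    )
    return passed / len(hits_per_exam)


# Invented toy answers for two hypothetical exams.
records = [
    {"exam_id": "A1-01", "predicted": "a", "gold": "a"},
    {"exam_id": "A1-01", "predicted": "b", "gold": "b"},
    {"exam_id": "A1-01", "predicted": "c", "gold": "a"},
    {"exam_id": "B2-07", "predicted": "a", "gold": "b"},
    {"exam_id": "B2-07", "predicted": "c", "gold": "c"},
]

print(question_level_accuracy(records))  # 3 of 5 answers are correct
print(exam_level_pass_rate(records))     # only A1-01 is above 0.5
```

In the toy data the second exam scores exactly 0.5 and therefore does not pass, which shows why the two perspectives can diverge: a system with reasonable question-level accuracy may still fail a large share of exams if its errors cluster.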
Thus, we can compare the performance of automatic systems and humans under the same conditions.

A preliminary baseline using ChatGPT obtains the following results for each exercise type (bearing in mind that different prompts can produce slightly different results):

* Multiple choice accuracy: 0.64
* Filling the gap accuracy: 0.43
* Matching accuracy: 0.51

Schedule

* April 10, 2026: Development data released
* April 27, 2026: Test set release
* May 11, 2026: Deadline for submitting runs
* May 18, 2026: Release of evaluation results
* June 3, 2026: Paper submission deadline

Organizers

* Alvaro Rodrigo<https://www.uned.es/universidad/docentes/informatica/alvaro-rodrigo-yuste.h…>, UNED NLP & IR Group (Universidad Nacional de Educación a Distancia)
* Anselmo Peñas<https://www.uned.es/universidad/docentes/informatica/anselmo-penas-padilla.…>, UNED NLP & IR Group (Universidad Nacional de Educación a Distancia)
* Alberto Pérez<https://www.uned.es/universidad/docentes/informatica/alberto-perez-garcia-p…>, UNED NLP & IR Group (Universidad Nacional de Educación a Distancia)
* Sergio Moreno<https://www.uned.es/universidad/docentes/en/informatica/sergio-moreno-alvar…>, UNED NLP & IR Group (Universidad Nacional de Educación a Distancia)
* Javier Fruns, Instituto Cervantes
* Inés Soria, Instituto Cervantes
* Rodrigo Agerri<https://ragerri.github.io/>, HiTz (Universidad del País Vasco, UPV/EHU)
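The two evaluation perspectives described in the PROFE call above (question-level accuracy and exam-level pass rate) can be illustrated with a minimal sketch. This is not the organizers' scorer: the input format (a list of `(exam_id, is_correct)` pairs), the function names, and the toy data are all assumptions made for illustration. The only details taken from the call are that accuracy is the proportion of correct answers and that an exam counts as passed when its accuracy is above 0.5, which is read here as a strict inequality.

```python
from collections import defaultdict

# Assumption: "above 0.5" in the call is read as a strict inequality.
PASS_THRESHOLD = 0.5


def question_level_accuracy(results):
    """Proportion of correct answers, counted individually without grouping."""
    if not results:
        return 0.0
    return sum(1 for _, correct in results if correct) / len(results)


def exam_level_pass_rate(results, threshold=PASS_THRESHOLD):
    """Proportion of exams whose own accuracy is strictly above the threshold."""
    per_exam = defaultdict(list)
    for exam_id, correct in results:
        per_exam[exam_id].append(correct)
    if not per_exam:
        return 0.0
    passed = sum(
        1
        for answers in per_exam.values()
        if sum(answers) / len(answers) > threshold
    )
    return passed / len(per_exam)


# Hypothetical toy data: (exam_id, is_correct) pairs with made-up identifiers.
toy_results = [
    ("exam_A", True), ("exam_A", True), ("exam_A", False),
    ("exam_B", True), ("exam_B", False),
    ("exam_C", False), ("exam_C", False), ("exam_C", True), ("exam_C", False),
]

print(f"question-level accuracy: {question_level_accuracy(toy_results):.3f}")
print(f"exam-level pass rate:    {exam_level_pass_rate(toy_results):.3f}")
```

In the toy data, `exam_B` reaches exactly 0.5 and therefore does not count as passed under the strict reading. The two scores can diverge considerably, since the exam-level view weights every exam equally regardless of how many questions it contains.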


KONVENS 2026: Final Call For Papers & Deadline Extension
by heike.zinsmeister＠uni-hamburg.de 27 Apr '26


(apologies for cross-posting; please redistribute)

KONVENS 2026 FINAL Call for Conference Papers & Deadline Extension!
https://konvens2026.uni-hamburg.de/

We are delighted to share the final call for papers for the Konferenz zur Verarbeitung natürlicher Sprache (KONVENS) 2026, organized under the auspices of the GSCL, the DGfS-CL, the ÖGAI, and SwissNLP. This year’s KONVENS will take place in Hamburg on September 14 – 17 under the special theme “Context Matters: NLP Beyond Text”.

The conference will feature a diverse program, including talks by our two keynote speakers:

* Dr. Valentin Hoffmann, Allen Institute for AI
* Prof. Dr. Barbara Plank, LMU Munich

We invite the submission of long and short papers featuring substantial, original, and unpublished research on Natural Language Processing and Computational Linguistics, to be archived in the ACL Anthology, as well as abstract submissions that describe research in progress or published elsewhere. Beyond standard research contributions, submissions are welcome that present negative results, survey an area, introduce new resources, articulate a position, report novel linguistic insights obtained using existing computational methods, or reproduce (successfully or not) previous findings.

We welcome the following types of paper submissions:

* Long papers (up to 8 pages plus references), describing original research with substantial new results.
* Short papers and demos (up to 4 pages plus references), including small and focused contributions, work in progress, as well as descriptions of projects, systems and resources.
* Abstracts (1 page), which will be presented at the poster session and printed in the proceedings, but will be non-archival. To foster interaction and discussion in our community, we especially invite submissions in this category on ongoing projects, student projects, past or ongoing bachelor and master theses, ongoing or recently completed PhD theses, and opinion pieces.
Papers can be submitted either to the main conference track or to the special track “Context Matters”.

Context Matters Track

The widespread use of large language models (LLMs) and other types of language technology in research and real-world applications has fundamentally reshaped how natural language processing (NLP) systems interact with people and their environments. As NLP systems increasingly operate in socially embedded, high-impact settings like search, conversational agents and recommendation systems in business, education, medicine, law, and beyond, it becomes crucial to move beyond text in isolation and to account for the many forms of context that shape language use and interpretation. These include user-related factors (e.g., identity aspects like socio-demographic characteristics and the resulting perspectival differences), cultural and societal context, interaction history, application constraints, and signals from other modalities.

The “Context Matters” track focuses on how different forms of context influence NLP systems, their design, their behavior, and their use. We invite work that studies NLP not as decontextualized text processing, but as situated technology embedded in human, social, disciplinary, and multimodal environments. Here, disciplines and application domains are important not only as areas of use, but as sources of structured contextual knowledge, perspectives, and methodological traditions, particularly from the social sciences and humanities, but also law, education, psychology, economics, and the natural sciences.
In particular, the special theme includes:

* Research that models user- and group-related context, such as identity aspects, socio-demographic variables, cultural background, or perspectival differences, and examines how these factors affect language use, system behavior, or system impact
* Work that draws on or operationalizes concepts from other disciplines like the social sciences and related fields (e.g., social theory, cultural analysis, behavioral perspectives) to better understand linguistic phenomena, system outputs, or evaluation settings
* Research analyzing social, societal, and institutional context, including norms, power structures, and real-world deployment environments, especially with respect to ethics, bias, and societal consequences
* Studies of application context, where domain-specific constraints (e.g., in education, law, public administration, or the natural sciences) shape both language use and system requirements
* Approaches that move beyond text-only processing and integrate multiple modalities (e.g., vision, audio, video, sensor data), with attention to the distinct contextual signals these modalities introduce
* Work incorporating interactional context, such as dialogue history, user intent, and evolving human–AI interaction dynamics

While the modelling component should include language, we especially encourage contributions that treat language as part of a broader contextual ecosystem, aiming toward more grounded, adaptive, and socially aware NLP systems.

Papers must be in English, formatted in accordance with the ACL style sheet (https://github.com/acl-org/acl-style-files), and submitted via the submission link: https://openreview.net/group?id=GSCL.org/KONVENS/2026/Conference

Please consider the OpenReview policy for new accounts:

* New profiles created without an institutional email will go through a moderation process that can take up to two weeks.
* New profiles created with an institutional email will be activated automatically.
KONVENS also adopts the ACL policies for submission, review, and citation, the ACL privacy policy, and the ACL code of ethics. Further information can be found on the conference website: https://konvens2026.uni-hamburg.de/

Submissions need to be anonymized to ensure double-blind review. However, we allow for pre-prints to be posted any time before or during the review period. We strongly encourage authors to use LaTeX in preparing their document.

Important dates:

* 12.5.2026 (NEW): Paper Submission Deadline
* 12.7.2026: Notification of Acceptance
* 01.8.2026: Camera-Ready Deadline
* 14.9. – 17.9.2026: KONVENS in Hamburg

See you in Hamburg!

Your conference chairs,
Heike Zinsmeister, Chris Biemann, and Anne Lauscher


NLP4PI @ EMNLP2026: First Call for Papers
by Daryna Dementieva 27 Apr '26


Dear community!

We are delighted to invite submissions to the 5th Workshop on NLP for Positive Impact, co-located with EMNLP 2026!

Workshop website: https://sites.google.com/view/nlp4positiveimpact
Call for papers: https://sites.google.com/view/nlp4positiveimpact/call-for-papers-2026
Submission methods: via OpenReview, either as a direct submission or as an ARR May Cycle commitment. We also accept non-archival submissions.

Important dates:

* ARR May Cycle Submission Due: May 25th, 2026
* Direct Submissions Due: June 26th, 2026 via https://openreview.net/group?id=EMNLP/2026/Workshop/NLP4PI
* ARR Reviewed Submissions Commitment Due: July 26th, 2026 (tentative)
* Notification of Acceptance (both channels): August 15th, 2026
* Camera-Ready Papers Due: September 10th, 2026
* Workshop Date: October 24th-29th 2026 (co-located with EMNLP 2026)

All deadlines are 11:59 PM (Anywhere on Earth).

Workshop Summary

The increasing adoption of language-oriented AI systems offers unprecedented opportunities for positive societal impact. NLP technologies have matured to the point where they can meaningfully contribute to addressing global challenges like poverty, hunger, healthcare, education, inequality, COVID-19, and climate change, aligning with the UN sustainability goals. This workshop aims to advance innovative NLP research that benefits society, emphasizing responsible methods and impactful applications.

We welcome submissions in areas including, but not limited to:

* Grounding NLP in Real-World Impact: Beyond improving model performance, how can NLP systems be directly tied to social outcomes? This could include case studies of real-world deployments or strategies for better deployment and maintenance practices.
* Underexplored Applications: While NLP for healthcare and mental well-being is well-established, we encourage research tackling overlooked areas such as poverty, hunger, energy, and climate change.
* Interdisciplinary Collaborations: We highly value work that integrates insights from other fields, such as social science, political science, economics, philanthropy, and HCI, and we encourage submissions of case studies or examples that highlight such collaborations.

Special Theme: Measuring the Societal Impact of AI and NLP

This year we would like to find an answer to the question: How can we measure the social impact of AI and NLP? With the ever-growing opportunities offered by AI and language technologies, we would like to understand how they influence society and whether that influence is positive. Position papers, philosophically grounded papers, and proposals for new evaluation frameworks are very welcome to enrich the discussion!

Submission Types

We encourage diverse contributions, including:

* Identifying social needs and affected demographics.
* Proposing new tasks or directions through position papers.
* Conducting literature reviews or philosophical discussions on NLP’s societal impact.
* Designing user studies, surveys, or ethical frameworks.
* Exploring interdisciplinary methods and collaboration strategies.

Submissions must address the ethical and societal implications of the work, with a clear focus on defining and achieving positive impact. We look forward to fostering discussions that inspire actionable, responsible advancements in NLP for the greater good.

Papers Format

Both long and short paper submissions should follow all of the ARR submission requirements (https://aclrollingreview.org/cfp#paper-submission-information), including the page limits for Long Papers <https://aclrollingreview.org/cfp#long-papers> (8 pages) and Short Papers (4 pages).

Organizers

* Katherine Atwell (Northeastern University)
* Angana Borah (University of Michigan)
* Dr. Daryna Dementieva (Technical University of Munich)
* Prof Elisa Kreiss (University of California)
* Dr. Neema Kotonya (Dataminr)
* Jiarui Liu (Carnegie Mellon University)
* Liz Olson (Dataminr)
* Ruyuan Wan (Pennsylvania State University)
* Prof Jieyu Zhao (University of Southern California)

Steering Committee

* Prof Rada Mihalcea (University of Michigan)
* Dr. Joel Tetreault (Dataminr)
* Dr. Zhijing Jin (University of Toronto)

Contact Email: nlp4pi.workshop(a)gmail.com

All positive regards,
Daryna Dementieva
On behalf of NLP4PI Workshop Organizers

