2025 ACL Conference Registration
by Horacio Saggion 22 May '25

Dear Colleagues,

The ACL 2025 Conference is pleased to announce that *registration is now officially open*. We encourage you to register early to take advantage of reduced rates.

Please note the following important deadlines for registration:

- *Early Registration:* Concludes on *Wednesday, July 2, 2025, AOE*.
- *Late Registration:* Will close for both In-Person and Virtual attendees on *Friday, July 25, 2025, at 11:59 PM CET*.
- *Onsite Registration:* Will be available for both In-Person and Virtual attendees from *Saturday, July 26, 2025, through August 1, 2025, at 11:59 PM CET*.

Detailed information regarding the registration process can be found on the official conference website: https://acl.swoogo.com/acl2025

We look forward to welcoming you to ACL 2025 in beautiful Vienna!

Sincerely,
The ACL Organization Team

--
Horacio Saggion
Full Professor / Chair in Computer Science and Artificial Intelligence
Head of the Natural Language Processing Group - TALN
Project Coordinator iDEM Project (HE)
Co-PI of the AI-BOOST project (HE)
Co-PI of the IDEAL project (HE)
Universitat Pompeu Fabra
https://twitter.com/h_saggion
https://www.linkedin.com/in/horacio-saggion-1749b916

Open Positions: NLP Postdocs at MBZUAI
by Teresa Lynn 22 May '25

We have three postdoc position openings at Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi. The project is based on a collaboration with a leading industry partner on the development of a conversational booking agent.

* Postdoctoral Research Scientist in Conversational AI & NLP
* Postdoctoral Research Scientist in Recommendation & Personalization
* Postdoctoral Research Scientist in Persuasive Language Generation

More information regarding responsibilities and requirements can be found on our webpage: https://mbzuai-hiring.github.io/

- Start date: To be filled immediately, July 2025
- Duration: 1-year contract with possibility of extension
- Location: MBZUAI <https://mbzuai.ac.ae/>, Abu Dhabi, UAE
- Apply via e-mail: NLP.IndustryProject(a)mbzuai.ac.ae

We look forward to receiving your application!

Regards,
Teresa

Teresa Lynn, PhD
Head of NLP Research Engagement
Natural Language Processing
P +971 2 811 3284
W www.mbzuai.ac.ae <https://www.mbzuai.ac.ae/>

Final call for participation: Automatic Detection of Borrowings shared task (ADoBo 2025)
by Elena Álvarez Mellado 21 May '25

This is the last call to participate in ADoBo 2025, the shared task on automatic detection of borrowings in Spanish.

To gain access to the data, make submissions and check the leaderboard, please join the competition at Codabench. Systems submissions will be due on May 26th.

https://www.codabench.org/competitions/7284/

TIMELINE

- April 21: Dev set released.
- May 12: Test set released.
- May 26: Systems output submissions.
- June 9: Working notes paper submission.
- June 16: Notification of acceptance (peer-reviews).
- June 23: Camera ready paper submission.
- September: ADoBo results to be presented at IberLEF 2025.

ORGANIZATION COMMITTEE

- Elena Álvarez Mellado, Universidad Nacional de Educación a Distancia (UNED).
- Julio Gonzalo, Universidad Nacional de Educación a Distancia (UNED).
- Constantine Lignos, Brandeis University.
- Jordi Porta Zamorano, Universidad Autónoma de Madrid (UAM).

First CFP: The Tokenization Workshop@ICML 2025
by Jindrich Libovicky 21 May '25

TokShop: Tokenization Workshop (ICML 2025)

Submission to the Tokenization Workshop begins on April 14, 2025, via OpenReview. The deadline for submissions is May 30, 2025, at 11:59pm (anywhere on earth). Notifications of acceptance will be sent out on June 9, 2025, and camera-ready papers will be due shortly afterward at 11:59pm (anywhere on earth). The workshop will take place on July 18, 2025.

Workshop Description

The Tokenization Workshop (TokShop) at ICML aims to bring together researchers and practitioners from all corners of machine learning to explore tokenization in its broadest sense. We will discuss innovations, challenges, and future directions for tokenization across diverse data types and modalities.

Call for Papers

Topics of interest include:

- Subword Tokenization in NLP: Analysis of techniques such as BPE, WordPiece, and UnigramLM, as well as improvements for efficiency, interpretability, and adaptability.
- Multimodal Tokenization: Tokenization strategies for images, audio, video, and other modalities, including methods to align representations across different types of data.
- Multilingual Tokenization: Development of tokenizers that work robustly across languages and scripts, and investigation into failure modes tied to tokenization.
- Tokenizer Modification Post-Training: Methods for updating tokenizers after model training to boost performance and/or efficiency without retraining from scratch.
- Alternative Input Representations: Exploration of non-traditional tokenization approaches, such as byte-level, pixel-level, or patch-based representations.
- Statistical Perspectives on Tokenization: Empirical analysis of token distributions, compression properties, and correlations with model behavior.

By broadening the scope of tokenization research beyond language, this workshop seeks to foster cross-disciplinary dialogue and inspire new advances at the intersection of representation learning, data efficiency, and model design.

Submission guidelines

Our author guidelines follow the ICML requirements unless otherwise specified.

- Paper submission is hosted on OpenReview.
- Each submission should contain up to 9 pages, not including references or appendix (shorter submissions are also welcome).
- Please use the provided LaTeX template (Style Files) for your submission. Please follow the paper formatting guidelines general to ICML as specified in the style files. Authors may not modify the style files or use templates designed for other conferences.
- The paper should be anonymized and uploaded to OpenReview as a single PDF.
- You may use as many pages of references and appendix as you wish, but reviewers are not required to read the appendix.
- Posting papers on preprint servers like ArXiv is permitted.
- We encourage each submission to discuss the limitations as well as the ethical and societal implications of the work, wherever applicable (but neither is required). These sections do not count towards the page limit.
- This workshop offers both archival and non-archival options for submissions. Archival papers will be indexed with proceedings, while non-archival submissions will not.
- The review process will be double-blind.

Read more: https://tokenization-workshop.github.io/

eLex 2025 registration and Hornby bursaries
by Iztok Kosem 21 May '25

(apologies for multiple postings)

Dear colleagues,

We would like to inform you that registration for the eLex 2025 conference is now open (https://elex.link/elex2025/registration/). The deadline for the early-bird fee is 5 September 2025.

A call for Hornby bursary applications is also out (https://elex.link/elex2025/hornby-bursary/). The bursaries cover participants' registration fees, so if you intend to apply, please wait for the results before paying (you can still complete all the steps of the registration process and pay later).

Finally, special rates for rooms at the venue and partner hotels are available: https://elex.link/elex2025/venue/. Only a limited number of rooms are available, so early booking is advisable (there is a very friendly cancellation option).

Please monitor the conference website for further updates on the programme, proceedings and related news.

Looking forward to seeing you at the conference.

Best regards,
Iztok Kosem
Head of the eLex 2025 organising committee

HealTAC 2025 - call for participation
by Bea Alex 21 May '25

----------------------------
HealTAC 2025
June 16-18th, 2025, Glasgow (UK)
https://healtac2025.github.io/
----------------------------
Call for participation
----------------------------

The 8th Healthcare Text Analytics Conference (HealTAC 2025) invites everyone for three days of state-of-the-art discussions on healthcare text analytics. The programme features:

-- keynotes on "Addressing the Missing Context Problem in Foundation Models for Healthcare" (by Jason Fries, Stanford University) and "AI for Healthcare: Text as a Medium for Multimodal datasets" (by Alison O'Neil, Canon Medical Research Europe);
-- panels on "Opportunities and challenges in LLMs for health research: A multidisciplinary perspective on surfacing social inequalities, bias detection, and mitigation" and "Challenges in AI deployment within NHS";
-- 18 talks describing current PhD projects;
-- a workshop on "NLP in mental healthcare and research" (June 16th);
-- 4 demos and 9 lightning talks;
-- 24 posters presenting healthcare text analytics research.

The detailed programme is available at: https://healtac2025.github.io/programme/

----------------------------
Registration fees
----------------------------

Due to generous support from Health Data Research UK, CogStack, Frontiers, DataMind, University of Glasgow, Research Data Scotland and Healtex, the registration fee is only £100 (for students) and £200 (for everyone else), and includes the full 3-day programme, lunches, the conference dinner and even breakfast on day 1. These early registration rates apply until May 29th.

Registration details: https://healtac2025.github.io/registration/

----------------------------
Accommodation and travel
----------------------------

University accommodation is available to registered participants for only £43 per night. All details are available at: https://healtac2025.github.io/accommodation/

Follow the conference announcements on social media at #HEALTAC2025.

We are looking forward to welcoming you to HealTAC 2025.

[3rd CFP] The LLMs4Subjects Shared Task -- LLMs meet Extreme Multi-label Classification of Records in Digital Libraries
by D'Souza, Jennifer 21 May '25

The 2nd LLMs4Subjects Shared Task: LLM-based Subject Tagging for the TIB Technical Library's Digital Catalog

Theme: The Development of Energy- and Compute-Efficient LLM Systems

Organized as part of the German Evaluation (GermEval 2025) Shared Task Series
10. - 12. September, 2025
Hildesheim, Germany (co-located with KONVENS 2025 - Conference on Natural Language Processing)

2nd LLMs4Subjects Shared Task: https://sites.google.com/view/llms4subjects-germeval/
Join the Codabench Competition: https://www.codabench.org/competitions/8373/
KONVENS 2025: https://konvens-2025.hs-hannover.de/about/

Task Overview

LLMs4Subjects challenges the research community to develop cutting-edge LLM-based solutions for subject tagging of technical records from Leibniz University's Technical Library (TIBKAT). Participants are tasked with leveraging large language models (LLMs) to tag technical records using the GND taxonomy. The task involves bilingual language modeling, as systems must process technical documents in both German and English. Successful solutions may be integrated into the operational workflows of TIB, the Leibniz Information Centre for Science and Technology.

With the rapid advancements in LLMs, the focus is shifting toward making these models more energy- and compute-efficient while maintaining high performance. Recent innovations, such as the DeepSeek series, have demonstrated how techniques like mixture-of-experts (MoE) and model distillation can significantly reduce computational costs without sacrificing effectiveness. The 2nd LLMs4Subjects shared task highlights the importance of efficiency in LLMs, encouraging participants to explore strategies that enhance model performance while optimizing for energy consumption and inference speed. We welcome approaches that leverage, among other techniques, model compression, quantization, efficient fine-tuning, and adaptive computation to push the boundaries of sustainable AI development.

Subtasks

The 2nd LLMs4Subjects shared task organizes the following two subtasks:

* Subtask 1 - Multi-Domain Classification of Library Records
* Subtask 2 - Large-scale Multilabel Subject Indexing of Library Records

Important Dates

* Release of training data: March 8, 2025
* Release of testing data: May 30, 2025
* Deadline for system submissions: June 2, 2025
* Evaluation end: June 27, 2025
* Paper submission deadline: July 7, 2025
* Notification of acceptance: June 28, 2025
* Camera-ready paper due: August 15, 2025
* Workshop/KONVENS: September 10 - 12, 2025 (TBA)

Note: Submit your system outputs on our Codabench live leaderboard at https://www.codabench.org/competitions/8373/

2nd CfP: The 5th Workshop on Computational Linguistics for the Political and Social Sciences (CPSS)
by Ines Rehbein 21 May '25

2nd CfP: The 5th Workshop on Computational Linguistics for the Political and Social Sciences (CPSS-2025)

https://cpss-sig.github.io/CPSS-2025

CPSS-2025 will be held in September 2025, co-located with KONVENS <https://konvens-2025.hs-hannover.de> in Hildesheim, Germany.

The workshop will provide a forum for the presentation and discussion of innovative research on all aspects of using CL/NLP techniques for the political and social sciences, including:

* Modeling political communication with NLP (e.g. topic classification, position measurement)
* Mining policy debates from heterogeneous textual sources
* Modeling complex social constructs (e.g. populism, polarization, identity) with NLP methods
* Political and social bias in language models
* Methodological insights in interdisciplinary collaboration: workflows, challenges, best practices
* NLP support to understand and support democratic decision making
* Resources and tools for Political/Social Science research
* and many more...

CPSS-2025 will be held in person.

Special Theme

The special theme of CPSS-2025 is *Validation and best practices for using NLP in political and social science research*.

In addition to CPSS's general topics, we specifically invite submissions on this year's special theme, focussing on validation and best practices for applying NLP techniques for research in the political and social sciences. We are especially interested in papers addressing issues related to:

* Data quality in human and synthetic data
* Data leakage and contamination, especially in LLMs
* New ways to collect data such as dataset donation
* Validation of results beyond the train-dev-test paradigm of NLP and data science
* Any other topics related to the special theme

*Important Dates*

All submission deadlines are 11:59 p.m. UTC-12:00 ("anywhere on Earth").

* Workshop papers due: June 13, 2025
* Notification of acceptance: Aug 1, 2025
* Camera-ready papers due: Aug 10, 2025
* Workshop date: Sep 2025

*Submissions*

We solicit two types of submissions:

* *archival papers* describing original and unpublished work (long papers: max. 8 pages, references/appendix excluded; short papers: max. 4 pages, references/appendix excluded). Accepted papers will be published on the ACL anthology. For the submission format, refer to the KONVENS guidelines.
* *non-archival papers* (1-page abstracts, references excluded) describing ongoing work, PhD projects, or already published research.

For more details, please refer to the CPSS-2025 website: https://cpss-sig.github.io/CPSS-2025

*CPSS 2025 organising committee*

Dennis Assenmacher (GESIS), Christopher Klamm (U-Mannheim), Gabriella Lapesa (GESIS/U-Düsseldorf), Simone Ponzetto (U-Mannheim), Ines Rehbein (U-Mannheim), Indira Sen (U-Mannheim)

--
Ines Rehbein
Data and Web Science Group
University of Mannheim, Germany

CIRCE Online Seminar | 26/05/2025 | Dr. Giuliana Regnoli | "Unveiling linguistic bias: Approaches to accent perception and discrimination"
by Claudia Soria 21 May '25

🎓 We are happy to announce the next webinar in the CIRCE online seminar series, organized by the CIRCE <https://www.circe-project.eu/> project in collaboration with DFCLAM University of Siena <https://www.dfclam.unisi.it/en>, the H2IOSC <https://www.h2iosc.cnr.it/> project and CNR-ILC <https://www.ilc.cnr.it/en/>.

*Dr. Giuliana Regnoli*
University of Salerno, Italy & University of Regensburg, Germany

*Unveiling linguistic bias: Approaches to accent perception and discrimination*

📅 *May 26, 2025*
🕓 *4:40 PM – 5:30 PM (CEST)*

*Venue*: Online
*Attendees*: Secondary school teachers, researchers, language instructors

*Summary*: Accent discrimination remains one of the most pervasive forms of linguistic bias, influencing social perceptions, identity construction, and attitudes towards language variation. This talk examines how accents shape linguistic hierarchies and social interactions, drawing on three research projects that employ distinct methodologies.

First, we will explore how folk linguistic methods, such as map-drawing tasks, reveal nuanced spatial dimensions of language attitudes, challenging homogenising conceptualisations of World Englishes. This will be illustrated through a study on how a first-generation Indian diasporic community in Germany perceives and evaluates accent variation in Indian English.

We will then turn to traditional language attitude research methods, focusing on questionnaire data to investigate overt stigmatisations and highlighting the importance of scale validation in direct attitude measurement. This discussion will be grounded in a pilot study on Italian university students' direct attitudes towards English in Italy and their perceptions of Italian English.

Finally, we will examine language attitudes in primary education in Cameroon, emphasising the importance of understanding children's language perceptions within broader ideological frameworks. This analysis draws on data from parental and children's questionnaires, as well as semi-structured interviews with children. By shedding light on early linguistic gatekeeping and its role in decolonising language education, this study also explores when and how these beliefs become embedded in society.

Taken together, these projects demonstrate how different methodological approaches can be employed to investigate attitudes towards accents and linguistic variation, ultimately providing insights into how we can better understand and tackle accent discrimination.

*Bio*: Dr. Giuliana Regnoli is assistant professor of English linguistics at the University of Salerno and a postdoctoral research fellow at the University of Regensburg. Her research interests include variationist sociolinguistics, sociophonetics, language attitudes, perceptual dialectology, and World Englishes. She is currently working on children's English in Cameroon and Italian university students' attitudes toward English(es) world-wide.

Upcoming webinars:

- Clara Molina (Monday, June 30, 2025)
- Sender Dovchin (Monday, July 7, 2025)
- Christian Ilbury (Monday, September 22, 2025)

The seminar is free of charge, but participants must register. To access this and upcoming events, you should create an account on the H2IOSC Training Environment <https://h2iosc-training-platform.ilc4clarin.ilc.cnr.it/registration>. Once logged in with your credentials, choose the course "Language and Accent Discrimination - Online Seminar Series" and activate it with the code PbK837GtE. Make sure to have the Teams platform installed.

Recordings of the previous CIRCE Seminars are also available on the H2IOSC Training Environment.

For any inquiry, write to contact(a)circe-project.eu.

*SEM2025 - Final Call for Papers
by Kemal Kurniawan 21 May '25

(Apologies for cross-posting)

*SEM2025: The 14th Joint Conference on Lexical and Computational Semantics, Suzhou, China. (Co-located with EMNLP)

https://starsem2025.github.io/

Third and Final Call for Papers

*SEM brings together researchers interested in the semantics of natural languages and its computational modelling. The conference embraces a wide range of approaches including data-driven, neural, probabilistic and symbolic; practical applications as well as theoretical contributions are welcome. The long-term goal of *SEM is to provide a forum for NLP researchers working on any aspect of natural language semantics.

*SEM invites submissions related to the computational modelling of natural language semantics (understood broadly) and its application. Relevant areas include (but are not limited to) theoretical aspects of computational semantics, empirical and data-driven approaches, resources, evaluation and applications/tools.

*SEM encourages authors to consider ethical aspects of their work, and to address and discuss ethical questions and implications relevant to their research. *SEM also values reproducibility and particularly welcomes submissions that adhere to the reproducibility guidelines as specified here <https://folk.idi.ntnu.no/odderik/reproducibility_guidelines.pdf>.

Submission Instructions

Submissions must describe unpublished work and be written in English. We solicit both long and short papers.

Long papers describe original research and may consist of up to eight (8) pages of content, plus unlimited pages for references. Appendices are allowed after the references, but the paper should be self-contained and reviewers will not be required to check the appendices, if any. Final versions of long papers will be given one additional page of content (up to 9 pages) so that reviewers' comments can be taken into account.

Short papers describe original focused research and may consist of up to four (4) pages, plus unlimited pages for references. Upon acceptance, short papers will be given five (5) content pages in the proceedings. Authors are encouraged to use this additional page to address reviewers' comments in their final versions.

Limitations and Ethics Statement sections are allowed and encouraged, but are not mandatory. These sections should be placed after the conclusion and will not count towards the overall page limit.

Submissions should follow the ARR formatting requirements <https://github.com/acl-org/acl-style-files>.

Submission routes and deadlines

*SEM solicits both direct submissions and ACL Rolling Review (ARR) commitments. The deadline for direct submissions is May 30, 2025, and these submissions will be reviewed by the *SEM2025 program committee. ACL Rolling Review (ARR) submissions can be committed to *SEM up to August 22, 2025 (authors of ARR-reviewed papers need to include their OpenReview link with reviews in the submission form). Both types of submissions are made through OpenReview.

Direct submission link: https://openreview.net/group?id=aclweb.org/StarSEM/2025/Conference

Multiple submission policy: *SEM does not prohibit the submission of work that is under consideration for another venue at the same time as the *SEM review period. However, authors of such papers will be asked to declare this at submission time.

Important Dates (All deadlines are 11:59pm UTC-12h, AoE)

- Direct submission deadline (long & short papers): May 30, 2025
- ARR-reviewed submission deadline (long & short papers): August 22, 2025
- Notification of acceptance: September 5, 2025
- Camera-ready deadline: September 26, 2025
- Conference date: TBA (co-located with EMNLP 2025)

Following the ACL and ARR policies <https://www.aclweb.org/portal/content/report-acl-committee-anonymity-policy>, there is no anonymity period requirement.

Kemal Kurniawan | Research Fellow | (he/him) PhD
School of Computing and Information Systems | Faculty of Engineering and IT
Level 4, Melbourne Connect, 700 Swanston St
The University of Melbourne, Victoria 3010 Australia
E: kurniawan.k(a)unimelb.edu.au
