May 2025 - Corpora - ELRA lists

2025 ACL Conference Registration
by Horacio Saggion 22 May '25

Dear Colleagues,

The ACL 2025 Conference is pleased to announce that *registration is now officially open*. We encourage you to register early to take advantage of reduced rates.

Please note the following important deadlines for registration:

- *Early Registration:* Concludes on *Wednesday, July 2, 2025, AoE*.
- *Late Registration:* Will close for both In-Person and Virtual attendees on *Friday, July 25, 2025, at 11:59 PM CET*.
- *Onsite Registration:* Will be available for both In-Person and Virtual attendees from *Saturday, July 26, 2025, through August 1, 2025, at 11:59 PM CET*.

Detailed information regarding the registration process can be found on the official conference website: https://acl.swoogo.com/acl2025

We look forward to welcoming you to ACL 2025 in beautiful Vienna!

Sincerely,
The ACL Organization Team

--
Horacio Saggion
Full Professor / Chair in Computer Science and Artificial Intelligence
Head of the Natural Language Processing Group - TALN
Project Coordinator iDEM Project (HE)
Co-PI of the AI-BOOST project (HE)
Co-PI of the IDEAL project (HE)
Universitat Pompeu Fabra
https://twitter.com/h_saggion
https://www.linkedin.com/in/horacio-saggion-1749b916

Open Positions: NLP Postdocs at MBZUAI
by Teresa Lynn 22 May '25

We have three postdoc position openings at Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi. The project is based on a collaboration with a leading industry partner on the development of a conversational booking agent.

* Postdoctoral Research Scientist in Conversational AI & NLP
* Postdoctoral Research Scientist in Recommendation & Personalization
* Postdoctoral Research Scientist in Persuasive Language Generation

More information regarding responsibilities and requirements can be found on our webpage: https://mbzuai-hiring.github.io/

- Start date: To be filled immediately, July 2025
- Duration: 1‑year contract with possibility of extension
- Location: MBZUAI <https://mbzuai.ac.ae/>, Abu Dhabi, UAE
- Apply via e‑mail: NLP.IndustryProject(a)mbzuai.ac.ae

We look forward to receiving your application!

Regards,
Teresa

Teresa Lynn, PhD
Head of NLP Research Engagement
Natural Language Processing
P +971 2 811 3284
W www.mbzuai.ac.ae <https://www.mbzuai.ac.ae/>

Final call for participation: Automatic Detection of Borrowings shared task (ADoBo 2025)
by ELENA ALVAREZ MELLADO 21 May '25

This is the last call to participate in ADoBo 2025, the shared task on automatic detection of borrowings in Spanish.

To gain access to the data, make submissions and check the leaderboard, please join the competition at Codabench. System submissions are due on May 26th.

https://www.codabench.org/competitions/7284/

TIMELINE

- April 21: Dev set released.
- May 12: Test set released.
- May 26: Systems output submissions.
- June 9: Working notes paper submission.
- June 16: Notification of acceptance (peer-reviews).
- June 23: Camera ready paper submission.
- September: ADoBo results to be presented at IberLEF 2025.

ORGANIZATION COMMITTEE

- Elena Álvarez Mellado, Universidad Nacional de Educación a Distancia (UNED).
- Julio Gonzalo, Universidad Nacional de Educación a Distancia (UNED).
- Constantine Lignos, Brandeis University.
- Jordi Porta Zamorano, Universidad Autónoma de Madrid (UAM).

First CFP: The Tokenization Workshop@ICML 2025
by Jindrich Libovicky 21 May '25

TokShop: Tokenization Workshop (ICML 2025)

Submission to the Tokenization Workshop opened on April 14, 2025, via OpenReview. The deadline for submissions is May 30, 2025, at 11:59pm (anywhere on earth). Notifications of acceptance will be sent out on June 9, 2025, and camera-ready papers will be due shortly afterward at 11:59pm (anywhere on earth). The workshop will take place on July 18, 2025.

Workshop Description

The Tokenization Workshop (TokShop) at ICML aims to bring together researchers and practitioners from all corners of machine learning to explore tokenization in its broadest sense. We will discuss innovations, challenges, and future directions for tokenization across diverse data types and modalities.

Call for Papers

Topics of interest include:

- Subword Tokenization in NLP: Analysis of techniques such as BPE, WordPiece, and UnigramLM, as well as improvements for efficiency, interpretability, and adaptability.
- Multimodal Tokenization: Tokenization strategies for images, audio, video, and other modalities, including methods to align representations across different types of data.
- Multilingual Tokenization: Development of tokenizers that work robustly across languages and scripts, and investigation into failure modes tied to tokenization.
- Tokenizer Modification Post-Training: Methods for updating tokenizers after model training to boost performance and/or efficiency without retraining from scratch.
- Alternative Input Representations: Exploration of non-traditional tokenization approaches, such as byte-level, pixel-level, or patch-based representations.
- Statistical Perspectives on Tokenization: Empirical analysis of token distributions, compression properties, and correlations with model behavior.

By broadening the scope of tokenization research beyond language, this workshop seeks to foster cross-disciplinary dialogue and inspire new advances at the intersection of representation learning, data efficiency, and model design.

Submission guidelines

Our author guidelines follow the ICML requirements unless otherwise specified.

- Paper submission is hosted on OpenReview.
- Each submission should contain up to 9 pages, not including references or appendix (shorter submissions are also welcome).
- Please use the provided LaTeX template (Style Files) for your submission. Please follow the paper formatting guidelines general to ICML as specified in the style files. Authors may not modify the style files or use templates designed for other conferences.
- The paper should be anonymized and uploaded to OpenReview as a single PDF.
- You may use as many pages of references and appendix as you wish, but reviewers are not required to read the appendix.
- Posting papers on preprint servers like arXiv is permitted.
- We encourage each submission to discuss the limitations as well as ethical and societal implications of the work, wherever applicable (but neither is required). These sections do not count towards the page limit.
- This workshop offers both archival and non-archival options for submissions. Archival papers will be indexed with proceedings, while non-archival submissions will not.
- The review process will be double-blind.

Read more: https://tokenization-workshop.github.io/

eLex 2025 registration and Hornby bursaries
by Iztok Kosem 21 May '25

(apologies for multiple postings)

Dear colleagues,

We would like to inform you that registration for the eLex 2025 conference is now open (https://elex.link/elex2025/registration/). The deadline for the early-bird fee is 5 September 2025.

A call for Hornby bursary applications is also out (https://elex.link/elex2025/hornby-bursary/). The bursaries cover participants' registration fees, so if you intend to apply, please wait for the results before paying the registration fee (you can still complete all the steps of the registration process and pay later).

Finally, special rates for rooms at the venue and partner hotels are available: https://elex.link/elex2025/venue/. Only a limited number of rooms is available, so early booking is advisable (there is a very friendly cancellation option).

Please monitor the conference website for further updates on the programme, proceedings and related news.

Looking forward to seeing you at the conference.

Best regards,
Iztok Kosem
Head of the eLex 2025 organising committee

HealTAC 2025 - call for participation
by Bea Alex 21 May '25

----------------------------
HealTAC 2025
June 16-18th, 2025, Glasgow (UK)
https://healtac2025.github.io/
----------------------------

Call for participation
----------------------------

The 8th Healthcare Text Analytics Conference (HealTAC 2025) invites everyone for three days of state-of-the-art discussions on healthcare text analytics. The programme features

-- keynotes on "Addressing the Missing Context Problem in Foundation Models for Healthcare" (by Jason Fries, Stanford University) and "AI for Healthcare: Text as a Medium for Multimodal datasets" (by Alison O'Neil, Canon Medical Research Europe);
-- panels on "Opportunities and challenges in LLMs for health research: A multidisciplinary perspective on surfacing social inequalities, bias detection, and mitigation" and "Challenges in AI deployment within NHS";
-- 18 talks describing current PhD projects;
-- a workshop on "NLP in mental healthcare and research" (June 16th);
-- 4 demos and 9 lightning talks;
-- 24 posters presenting healthcare text analytics research.

The detailed programme is available at: https://healtac2025.github.io/programme/

Registration fees
----------------------------

Due to generous support from Health Data Research UK, CogStack, Frontiers, DataMind, University of Glasgow, Research Data Scotland and Healtex, the registration fee is only £100 (for students) and £200 (for everyone else), and includes the full 3-day programme, lunches, the conference dinner and even breakfast on day 1. This is the early registration fee, available until May 29th.

Registration details: https://healtac2025.github.io/registration/

Accommodation and travel
----------------------------

University accommodation is available to registered participants for only £43 per night. All details are available at: https://healtac2025.github.io/accommodation/

Follow the conference announcements on social media at #HEALTAC2025.

We are looking forward to welcoming you to HealTAC 2025.

[3rd CFP] The LLMs4Subjects Shared Task -- LLMs meet Extreme Multi-label Classification of Records in Digital Libraries
by D'Souza, Jennifer 21 May '25

The 2nd LLMs4Subjects Shared Task: LLM-based Subject Tagging for the TIB Technical Library's Digital Catalog

Theme: The Development of Energy- and Compute-Efficient LLM Systems

Organized as part of the German Evaluation (GermEval 2025) Shared Task Series
September 10-12, 2025, Hildesheim, Germany (co-located with KONVENS 2025 - Conference on Natural Language Processing)

2nd LLMs4Subjects Shared Task: https://sites.google.com/view/llms4subjects-germeval/
Join the Codabench Competition: https://www.codabench.org/competitions/8373/
KONVENS 2025: https://konvens-2025.hs-hannover.de/about/

Task Overview

LLMs4Subjects challenges the research community to develop cutting-edge LLM-based solutions for subject tagging of technical records from Leibniz University's Technical Library (TIBKAT). Participants are tasked with leveraging large language models (LLMs) to tag technical records using the GND taxonomy. The task involves bilingual language modeling, as systems must process technical documents in both German and English. Successful solutions may be integrated into the operational workflows of TIB, the Leibniz Information Centre for Science and Technology.

With the rapid advancements in LLMs, the focus is shifting toward making these models more energy- and compute-efficient while maintaining high performance. Recent innovations, such as the DeepSeek series, have demonstrated how techniques like mixture-of-experts (MoE) and model distillation can significantly reduce computational costs without sacrificing effectiveness. The 2nd LLMs4Subjects shared task highlights the importance of efficiency in LLMs, encouraging participants to explore strategies that enhance model performance while optimizing for energy consumption and inference speed. We welcome approaches including, but not limited to, those that leverage model compression, quantization, efficient fine-tuning, and adaptive computation techniques to push the boundaries of sustainable AI development.

Subtasks

The 2nd LLMs4Subjects shared task organizes the following two subtasks:

Subtask 1 - Multi-Domain Classification of Library Records
Subtask 2 - Large-scale Multilabel Subject Indexing of Library Records

Important Dates

* Release of training data: March 8, 2025
* Release of testing data: May 30, 2025
* Deadline for system submissions: June 2, 2025
* Evaluation end: June 27, 2025
* Paper submission deadline: July 7, 2025
* Notification of acceptance: June 28, 2025
* Camera-ready paper due: August 15, 2025
* Workshop/KONVENS: September 10 - 12, 2025 (TBA)

Note: Submit your system outputs on our Codabench live leaderboard at https://www.codabench.org/competitions/8373/

*SEM2025 - Final Call for Papers
by Kemal Kurniawan 21 May '25

(Apologies for cross-posting)

*SEM2025: The 14th Joint Conference on Lexical and Computational Semantics, Suzhou, China. (Co-located with EMNLP)
https://starsem2025.github.io/

Third and Final Call for Papers

*SEM brings together researchers interested in the semantics of natural languages and its computational modelling. The conference embraces a wide range of approaches including data-driven, neural, probabilistic and symbolic; practical applications as well as theoretical contributions are welcome. The long-term goal of *SEM is to provide a forum for NLP researchers working on any aspect of natural language semantics.

*SEM invites submissions related to the computational modelling of natural language semantics (understood broadly) and its application. Relevant areas include (but are not limited to) theoretical aspects of computational semantics, empirical and data-driven approaches, resources, evaluation and applications/tools.

*SEM encourages authors to consider ethical aspects of their work, and to address and discuss ethical questions and implications relevant to their research. *SEM also values reproducibility and particularly welcomes submissions that adhere to the reproducibility guidelines as specified here <https://folk.idi.ntnu.no/odderik/reproducibility_guidelines.pdf>.

Submission Instructions

Submissions must describe unpublished work and be written in English. We solicit both long and short papers.

Long papers describe original research and may consist of up to eight (8) pages of content, plus unlimited pages for references. Appendices are allowed after the references, but the paper should be self-contained and reviewers will not be required to check the appendices, if any. Final versions of long papers will be given one additional page of content (up to 9 pages) so that reviewers' comments can be taken into account.

Short papers describe original focused research and may consist of up to four (4) pages, plus unlimited pages for references. Upon acceptance, short papers will be given five (5) content pages in the proceedings. Authors are encouraged to use this additional page to address reviewers' comments in their final versions.

Limitations and Ethics Statement sections are allowed and encouraged, but are not mandatory. These sections should be placed after the conclusion and will not count towards the overall page limit.

Submissions should follow the ARR formatting requirements <https://github.com/acl-org/acl-style-files>.

Submission routes and deadlines

*SEM solicits both direct submissions and ACL Rolling Review (ARR) commitments. The deadline for direct submissions is May 30, 2025, and these submissions will be reviewed by the *SEM2025 program committee. ACL Rolling Review (ARR) submissions can be committed to *SEM up to August 22, 2025 (authors of ARR-reviewed papers need to include their OpenReview link with reviews in the submission form). Both types of submissions are made through OpenReview.

Direct submission link: https://openreview.net/group?id=aclweb.org/StarSEM/2025/Conference

Multiple submission policy: *SEM does not prohibit the submission of work that is under consideration for another venue at the same time as the *SEM review period. However, authors of such papers will be asked to declare this at submission time.

Important Dates (All deadlines are 11:59pm UTC-12h, AoE)

- Direct submission deadline (long & short papers): May 30, 2025
- ARR-reviewed submission deadline (long & short papers): August 22, 2025
- Notification of acceptance: September 5, 2025
- Camera-ready deadline: September 26, 2025
- Conference date: TBA (co-located with EMNLP 2025)

Following the ACL and ARR policies <https://www.aclweb.org/portal/content/report-acl-committee-anonymity-policy>, there is no anonymity period requirement.

Kemal Kurniawan | Research Fellow | (he/him)
PhD School of Computing and Information Systems | Faculty of Engineering and IT
Level 4, Melbourne Connect, 700 Swanston St
The University of Melbourne, Victoria 3010 Australia
E: kurniawan.k(a)unimelb.edu.au

Language, Linguistics, and Natural Language Processing - workshop
by Sara Vilar Lluch 20 May '25

Dear All,

We are pleased to invite you to the one-day interdisciplinary workshop titled Language, Linguistics, and Natural Language Processing: An Interdisciplinary Approach

* Date: 10th June 2025
* Location: Cardiff University (face-to-face; talks will also be made available online)
* Registration: the event is free to attend, but registration is required – please book your ticket here <https://www.eventbrite.co.uk/e/language-linguistics-and-natural-language-pr…>

The workshop will bring together researchers from language and applied linguistics, communication and NLP/computational sciences to explore opportunities for interdisciplinary research and teaching collaborations that account for the technological changes in our everyday and professional communication practices. We will:

* Promote discussion and idea generation for interdisciplinary research collaboration
* Identify complementary areas of interest and opportunities for joint projects
* Build a shared language to promote cross-disciplinary understanding
* Identify avenues for collaborative teaching that enhance postgraduate research and transferable skills

We invite early career and senior researchers, as well as postgraduate students, of communication-related disciplines and computer sciences to join us for a day of discussions, networking and knowledge exchange activities.

Invited speakers:

Dr Alistair Baron <https://www.lancaster.ac.uk/scc/about-us/people/alistair-baron> – Lancaster University – Spelling variation: problems, solutions, and analysis.

Dr Emma McClaughlin <https://www.nottingham.ac.uk/english/people/emma.mcclaughlin> – The University of Nottingham – Language in Health Contexts: Interdisciplinary Applications of Corpus Linguistics

If you are interested in the intersection of language, communication and digital technologies, we hope to see you there!

Best wishes,
Sara – on behalf of Dawn Knight, Hui Sun, Carla Pérez-Almendros.

Dr Sara Vilar-Lluch
Lecturer in Language and Linguistics
School of English, Communication and Philosophy
Cardiff University
John Percival Building
Colum Drive
Cardiff CF10 3EG
Email: VilarLluchS(a)cardiff.ac.uk

[CfP] IEEE FedCSIS 2025: "Digital Humanities, Computational Social Sciences and Economics Research (AI-HuSo)"
by Jens Dörpinghaus 20 May '25

Dear all,

As the deadline is fast approaching, I would like to inform you about a call for papers for a thematic track at FedCSIS 2025 (IEEE #61123) called "AI in Digital Humanities, Computational Social Sciences and Economics Research (AI-HuSo)". FedCSIS 2025 will be held in Kraków, Poland, 14-17 September 2025.

Paper submission (no extensions): 25.05.2025

See https://2025.fedcsis.org/thematic/ai-huso for details.

This thematic session is dedicated to the computational study of Social Sciences, Economics and Humanities, including subjects such as education, the labour market, history, religious studies, theology, cultural heritage, and informative predictions for decision-making and behavioural-science perspectives. While digital methods, intelligence systems, and AI have been emerging topics in these fields for several decades, this thematic session is not limited to discoveries in these domains, but is also dedicated to reflection on these methods and results within the field of computer science. Thus, we are particularly interested in interdisciplinary exchange and dissemination with a clear focus on computational and AI methods for intelligence systems.

Since there is a clear methodological overlap between these three domains, and similar algorithms and AI approaches are often considered, we see this thematic session as a place for interdisciplinary learning and for discussing a joint toolbox that supports scholars from these fields with human and context-aware agents. The aim of this thematic session is thus to bridge the gap between scientific domains, foster interdisciplinary exchange and discuss how research questions from other domains challenge current computer science. In particular, we are interested in communication between researchers from different fields of computer science, social sciences, economics and humanities, and practitioners from different fields.

Topics
======

The list of topics includes, but is not limited to:

- AI and computational approaches for the interdisciplinary work of the social sciences, economics, and humanities: reports on theoretical, methodological, experimental, and applied research.
- AI and computational approaches for linking data from different digital resources, including online social networks, web and data mining, Knowledge Graphs, Ontologies.
- AI and computational methods for text mining and textual analysis, for example texts within social sciences, digital literary studies, computational stylistics and stylometry.
- Text encoding, computational linguistics, annotation guidelines, OCR for humanities, economics, and social sciences.
- Network analysis, including social and historical network analysis.
- Ethical and philosophical considerations of AI in society, education and humanities research.

In general, the applications of interest include, but are not limited to, those in the list below:

- Labour market research and qualification, including behavioral-science perspectives.
- Education: Digital methods and systems, e-learning, adult education, etc.
- Contributions to the application of technology to culture, history, and societal issues: for example, computational text analysis, analytics and visualization, databases, etc.
- In particular, we welcome submissions which focus on a critical reflection of digital methods in the humanities, economics and social sciences within computer science.
- Linking of digital resources, a discussion of data sets, their quality and reliability, combining quantitative and qualitative data, anonymization and data protection.

Contact: ai-huso(a)fedcsis.org

Submission rules
================

- Authors should submit their papers as Postscript, PDF or MSWord files.
- The total length of a paper should not exceed 12 pages IEEE style (including tables, figures and references). More pages can be added, for an additional fee. IEEE style templates are available here.
- Papers will be refereed and accepted on the basis of their scientific merit and relevance to the Topical Area.
- Preprints containing accepted papers will be published online.
- Only papers presented at the conference will be published in Conference Proceedings and submitted for inclusion in the IEEE Xplore® database.
- Conference proceedings will be published in a volume with ISBN, ISSN and DOI numbers and posted at the conference WWW site.
- Conference proceedings will be submitted for indexation according to information here.
- Organizers reserve the right to move accepted papers between FedCSIS Sessions.

Extended versions of selected papers presented at the conference may be published in a volume entitled "Advances in Computational Social Sciences: AI, Computational Methods and Applications for the Study of Society", to be published by Springer. In addition, selected papers may be submitted to a special issue of AI entitled "Integrating Data Sources for Smarter Interdisciplinary AI Solutions: Challenges and Opportunities".
