- Corpora - ELRA lists

RANLP 2025 TUTORIALS (6-7 September): Call for participation
by Petya Osenova 19 Aug '25

RANLP 2025 TUTORIALS (6-7 September)
Call for Participation

Website - https://ranlp.org/ranlp2025/index.php/tutorials/

RANLP 2025 is the latest in a series of similarly named events and continues the tradition of successful training events held in Bulgaria since 1989.

RANLP 2025 will offer four half-day tutorials, each lasting 185 minutes, structured as follows: 45 min presentation + 20 min break + 45 min presentation + 30 min coffee break + 45 min presentation.

Tutorial Presenters
* Burcu Can Buglalilar (University of Stirling, UK)
* Salima Lamsiyah (University of Luxembourg, Luxembourg)
* Tharindu Ranasinghe and Damith Dola Mullage Premasiri (Lancaster University, UK)
* Anna Rogers and Max Müller-Eberstein (IT University of Copenhagen, Denmark)

Programme

6th September 2025, 9am
Tharindu Ranasinghe and Damith Premasiri: NLP in the LLM era
This tutorial examines the transformation of Legal NLP in the era of large language models, beginning with key principles of task formulation and data preparation. We will discuss retrieval and judgment prediction in detail, exploring their methodologies, challenges, and applications in legal contexts. We conclude with a forward-looking discussion on the future of Legal AI and the ethical considerations surrounding its applications in the practice of law.

6th September 2025, 2pm
Burcu Can Buglalilar: From Large to Small: Building Affordable Language Models with Limited Resources
This tutorial questions the limitations and harms of Large Language Models, then gives a comprehensive review of Small Language Models, covering prominent examples, their key techniques, and their capabilities. It will also give an overview of even smaller ‘baby’ language models. Finally, the tutorial will conclude by presenting some recent studies in which we developed baby language models using a very small amount of data.

7th September 2025, 9am
Anna Rogers and Max Müller-Eberstein: Studying Generalization in the Age of Contamination
The tutorial will discuss the challenges of doing NLP research in the age of LLMs, when we can no longer be sure that the test data was not observed in training. We will cover the main approaches to studying generalization in various settings, and present a new framework for working with controlled test-train splits across linguistically annotated data at scale.

7th September 2025, 2pm
Salima Lamsiyah: AI Content in NLP: Trends, Detection, and Applications
This tutorial provides a comprehensive overview of AI-generated content in Natural Language Processing (NLP). It covers recent trends in text generation, methods for detecting AI-generated text, and practical applications of such content. The content includes an exploration of state-of-the-art models and techniques for text generation, approaches to identifying machine-generated text, a review of key benchmarks and datasets, and a discussion of open research challenges.

We are looking forward to your participation!

The organisers of RANLP 2025

[CfP] ECIR 2026: 2nd call – Workshop Proposals (Sep 12)
by Yifei Yuan 18 Aug '25

Call for Workshop Proposals ECIR 2026 (https://ecir2026.eu/) workshops provide a platform for presenting novel ideas and research results in emerging areas in IR in a focused and interactive way. Workshops can be either a half-day (3.5 hours plus breaks) or a full day (7 hours plus breaks). The organizers of approved workshops are expected to set up a webpage for the workshop, disseminate the call for papers and the call for participation, gather and review submissions, and prepare the final program. A camera-ready summary of the workshop, written by the organizers, will be included in the ECIR conference proceedings. Workshops are encouraged to be as dynamic and interactive as possible and should lead to a concrete outcome, such as the publication of workshop proceedings. Organizers are also encouraged to write a summary article for the June edition of the ACM SIGIR Forum, highlighting the main results of the workshop. Workshops are on site, and at least one organizer is expected to attend the workshop. Topics of Interest We welcome submissions on any topic relevant to the general field of Information Retrieval, including those mentioned in the Call for Full papers for ECIR 2026. 
Submission Guidelines

Workshop proposals should contain the following information:
• Title and abstract of the workshop;
• Motivation and relevance to ECIR;
• Workshop goals/objectives and overall vision, coupled with desired outcomes;
• Format and structure, in particular: duration of the workshop (full-day or half-day); types of papers (e.g., full papers, demo papers, negative papers, etc.); types of presentation (e.g., oral, poster, etc.); proceedings (e.g., CEUR, Special Issue, etc.); planned activities and the tentative schedule of events; resources needed to deliver the workshop (e.g., poster boards, etc.);
• Intended audience, including number of expected participants and how they will be selected/invited;
• List of organizers with a brief bio highlighting the relevance of their expertise to the workshop topics;
• Names of potential programme committee members, invited speakers, etc.;
• Whether the workshop is related to or follows on from another workshop; if so, please identify the conference at which it was previously held, the past attendance and outcomes, and why another workshop is needed;
• Any other relevant information to support your proposal.

Workshop proposals should be prepared using Springer proceedings templates available on the Springer webpage, with a maximum length of 8 pages. All proposals must be in English and must be submitted electronically through the conference submission system. Workshop proposals will be reviewed by the ECIR 2026 workshop committee based on the quality of the proposal, covered topics, relationship to ECIR, and likelihood of attracting participants. The ECIR workshop co-chairs will make final decisions.

Springer webpage: https://www.springer.com/gp/computer-science/lncs/conference-proceedings-gu…
EasyChair submission page: https://easychair.org/conferences/?conf=ecir2026

Ethics and Professional Conduct

ECIR 2026 expects authors (as well as the PC and the organising committee) to adhere to accepted standards on ethics and professionalism in our community, namely:
• The ACM’s Policy on Authorship,
• The ACM’s Code of Ethics and Professional Conduct,
• The ACM’s Conflict of Interest Policy,
• The ACM’s Policy on Plagiarism, Misrepresentation, and Falsification,
• The ACM’s Policy Against Harassment

Workshop Proposals Track Dates
• Workshop proposals submission: September 12, 2025, 11:59pm (AoE)
• Workshop proposals notification: October 17, 2025
• Workshop day: April 02, 2026

Workshop Proposals Track Chairs
• Negar Arabzadeh (UC Berkeley)
• Franco Maria Nardini (ISTI-CNR, Pisa, Italy)
• Contact: ecir2026-workshops(a)easychair.org

[CfP] ECIR 2026: 1st call – Short Papers (Abstract Oct 7)
by yuanyif＠ethz.ch 18 Aug '25

Call for Short Papers The European Conference on Information Retrieval (ECIR) is the prime European forum for the presentation of original research in the field of Information Retrieval. The 48th European Conference on Information Retrieval (ECIR 2026) will take place as a physical (in-person) conference from 29 March to 2 April 2026 in Delft, The Netherlands. Topics of Interest The Short Paper Track calls for original contributions presenting novel, thought-provoking ideas and addressing innovative application areas within the field of Information Retrieval, including those mentioned in the Call for Full papers for ECIR 2026 Short papers differ from full papers in that they present innovative new works, but may be narrower in scope or applications. Submissions may include preliminary ideas, but still should provide empirical or theoretical validation. Papers that stimulate discussion are particularly encouraged. • Short papers are up to 6 pages in length, plus additional pages for references. Appendices count toward the page limit. Please put appendices before the references for paper submission. • Short papers will be refereed through double-anonymous peer review. This means that all submitted papers must be fully anonymised. Submission Guidelines Authors should consult Springer's authors' guidelines and use their proceedings templates, either for LaTeX or for Word (to be found at https://www.springer.com/gp/computer-science/lncs/conference-proceedings-gu…), for the preparation of their papers. Springer encourages authors to include their ORCIDs in their papers (https://www.springer.com/gp/authors-editors/orcid). All submissions must be written in English. All papers should be submitted electronically through the EasyChair submission system: https://easychair.org/conferences/?conf=ecir2026. In addition, the corresponding author of each accepted paper, acting on behalf of all of the authors of that paper, must complete and sign a Consent-to-Publish form. 
The corresponding author signing the copyright form should match the corresponding author marked on the paper. Once the paper has been submitted, changes relating to its authorship cannot be made. Accepted papers will be published in the conference proceedings in the Springer Lecture Notes in Computer Science series. The proceedings will be distributed to all delegates at the conference. Accepted papers will have to be presented at the conference by one of the authors in person, and at least one author for each accepted contribution will be required to register and attend. Dual Submission Policy Papers submitted to ECIR 2026 should be substantially different from papers that have been previously published, or accepted for publication, or that are under review at other venues. Exceptions to this rule are: Submission is permitted for papers presented or to be presented at conferences or workshops without proceedings. Submission is permitted for papers that have previously been made available as a technical report (e.g., in institutional archives or preprint archives like arXiv). Please do not cite your technical report and make an effort to avoid any issues that may harm the anonymity of your submission. Reviewers will receive guidance that asks them to refrain from trying to break the anonymity, but be aware that the availability of an available technical report for an ECIR submission might cause some issues. 
Ethics and Professional Conduct

ECIR 2026 expects authors (as well as the PC and the organising committee) to adhere to accepted standards on ethics and professionalism in our community, namely:
• The ACM's Policy on Authorship,
• The ACM's Code of Ethics and Professional Conduct,
• The ACM's Conflict of Interest Policy,
• The ACM's Policy on Plagiarism, Misrepresentation, and Falsification,
• The ACM's Policy Against Harassment

Short Paper Track Dates
• Short paper abstract submission: October 7, 2025, 11:59pm (AoE)
• Short paper submission: October 14, 2025, 11:59pm (AoE)
• Short paper notification: December 16, 2025 (AoE)
• Main conference: March 30 – April 1, 2026

Short Paper Track Chairs
• Sean MacAvaney (University of Glasgow, UK)
• Mohammad Aliannejadi (University of Amsterdam, The Netherlands)
• Christine Bauer (University of Salzburg, Austria)
• Contact: ecir2026-short AT easychair.org

Final CfP Historical Languages and AI (Berlin, Germany)
by Daidalos-Projekt (Konstantin Schulz) 18 Aug '25

Call for Papers: Historical Languages and AI See the online version at https://daidalos-projekt.de/conference/cfp/ . March 5-6, 2026 The intersection of historical languages and artificial intelligence (AI) presents a rich and dynamic field of study, with the potential to revolutionize our understanding of the past and the ways in which we engage with historical texts. As digital technologies continue to advance, the need for interdisciplinary collaboration becomes increasingly apparent. The upcoming 2-day international conference on “Historical Languages and AI” aims to foster this collaboration by bringing together experts from computational literary studies, digital history, linguistics, and other domains that work with historical languages such as Latin. The conference seeks to address the growing demand for innovative methods and tools that can enhance the analysis, preservation, and interpretation of historical languages. By leveraging AI technologies, researchers can unlock new insights into historical texts, improve the accuracy of translations, and develop more effective teaching methods for historical languages. The conference will provide a platform for scholars to share their latest findings, discuss emerging trends, and explore the practical applications of AI in historical language research. It explicitly includes historical stages of modern languages, such as Old English or Early New High German. The conference is hosted by the Daidalos research project (Humboldt University Berlin, 2023-2026; https://daidalos-projekt.de ). The project is building a research infrastructure for methods of natural language processing (NLP). The target group is literary scholars in classical philology and related disciplines. The research infrastructure consists, on the one hand, of an interactive website on which interested parties can apply NLP methods to text corpora. On the other hand, the Daidalos project sees itself as a contact point for interested researchers. 
In this function, the project regularly invites researchers to workshops (https://daidalos-projekt.de/workshops), advises them within the framework of research tandems (https://daidalos-projekt.de/tandems), and provides materials for further training (https://daidalos-projekt.de/jupyterlite). Conference Dates: March 5-6, 2026 Venue: Humboldt-Universitaet zu Berlin (Berlin, Germany) Unfortunately, we cannot offer travel bursaries. Attending the conference itself is free of charge. Topics of Interest We welcome submissions on a wide range of topics related to historical languages and AI, including but not limited to: Machine Learning Large Language Models / Large Action Models Usage for data modeling or corpus construction Challenges in low-resource scenarios Neural machine translation for historical texts Innovative approaches to historical language analysis Linguistic analysis for literary studies Part-of-speech tagging Topic modeling Sentiment analysis Named entity recognition Word embeddings Multilingual Information Retrieval, incl. cross-lingual embeddings Evaluation of AI-driven methods and datasets Frameworks for mapping research questions to relevant AI models and methods Assessment of AI tools in historical language studies Technical Infrastructure for Research & Teaching Integrating technologies like Jupyter Notebooks into larger software platforms Retrieval-augmented generation for domain-specific chatbots Teaching & Learning Digital Literacies, incl. open educational resources for teaching natural language processing Important Dates Submission Deadline: September 1, 2025 Notification of Acceptance: October 15, 2025 Camera-Ready Submission: January 31, 2026 Conference Dates: March 5-6, 2026 Submission types Included in the open-access proceedings: *Long papers*: up to 4000 words (ca. 8 pages, excl. bibliography and appendix). Long papers report on original and unpublished results. 
Long papers are presented as oral presentations (30 min talk + 15 min discussion). We welcome the use of appendices or other supplementary information.

Published only in the book of abstracts in our Zenodo Community:

*Short papers*: up to 2000 words (ca. 4 pages, excl. bibliography and appendix). Short papers report on focused contributions, and may present work in progress. Short papers are presented as short oral presentations (20 min talk + 10 min discussion). We welcome the use of appendices or other supplementary information.

*Pitch Your Research Idea*: Submit an abstract of up to 200 words (excl. bibliography and appendix) to give a 5-minute presentation during a pitch session. The presentations are followed by a Scientific Speed Dating Session and enable researchers to get in touch faster.

Workshops (90 min): Submit a proposal for your intended workshop of up to 750 words. Workshops should be organized as a hands-on research or learning opportunity. The workshops will take place on the second day of the conference (March 6, 2026). Workshop proposals should describe:
* the aims and setup of the workshop,
* the academic background for the work,
* an outline of the workshop, including the types of activities,
* the expected key outcomes,
* a short bio of each organizer or presenter, including their name, affiliation, email address,
* a plan for promoting the workshop to attract participants,
* specific requirements, including but not limited to special equipment (e.g., audio/video), software, physical space arrangements,
* any technical knowledge, skills, or experience participants should have before attending the workshop.

Submission Guidelines and Participation

All submissions must be in English or German. Papers should be formatted according to the conference template: Template of the Association for Computational Linguistics (https://github.com/acl-org/acl-style-files). It supports both Microsoft Word and LaTeX. Submissions will be peer-reviewed by the organizers.
Papers should be submitted as PDF documents via E-Mail: daidalos-projekt(a)hu-berlin.de At least one author of each accepted submission must register to the conference and present the paper. Proceedings of the conference will be published as a Propylaeum eBook in the Digital Classics Books series (for long papers; https://books.ub.uni-heidelberg.de/propylaeum/catalog/series/dcb) and on Zenodo (for all other submissions; https://zenodo.org/communities/daidalos). Hybrid conference: All paper presentations will be broadcast live. Presenters can choose to participate remotely or on-site. On-site attendance is required to participate in the more interactive activities of the conference, e.g. workshops. Contact Information For any inquiries, please contact the conference organizers at daidalos-projekt(a)hu-berlin.de . We look forward to receiving your submissions and welcoming you to the International Conference on Historical Languages and AI! The Conference *Organizing Committee* of the Daidalos project: Andrea Beyer, Konstantin Schulz, Anke Lüdeling, Florian Kotschka, Florian Deichsler, Malte Dreyer

PhD grant on Analysing Clinical Documents to Support Decision Making Processes in Emergency Departments
by Alberto Lavelli 17 Aug '25

*Analysing Clinical Documents to Support Decision Making Processes in Emergency Departments*
*Deadline for application: August 26 2025, 13:00 CEST*

One three-year PhD grant on Analysing Clinical Documents to Support Decision Making Processes in Emergency Departments is offered by the Doctoral Program in Brain, Mind & Computer Science (BMCS, http://hit.psy.unipd.it/BMCS) at the University of Padua, jointly with the Natural Language Processing research unit (https://nlplab.fbk.eu/) at Fondazione Bruno Kessler (Trento, Italy), where most of the research activities will be conducted. The language of the PhD programme is English.

The deadline for application is: August 26 2025, 13:00 CEST

For more information, the call, and applications, see: http://hit.psy.unipd.it/BMCS/admission

The candidate will have the unique opportunity to explore different fields (Natural Language Processing, Machine Learning, Health & Well-Being) while being directly coached by very experienced teammates. The PhD student will work in an international environment at Fondazione Bruno Kessler (Trento, Italy).

This PhD grant intends to exploit the capacity of Large Language Models (LLMs) to interpret the content of clinical documents produced in Emergency Departments (EDs) of hospitals in order to improve service quality for patients. The final goal of the project is to advance the integration of generative AI models into healthcare, improving their alignment with the clinical expertise and the processes in EDs. The major context of the PhD will be the Horizon project eCREAM (ecreamproject.eu/), where, through active scientific protocols, several EDs of different EU countries are involved. On the one hand, the project will take advantage of LLMs for automatic filling of Case Report Forms from anonymized clinical notes in several languages.
On the other hand, the reasoning capacities of LLMs will then be applied to the extracted information to derive statistical analysis that helps decision makers for better process efficiency. The adoption of LLMs in the clinical field raises a number of research challenges, which will be addressed during the PhD. Such challenges include improving accuracy of performance, interpretability of decisions in classification tasks, coherence of reasoning capacity, mitigating the existence of biases, and risks related to data security. Fondazione Bruno Kessler is an internationally well-known research center, whose information technology department ranks first among the Engineering and Information Science research centers in Italy. The Natural Language Processing research unit (https://nlplab.fbk.eu/) is an internationally well known research group focused on text mining (information extraction and ontology population from text, analysis of the sentiment and of the emotional content of texts); conversational agents (task oriented dialogue systems, question answering, generation of persuasive messages); and development of linguistic resources, particularly for the Italian language. To get in contact with the NLP research unit and discuss about the opportunities of this call, contact Bernardo Magnini (magnini(a)fbk.eu) The Doctoral Program in Brain, Mind & Computer Science (BMCS) emerges from the close collaboration between faculty from psychology, cognitive neuroscience and information science around the unifying topic of human-computer interaction. Its program rests on the assumption that the ability to work in groups with people of different background is now a fundamental condition to produce scientific excellence and to develop innovative skills that can be spent on the job market. 
****Required/Preferred Candidate Skills and Competencies****
The candidate should possess basic knowledge of Natural Language Processing and Machine Learning techniques (particularly deep learning architectures and large language models). Experience with biomedical/clinical data will be a plus. Basic programming skills (e.g. Python) would complete the profile. Proficiency in English is required; basic knowledge of Italian is preferable.

****Instructions for applicants****
Interested applicants are invited to apply following the instructions given in https://pica.cineca.it/unipd/dottorati41luglio by August 26 2025, 13:00 CEST.

For further information, please contact: Bernardo Magnini (magnini(a)fbk.eu)

Free Lancaster webinar 19 August 2pm UK time
by Brezina, Vaclav 16 Aug '25

Dear all,

Please join us for a free Lancaster webinar on a very current topic (critical reflection on current developments and challenges in the use of AI): <https://x.com/danagablas/status/1955682624455819296>

Navigating Challenges in the use of AI and GenAI in Applied Linguistics
by Prof Tony McEnery
19 August 2025 | 2-3pm (UK time)
https://forms.office.com/e/uppRBrE5AF<https://t.co/cTo7mg8AVj>

Abstract: In the webinar, we will focus on the current developments in the use of GenAI and AI in applied linguistics. In particular, Prof Tony McEnery will explore the impact of AI on applied linguistics, reflecting on the alignment of contemporary AI research with the epistemological, ontological, and ethical traditions of applied linguistics. The talk will discuss the potential affordances of AI and GenAI for applied linguistics as well as some of the challenges that we face when employing AI and GenAI as part of applied linguistics research processes. The goal of this talk is to attempt to align perspectives in these disparate fields and forge a fruitful way ahead for further critical interrogation and integration of AI and GenAI into applied linguistics.

Best,
Vaclav

Professor Vaclav Brezina
Professor of Corpus Linguistics
Co-Director of the ESRC Centre for Corpus Approaches to Social Science
Lancaster, LA1 4YD
Office: County South, room C05
T: +44 (0)1524 510828
@vaclavbrezina
<http://www.lancaster.ac.uk/arts-and-social-sciences/about-us/people/vaclav-…>

August 2025 Newsletter - LDC
by Penn LDC 15 Aug '25

In this newsletter:
LDC at Interspeech 2025
Fall 2025 LDC data scholarship program

New publications:
Mixer 6 - CHiME 8 Transcribed Calls and Interviews<https://catalog.ldc.upenn.edu/LDC2025S07>
Abstract Meaning Representation 2.0 - Machine Translations<https://catalog.ldc.upenn.edu/LDC2025T10>
KAIROS Phase 1 Quizlet<https://catalog.ldc.upenn.edu/LDC2025T11>
________________________________
LDC at Interspeech 2025

LDC will be exhibiting at Interspeech 2025<https://www.interspeech2025.org/>, held this year August 17-21 in Rotterdam, the Netherlands. Stop by our booth to say hello and learn about the latest developments at the Consortium.

Also be on the lookout for the following presentations, posters, and special sessions featuring LDC work:

Comparative Evaluation of Acoustic Feature Extraction Tools for Clinical Speech Analysis
Monday, August 18, 11:00-13:00 - Area5-Oral1 - Speech Analysis, Detection and Classification 1

Reasoning-Based Approach with Chain-of-Thought for Alzheimer's Detection Using Speech and Large Language Models
Tuesday, August 19, 13:30-15:30 - Area1-Poster2B - Databases and Progress in Methodology

Special Session: Challenges in Speech Collection, Curation and Annotation<https://sites.google.com/view/speech-data-cca-is25/>
Wednesday, August 20, 13:30-15:30 - Area14-SS7 - Part 1
Wednesday, August 20, 16:00-18:00 - Area14-SS8 - Part 2

TELVID: A Multilingual Multi-modal Corpus for Speaker Recognition
Thursday, August 21, 13:30-15:30 - AREA4-Oral8 - Speaker Recognition

LDC also supported the Interspeech 2025 URGENT Challenge<https://urgent-challenge.github.io/urgent2025/>, which aims to bring more attention to constructing Universal, Robust, and Generalizable speech EnhancemeNT models.

LDC will post conference updates via our social media platforms. We look forward to seeing you in Rotterdam!

Fall 2025 LDC data scholarship program

Student applications for the Fall 2025 LDC data scholarship program are being accepted now through September 15, 2025.
This program provides eligible students with no-cost access to LDC data. Students must complete an application consisting of a data use proposal and letter of support from their advisor. For application requirements and program rules, visit the LDC Data Scholarships page<https://www.ldc.upenn.edu/language-resources/data/data-scholarships>. ________________________________ New publications: Mixer 6 - CHiME 8 Transcribed Calls and Interviews<https://catalog.ldc.upenn.edu/LDC2025S07> was developed for the 7th and 8th CHiME (Computational Hearing in Multisource Environments)<https://www.chimechallenge.org/> challenges. It contains 80 hours of English interviews and telephone speech from Mixer 6 Speech (LDC2013S03)<https://catalog.ldc.upenn.edu/LDC2013S03> with transcripts developed for the CHiME challenges divided into training, development, and test sets. This data was used in CHiME 7 Task 1<https://www.chimechallenge.org/challenges/chime7/task1/index> and CHiME 8 Task 1<https://www.chimechallenge.org/challenges/chime8/task1/>, both of which focused on transcription and segmentation across varied recording conditions such as interviews, meetings, and dinner parties, with an emphasis on generalization across recording device types and array topologies. The data includes audio from Mixer 6 Speech recorded on 13 microphones for a total of 1063 hours (corresponding to 80 hours of speech). The development and test sets are speaker-disjoint from the training data and consist of fully transcribed, multi-microphone interviews. Each transcript segment was labeled with the speaker, the uttered text, and the start and end times in seconds for that segment. 2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee. 
* Abstract Meaning Representation 2.0 - Machine Translations<https://catalog.ldc.upenn.edu/LDC2025T10> was developed at the University of Edinburgh, School of Informatics<https://www.ed.ac.uk/informatics> and the University of Zurich<https://www.uzh.ch/en.html>, Department of Computational Linguistics<https://www.cl.uzh.ch/en.html>. It consists of Spanish, German, Italian, and Mandarin Chinese automatic translations of the source English and professionally-translated Spanish, German, Italian, and Mandarin Chinese sentences in Abstract Meaning Representation 2.0 - Four Translations (LDC2020T07)<https://catalog.ldc.upenn.edu/LDC2020T07>. The translations were collected through Google Translate between May 2018 and March 2024. The source English sentences are a subset (1,371 sentences) of the sentences contained in Abstract Meaning Representation (AMR) Annotation Release 2.0 (LDC2017T10)<https://catalog.ldc.upenn.edu/LDC2017T10>, a semantic treebank of over 39,000 English natural language sentences from broadcast conversations, newswire, and web text. Translations were from each of the five languages (English, Spanish, German, Italian, and Mandarin Chinese) to the other four languages (Spanish, German, Italian, and Mandarin Chinese) covering 20 language pairs. The dataset contains 1,371 source sentences in each language, each with a professionally translated source sentence and multiple dated translations by Google Translate. 2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.

* KAIROS Phase 1 Quizlet<https://catalog.ldc.upenn.edu/LDC2025T11> was developed by LDC and contains English and Spanish text, video, and image data and annotations used for pre-evaluation research and system development during Phase 1 of the DARPA KAIROS program.
KAIROS Quizlets were a series of narrowly defined tasks designed to explore specific evaluation objectives enabling KAIROS system developers to exercise individual system components on a small data set prior to the full program evaluation. This corpus contains the complete set of Quizlet data used in Phase 1 which focused on two real-world complex events (CEs) within the Improvised Explosive Device bombing scenario: CE1001 (2018 Caracas drone attack) and CE1002 (Utah High School backpack bombing). Source data was collected from the web; 30 root web pages were collected and processed, yielding 29 text data files, 216 image files and 5 video files. Annotation steps included labeling scenario-relevant events and relations for each document to develop a structured representation of temporally ordered events, relations, and arguments and generating a reference knowledge graph. The DARPA KAIROS (Knowledge-directed Artificial Intelligence Reasoning Over Schemas) program aimed to build technology capable of understanding and reasoning about complex real-world events in order to provide actionable insights to end users. KAIROS systems utilized formal event representations in the form of schema libraries that specified the steps, preconditions, and constraints for an open set of complex events; schemas were then used in combination with event extraction to characterize and make predictions about real-world events in a large multilingual, multimedia corpus. 2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee. To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance. Membership Coordinator Linguistic Data Consortium<ldc.upenn.edu> University of Pennsylvania T: +1-215-573-1275 E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu> M: 3600 Market St. Suite 810 Philadelphia, PA 19104

RANLP 2025: The 3rd Summer School on Deep Learning and Large Language Models for NLP - Call for Participation
by Petya Osenova 15 Aug '25

The 3rd Summer School on Deep Learning and Large Language Models for NLP
Call for Participation

Website - https://ranlp2025-summer-school.github.io/

We invite everyone interested in Machine Learning and Natural Language Processing to attend the 3rd Summer School on Deep Learning and Large Language Models (LLMs) for Natural Language Processing (NLP), which will be held from September 3–5, 2025, in Varna, Bulgaria, as part of RANLP 2025.

Building on the success of the 1st and 2nd RANLP summer schools in 2019 and 2023 respectively, the RANLP 2025 summer school will explore a broad spectrum of NLP topics with a special emphasis on LLMs. Each day will feature morning lectures that focus on theoretical foundations, followed by afternoon lab sessions dedicated to hands-on implementation and experimentation. Participants will also have the opportunity to take part in a competition, with awards presented to the top performers. The summer school will feature talks by leading researchers in NLP and deep learning from both academia and industry.

Summer School Lecturers
* Dr Salima Lamsiyah (University of Luxembourg, Luxembourg)
* Dr Burcu Can Buglalilar (University of Stirling, UK)
* Dr Hansi Hettiarachchi (Lancaster University, UK)
* Dr Andrei Mikheev (Daxtra Technologies, UK)
* Dr Max Müller-Eberstein (IT University of Copenhagen, Denmark)
* Dr Tharindu Ranasinghe (Lancaster University, UK)

Summer School Tutors
* Maram Alharbi (Lancaster University, UK)
* Isuri Nanomi Arachchige (Lancaster University, UK)
* Salmane Chafik (Mohammed VI Polytechnic University, Morocco)
* Ernesto Luis Estevanell (University of Alicante, Spain)
* Alexander Mikheev (Daxtra Technologies, UK)
* Damith Dola Mullage Premasiri (Lancaster University, UK)

Programme

Day 1: NLP/DL Foundation
Day 2: LLM Foundation
Day 3: LLM Applications

09:00 – 10:30
* Day 1: Introduction to NLP and Deep Learning (Dr Tharindu Ranasinghe)
* Day 2: Introduction to LLMs (Dr Burcu Can)
* Day 3: Training a Danish LLM: Lessons Learned (Dr Max Müller-Eberstein)

10:30 – 11:00 Coffee/Tea Break

11:00 – 12:30
* Day 1: Language Models and Beyond (Dr Hansi Hettiarachchi)
* Day 2: Evaluating and Benchmarking LLMs (Dr Salima Lamsiyah)
* Day 3: LLMs in Recruitment Sector (Tentative) (Dr Andrei Mikheev and Alexander Mikheev)

12:30 – 2:00 Lunch

2:00 – 3:00
* Day 1: Practical Session I: Word embeddings and Deep Learning in NLP
* Day 2: Practical Session III: Prompting and finetuning LLMs
* Day 3: Practical Session V: Implementing LLMs in the Legal Domain

3:00 – 3:30 Introducing the Summer School Competition, Your Teams and Mentors

3:30 – 4:00 Coffee/Tea Break

4:00 – 5:30
* Day 1: Practical Session II: Transformers in NLP
* Day 2: Practical Session IV: Tools for LLMs: LangGraph
* Day 3: Practical Session VI: LLMs for Code Generation

5:30 – 6:00
* Day 1: Open Session with Mentors and Teams
* Day 2: Open Session with Mentors and Teams
* Day 3: Awards and Closing

We are looking forward to your participation!

The organisers of RANLP 2025 Summer School

EACL/ACL 2026 call for workshops
by z＠zeerak.org 13 Aug '25

[Apologies for cross-posting]

Dear colleagues,

It’s time again for the …

Joint Call for Workshop Proposals (EACL/ACL) 2026

The Association for Computational Linguistics, the European Language Resources Association and the International Committee on Computational Linguistics invite proposals for workshops to be held in conjunction with EACL 2026 or ACL 2026. We solicit proposals in all areas of computational linguistics, broadly conceived to include related disciplines such as linguistics, speech, information retrieval, and multimodal processing.

Workshops will be held at one of the following conference venues:
* EACL 2026 (The 19th Conference of the European Chapter of the Association for Computational Linguistics), which will be held as a hybrid conference, and physically held in Rabat, Morocco, from March 24-29, 2026
* ACL 2026 (The 64th Annual Meeting of the Association for Computational Linguistics), which will be held as a hybrid conference, and physically held in San Diego, California, from July 2-7, 2026

The workshop and tutorial co-chairs will work together to assign workshops to the conferences. They will take into account location preferences and technical constraints provided by the workshop proposers. A second call will be made in the fall for workshops colocated with conferences later in the year (e.g., EMNLP and AACL). This call therefore covers EACL and ACL 2026 only.

Important Dates

EACL/ACL 2026 shared dates:
* Proposal submission deadline: September 5, 2025
* Notification of acceptance: September 22, 2025

All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”).

Submission Information

Proposals should be submitted as PDF documents. Note that submissions should be ready to be turned into a Call for Papers for the workshop within one week of notification. The proposals should be at most two pages for the main proposal and at most two additional pages for information about the organizers, program committee, and references.
Thus, the whole proposal should not be more than four pages long. Please use the LaTeX template<https://www.overleaf.com/read/ytktkdvzgshk> for your submission.

The two pages for the main proposal must include:
* A title, short name / acronym, and a brief description of the workshop topic and content.
* Some conferences might take place entirely or partly online. We request that submissions contain a brief discussion of the measures planned to make sure the workshop is successful and productive in case of hybrid or virtual-only attendance.
* A description of special requirements and technical needs.
* A description of any limitations that would restrict the workshop to a specific venue (EACL or ACL). For example: if the workshop is compatible with only one of these events, logistically, thematically or otherwise, or if the workshop cannot be held at a venue for logistical reasons.
* Diversity and Inclusion Efforts (see more details below)
* If the workshop has been held before, a note specifying: how many prior editions occurred, where previous workshops were held, how many submissions the workshop received in the last iteration and how many papers were accepted (also specify if they were not regular papers, e.g., shared task system description papers), and an estimate of how many in-person posters the workshop attracted.
* (Optional/If Known) A list of invited speakers, with an indication of which ones have already agreed and which are tentative, and sources of funding for the speakers.
* (Optional) A description of any shared tasks associated with the workshop, and an estimate of the number of participants. Having a shared task is optional.

The submission form will request information that does not factor into the decision process, but is necessary for logistical reasons:
* An estimate of the maximum number of attendees at one given time
* Number of estimated in-person posters
* Preferred Venue (first and second preference). Providing a second preferred venue is optional, and we assume that providing a second preference indicates its compatibility for the workshop. While we will do our best to adhere to these preferences, we cannot guarantee that they will be satisfied.
* Duration of the workshop (1-day / 2-day workshop)

Note that the only financial support available to workshops is a single free workshop registration for an invited speaker. The workshop organizers must bear all other costs independently, including registration for more than one invited speaker.

The two pages for information about organizers, program committee, and references must include:
* The names, affiliations, and email addresses of the organizers, with a brief statement (2-5 sentences) of their research interests, areas of expertise, and experience in organizing workshops and related events.
* A list of Program Committee members, with an indication of which members have already agreed. Organizers should do their best to estimate the number of submissions (especially for recurring workshops) in order to (a) ensure a sufficient number of reviewers so that each paper receives 3 reviews, and (b) make sure that no one is committed to reviewing more than 3 papers. This practice is likely to ensure on-time and thoughtful reviews.
* An indication whether the workshop will consider papers submitted through ACL Rolling Review (ARR); use OpenReview as a platform (both to take papers from ARR and for their own review); or whether the workshop will only use START as a platform, and will not use ARR. In making this choice, please pay careful attention to the ARR deadlines and conference notifications.
* References

Submission is electronic at the following link: https://softconf.com/p/acl-workshops2026/track/ACL_EACL

Diversity and Inclusion

The proposals should describe the ways in which the workshop will support diversity in NLP.
We suggest organizers consider the following points while developing the proposal:
* Contribution to academic diversity: The proposals could explain how the subject matter of the workshop will contribute to the diversity of the field, e.g. use of multilingual data, indications of how the described methods scale up to various languages or domains, accessibility of resources, supporting underrepresented communities of NLP and so on.
* Diversifying representation: Following the WiNLP<http://www.winlp.org/winlp-2020-workshop/> initiative, we recognize the current problems of demographic imbalance in the field. Therefore, we particularly encourage submissions including members of under-represented groups in computational linguistics. The proposals should describe how their selection of invited speakers, panelists, organizers, and program committee promotes diverse representation (for example, considering underrepresented demographics based on gender, ethnicity, nationality, and so on). We also suggest including speakers and panelists who have not appeared as a keynote speaker or panelist in recent conferences.
* Diversifying participation: The proposals could describe how the call-for-papers and outreach will encourage people from marginalized groups to attend and submit to the workshop. Some examples include providing mentoring, subsidies, coordinating with affinity groups, diversifying the selection of papers and so on.

Organizer Responsibilities

The organizers of the accepted proposals will be responsible for publicizing and running the workshop, including reviewing submissions, producing the camera-ready workshop proceedings, organizing the meeting days, and playing their part to ensure that all participants are aware of ACL’s anti-harassment policy. It is crucial that organizers commit to all deadlines. In particular, failure to produce the camera-ready proceedings on time will lead to the exclusion of the workshop from the unified proceedings and author indexes.
Workshop organizers cannot accept submissions for publication that will be (or have been) published elsewhere, although they are free to set their own policies on simultaneous submission and review. However, workshops may also accept non-archival submissions, such as findings papers, for presentation; these are allowed in this case.

Since the conferences will occur at different times, the timelines for the submission and reviewing of workshop papers, and the preparation of camera-ready copies, will be different for each conference. Suggested timelines for each of the conferences are given below. The workshop organizers are free to deviate from the proposed schedule for all dates that are not marked as inflexible, though changes should be made in consultation with the relevant workshop chairs.

In submitting a proposal, workshop chairs will be asked to agree to the workshop non-compliance policy<https://docs.google.com/document/u/0/d/1hhocb0fXBBJhqJHoOfZtx1V1kcTATpA4IMU…>. All workshops must agree to this policy, which states that egregious cases of not living up to the responsibilities of running a workshop will be penalized by a 1-year ban on the organizers from submitting another workshop proposal. Workshop proposals whose organizers do not all agree to this policy will be desk-rejected.

The ACL has a set of policies on workshops. You can find the ACL’s general policies on workshops, the financial policy for workshops, and the financial policy for SIG workshops in the Conference Handbook<http://aclweb.org/adminwiki/index.php?title=Conference_Handbook>.

Review Process

Workshop proposals will be holistically reviewed by a committee of workshop chairs and the ACL workshop officers based on: their originality and impact, the experience of the Organizing and Program Committees, and the ethical considerations presented and adherence of the workshop proposal to ACL’s code of ethics.
This committee will also allocate workshops to the conferences included in the call, taking into account location preferences and technical constraints given in the workshop proposal. However, the aim of the review process is to accept as many high-quality workshops as possible. Given space limitations at conference venues and the increasing number of workshop proposals, the review committee cannot guarantee that a proposal will be co-located with their preferred venue in lieu of extenuating circumstances.

The review process will have three possible outcomes:
* accept, in which case the workshop will be co-located with either EACL or ACL;
* revise and resubmit, where the organizing committee is encouraged to incorporate reviewer feedback and resubmit to the next call for workshops for AACL and EMNLP; or
* reject, in which case the workshop proposal should not be submitted to the next call, and will be desk rejected if submitted.

Tentative Workshop Timelines

EACL
* First call for workshop papers: October 15, 2025
* Second call for workshop papers: November 12, 2025
* Third call for papers: December 5, 2025
* Direct Submission deadline: December 19, 2025
* Pre-reviewed (ARR) submission deadline: January 2, 2026
* Notification of acceptance: January 23, 2026
* Camera-ready paper due: February 3, 2026
* Proceedings due (hard deadline): February 24, 2026
* Pre-recorded video due (hard deadline): February 27, 2026
* Workshop dates: March 24-29, 2026

ACL
* First call for workshop papers: December 10, 2025
* Second call for workshop papers: January 15, 2026
* Third call for workshop papers: February 20, 2026
* Direct paper submission deadline: March 5, 2026
* Pre-reviewed ARR commitment deadline: March 24, 2026
* Notification of acceptance: April 28, 2026
* Camera-ready paper due: May 12, 2026
* Proceedings due (hard deadline): June 1, 2026
* Pre-recorded video due (hard deadline): June 4, 2026
* Workshop dates: July 2-3, 2026

Workshop Chairs

EACL
* Adriana Pagano
* Emmanuele Chersoni
* Julia Ive

ACL
* Loic Barrault, Meta FAIR
* Yang Zhao, Chinese Academy of Sciences

Contact e-mail: star-acl-workshops(a)googlegroups.com

Third call for papers KNLP@ACM-SAC
by Patrizio Bellan 13 Aug '25

*Knowledge and Natural Language Processing Track @ ACM-SAC*

The aim of the Knowledge and Natural Language Processing (KNLP) track at ACM SAC is to investigate techniques and applications of knowledge engineering and natural language processing, focusing in particular on approaches that combine them. This is a highly interdisciplinary, emerging research area at the core of Artificial Intelligence, combining and complementing the scientific results from Natural Language Processing and Knowledge Representation and Reasoning.

Topics of interest

Topics of interest include, but are not limited to:

- Natural Language Processing
  - NLP tasks for Knowledge Extraction
  - NLP for Ontology Population and Learning
  - Sentiment Analysis and Opinion Mining for Knowledge Applications
  - Interplay between Language and Ontologies
  - NLP for Explainable Knowledge
  - Machine Translation techniques for Multilingual Knowledge
  - NLP for the Web
  - Bias detection and mitigation in small/large LM
  - (Small/Large) LM and Knowledge
- Knowledge
  - Knowledge to improve NLP tasks
  - Knowledge for Information Retrieval
  - Knowledge-based Sentiment Analysis and Opinion Mining
  - Combining Knowledge and Deep Learning for NLP
  - Knowledge for Text Summarization and Generation
  - Knowledge for Persuasion
  - Knowledge-based Machine Translation
  - Knowledge for the Web
  - Linked Data for NLP
  - Knowledge-based NL Explainability
  - LM-enhanced ontology and knowledge engineering methodologies and tools
  - LM-based agent for knowledge extraction, reasoning, and management
  - Ontology evaluation via small/large LMs
  - (Ontological) knowledge memorization in LMs
  - Knowledge-based techniques for LMs (Retrieval Augmented Generation based approaches, fact-checking, and bias mitigation)
  - Question answering over knowledge graphs via small/large LMs
- Real-world applications that exploit Knowledge and NLP
  - Knowledge and NLP Systems for Big Data scenarios
  - Knowledge and NLP technology for a diverse, equitable, and inclusive society
  - Deployment of Knowledge and NLP Systems in specific domains, such as:
    - Digital Humanities and Social Sciences
    - eGovernment and public administration
    - Life sciences, health, and medicine
    - News and Data Streaming

Paper Submission

Submissions must not have been published or be concurrently considered for publication elsewhere. Papers should be submitted in PDF using the ACM-SAC proceedings format <https://www.sigapp.org/sac/sac2026/authorkit.php>. Authors' names and affiliations should be entered separately at the submission site and not appear in the submitted papers. Each submission will be reviewed in a *DOUBLE-BLIND* process according to the ACM-SAC Regulations. Student Research Competition (SRC) submissions are welcome (see SAC 2026 SRC page for details <https://www.sigapp.org/sac/sac2026/src_program.php>).

Initial Submission Policy
- All submissions must initially be submitted as regular papers. There is no separate submission track for poster papers.
- Paper selection is based on originality, technical contribution, presentation quality, and relevance to the Knowledge and Natural Language Processing Track.
- Based on the outcome of the review process, some submissions, although technically sound, may not be accepted as regular papers due to overall acceptance rate constraints, and could be accepted as posters.

Minimum Length for Review Consideration
- While there is no formal minimum page requirement, submissions of fewer than four (4) full pages that do not demonstrate substantial contributions may be subject to desk rejection without external review.

Camera-ready Page Limits
- Regular Papers (accepted for publication): up to eight (8) pages are included with standard registration.
- Poster Papers (recommended for acceptance): up to two (2) pages are included with standard registration.

*Important Dates (check SAC website <https://www.sigapp.org/sac/sac2026/#important-dates> for up-to-date dates)*

*September 26, 2025: Regular Paper & SRC Abstract Submission*

For further information, please visit the Knowledge and Natural Language Processing Track <https://knlp.fbk.eu/> and ACM-SAC 2026 <https://www.sigapp.org/sac/sac2026/> conference websites or feel free to contact the Track Co-Chairs <knlp(a)fbk.eu>.
