***Call for Papers WASP @ IJCNLP-AACL 2025***
https://ui.adsabs.harvard.edu/WIESP/2025/
Building on the success of the First Workshop on Information Extraction from Scientific Publications (WIESP) at AACL-IJCNLP 2022 and the Second WIESP at IJCNLP-AACL 2023, the Third Workshop on Artificial Intelligence for Scientific Publications (WASP) at IJCNLP-AACL 2025 aims to establish itself as a pivotal platform for promoting discussion and research in Natural Language Processing (NLP) and Artificial Intelligence (AI). The workshop will bring together esteemed experts and renowned organizations with students and early-career researchers who are interested and invested in efforts to extract and mine the world’s scientific knowledge from research papers. Their collaboration will focus on developing advanced algorithms, models, and tools that will lay the foundation for future machine comprehension of scientific literature. The third iteration of WASP will concentrate on topics related to Artificial Intelligence research for scientific publications:
***Topics (not limited to)***
Scientific document parsing and structured information extraction
Scientific named-entity recognition and concept identification
Citation context/span extraction and citation-based knowledge mining
Argument extraction and scientific discourse analysis
Scientific article summarization and headline generation
Question-answering and fact retrieval from scientific literature
Prompt engineering and retrieval-augmented generation (RAG) for science Q&A
Chain-of-thought reasoning and scientific problem-solving with LLMs
LLM-powered information extraction from scientific texts
Pretraining and fine-tuning LLMs on scientific corpora
Evaluation and alignment of LLMs for scientific understanding
AI-assisted scientific discovery and hypothesis generation
Ethical and responsible use of LLMs in scientific publishing
Large Language Reasoning Models for Scientific Discovery
LLM hallucinations and their impact on scientific knowledge and publications
Challenges and the future of AI in scientific publishing
AI, Peer Review, and Scientific Publishing
Impact of Generative AI on Scientific Publishing
In addition to papers, WASP will also host a shared task.
***Telescope Reference and Astronomy Categorization Shared Task (TRACS)***
https://ui.adsabs.harvard.edu/WIESP/2025/shared_task
We will publish a separate CfP on the shared task. Shared task authors will be invited to write their system descriptions, which will then undergo light peer review.
All accepted papers and shared task system papers will be published in the WASP proceedings as part of IJCNLP-AACL 2025 and indexed in the ACL Anthology.
***Important Dates***
Paper submission deadline (WASP+TRACS): September 29, 2025
ARR commitment deadline: October 27, 2025
Notification of paper acceptance (WASP+TRACS): November 3, 2025
Camera-ready submission deadline (WASP+TRACS): November 11, 2025
Workshop: December 23, 2025 (hybrid)
All submission deadlines are 11:59 pm UTC-12 (“Anywhere on Earth”).
***Paper Submission Site***
https://openreview.net/group?id=aclweb.org/AACL-IJCNLP/2025/Workshop/WASP
Submissions will be made via OpenReview. Submissions should follow the ACLPUB formatting guidelines (Paper formatting guidelines - ACLPUB) and use the provided template files.
Submissions (Long and Short Papers) will be subject to a double-blind peer-review process. We follow the same policies as IJCNLP-AACL 2025 regarding anonymity, preprints, and double submissions.
Please reach out to the organizers (cc'ed) for any queries.
Thank you!
--
+++++++++++++++++++++++++++++++++++
Dr. Tirthankar Ghosal
Scientist (NLP/AI and HPC)
National Center for Computational Sciences (NCCS)
Oak Ridge National Laboratory, United States
&
Affiliate Faculty (NLP/AI)
University of Tennessee Knoxville
United States
https://www.tirthankarghosal.com
++++++++++++++++++++++++++++++++++++
We have a number of permanent academic Assistant/Associate Professor posts open in the School of Computer Science, in all areas of Computer Science including Artificial Intelligence and Natural Language Processing. Applications are due September 30th. Please see the following link for more details:
https://www.jobs.ac.uk/job/DOI907/assistant-or-associate-professor-in-compu…
With best regards,
Mark
Mark Lee
DPVC (India)
Professor of Artificial Intelligence
http://www.cs.bham.ac.uk/~mgl
University of Birmingham
ConsILR-2025 - deadline extension
the 20th edition of the International Conference on Linguistic Resources and Tools for Natural Language Processing (https://conferences.info.uaic.ro/consilr/2025)
Important Dates:
September 10, 2025 (extended from August 23, 2025) – abstract submission (max 300 words)
September 13, 2025 (extended from August 31, 2025) – paper submission
September 20, 2025 – authors’ notification
October 8-10, 2025 – ConsILR Conference
October 17, 2025 – final form submission
Venue: Casa Academiei Române (House of the Romanian Academy), 13, Calea 13
Septembrie, Bucharest, Romania and ONLINE
We invite papers presenting original and unpublished research, as well as
descriptions of accomplished or in-progress work, in all areas of natural
language processing. We welcome contributions covering a range of topics,
including but not limited to:
- Natural Language Processing (NLP) Techniques and Applications
- Large Language Models (LLMs) and Applications
- Digital Humanities in Language Technology
- (Mono- or multimodal) Language Resources and Tools for text, speech, images and videos
- Computational Models and Algorithms in Language Processing
- Applied Linguistics and NLP Integration
- Morphosyntactic Structures in Language Processing
- Semantic and Pragmatic Analysis in NLP
- Multi-word Expressions and Idiomatic Language in NLP
- Cultural and Contextual Factors in Language Technology
- Romanian Language Processing and Contrastive Linguistics
Authors are encouraged to submit, in addition to the papers per se,
open-source linguistic resources, such as corpora (or corpus examples),
demo code, video and sound files.
Confirmed invited speakers:
Agata Savary <https://perso.limsi.fr/savary/>
Amalia Todirașcu <https://fr.linkedin.com/in/amalia-todirascu>
Marius Ursache <https://www.linkedin.com/in/mariusursache/>
Paula Gradu <https://www.linkedin.com/in/paula-gradu-7505591b0>
Organisers:
- “Mihai Drăgănescu” Research Institute for Artificial Intelligence of the Romanian Academy
- Institute of Computer Science of the Romanian Academy – Iași Branch
- Faculty of Computer Science of the “Alexandru Ioan Cuza” University of Iași
- “Alexandru Philippide” Institute of Philology of the Romanian Academy – Iași Branch
- Romanian Association of Computational Linguistics
- Academy of Technical Sciences of Romania
The abstracts (max. 300 words) and papers (an even number of pages, between
6 and 12, including references) must be written in British English.
Details about the paper format are available on the conference website.
The Proceedings of the Conference will be sent for indexing to Clarivate
Analytics.
Further information can be found on the conference website: https://conferences.info.uaic.ro/consilr/2025.
On behalf of the ConsILR-2025 Organising Committee,
Dr. Elena Irimia
Due to many requests, the new extended deadline is 1st November 2025.
Apologies for cross-posting.
---------------------------------------------------------------------------
CALL FOR PAPERS: Language Resources and Evaluation Journal - Special Issue on Advancing Arabic Language Models: Resources, Evaluation, and Applications in the Era of Large Language Models
Springer Nature: https://link.springer.com/collections/ieheibhacc
Guest Editors:
Wassim El Hajj (American University of Beirut - Mediterraneo, Cyprus)
Hend S. Al-Khalifa (King Saud University, KSA)
Ahmed Ali (Saudi Authority for Data and Artificial Intelligence (SDAIA), KSA)
Overview
This special issue aims to explore the latest advancements, challenges, and future directions in Arabic Language Models (LMs), with a particular focus on Large Language Models (LLMs). It addresses the critical need for robust language resources, comprehensive evaluation frameworks, and innovative applications that cater to the unique linguistic characteristics of Arabic, including its dialects. The issue brings together researchers, linguists, and practitioners to discuss state-of-the-art methodologies, datasets, and evaluation metrics that contribute to the development of more accurate, culturally aligned, and ethically sound Arabic LLMs. By focusing on the intersection of Arabic linguistics, artificial intelligence, and cultural studies, this special issue will provide a comprehensive overview of the current state and future prospects of Arabic LLMs, contributing significantly to the field of language resources and evaluation.
Topics of Interest
We invite submissions on topics including, but not limited to, the following areas. Both original research contributions and substantial extensions of previously published work are welcome.
· Development of large-scale datasets for Arabic, including dialectal varieties.
· Novel approaches to Arabic language modeling, including deep learning and hybrid methodologies.
· Evaluation frameworks and benchmarks for Arabic LLMs, with a focus on comprehensive assessment across dialects and tasks.
· Cultural and ethical considerations in developing and deploying Arabic LLMs.
· Applications of Arabic LLMs in education, communication, and other domains.
· Challenges and solutions in handling the syntactic and dialectic variations of Arabic in LLMs.
· Comparative studies of Arabic LLMs with other language models.
· Techniques for improving the efficiency and performance of Arabic LLMs
· Interpreting and explaining Arabic LLMs.
For further information on this initiative, please refer to https://link.springer.com/collections/ieheibhacc
IMPORTANT DATES
Submission Deadline: November 1, 2025
Final Decisions: End of February 2026
Publication: Second Quarter of 2026
SUBMISSION GUIDELINES
Authors should follow the "Instructions for Authors (https://link.springer.com/journal/10579/submission-guidelines)" on the LRE journal website.
Best Regards on behalf of the Guest Editors,
Wassim El Hajj
Hend S. Al-Khalifa
Ahmed Ali
*** Last Mile for Workshop Proposals Submission ***
The Annual ACM Conference on Intelligent User Interfaces (IUI 2026)
March 23-26, 2026, 5* Coral Beach Hotel & Resort, Paphos, Cyprus
https://iui.hosting.acm.org/2026/
(*** Submission Deadline: August 29, 2025 (extended and final!) ***)
We are pleased to invite proposals for workshops to be held in conjunction
with the Annual International ACM Conference on Intelligent User Interfaces (ACM IUI
2026), Paphos, Cyprus.
Workshops aim to provide a venue for presenting research on emerging or specialized
topics of interest and to offer an informal forum for discussing research questions and
challenges. Potential workshop topics should be related to the general theme of the
conference (“Where HCI meets AI”).
We welcome proposals for a wide range of *full-day* or *half-day* workshops, including
but not limited to:
• Mini Conferences: Workshops that focus on a specific topic and may have their own
paper submission and review processes.
• Interactive Formats: Workshops that encourage active participation and hands-on
experiences through break-out sessions or group work to explore specific topics. They
may have their own paper submission and review process or target a report summarizing
the discussions and outcomes.
• Emerging Work Sessions: Workshops that foster discussion around emerging ideas.
Organizers may raise specific topics and invite position papers, late-breaking results, or
extended abstracts.
• Project-Centric Formats: Workshops tied closely to a specific existing large-scale
funded project (e.g., NSF, EU) with the goal of engaging a broader community.
• Interactive Competitions: Formats that invite individuals and teams to participate in
challenges or hackathons on selected topics relevant to IUI.
Review and Oversight by Workshop Chairs
Proposals will be reviewed and evaluated by the Workshop Chairs. It is possible that
workshops may be cancelled, shortened, merged, or restructured if there are insufficient
submissions.
Workshop summaries will be included in the ACM Digital Library for ACM IUI 2026. We will
also publish joint workshop proceedings for accepted workshop submissions
(through CEUR or a similar venue).
Responsibilities of Workshop Organizers
• Coordinate the Call for Papers, including solicitation, submission handling, and peer
review process.
• Create and maintain a dedicated website with workshop information. The IUI 2026
website will link to this page.
• Prepare and communicate a Call for Participation, targeting both IUI and broader relevant
communities (e.g., via mailing lists, social media, newsgroups, or offline events).
• Facilitate the planned activities, including paper presentations, discussions, and/or
interactive elements.
• Submit a workshop summary for inclusion in the ACM Digital Library.
• Collect camera-ready papers and author agreements from workshop participants for the
joint workshop proceedings (CEUR or similar).
Note that for the joint proceedings (CEUR or similar), submissions should be peer-reviewed
and will need to meet publishers’ guidelines. CEUR, for example, requires a 5-page
minimum per contribution. Note that not all workshop formats listed above may meet
these requirements, and we may not be able to include them.
IUI 2026 is an in-person event, and we expect workshop organizers to attend, allowing the
workshop to be conducted on-site. One author per paper is expected to attend in person
to present the work.
Proposal Format
Workshop proposals should be a maximum of four pages long (single-column format).
Prepare your submission using the latest templates: Word Submission Template
(https://authors.acm.org/binaries/content/assets/publications/taps/acm_submi…),
or the LaTeX Template
(https://authors.acm.org/proceedings/production-information/preparing-your-a…).
For LaTeX, please use “\documentclass[manuscript,review]{acmart}”.
The proposals should be organized as follows:
• Name and title: A one-word acronym and a full title. Please indicate “(Workshop)” after
the title.
• Abstract: A brief summary of the workshop.
• Description of workshop topic: Should discuss the relevance of the proposed topic to
IUI and its interest for the IUI 2026 audience. Include a concise discussion of why this
workshop is particularly relevant for the intended audience and how it will complement
and enhance topics covered at the main conference.
• Previous history: List of previous workshops on this topic, including the conferences
that hosted them and the number of participants. If available, report on past editions of
the workshop (including URLs), along with a brief statement of the workshop
series (e.g., covering topics, number of paper submissions, and participants), as well as
post-workshop publications over the years and acceptance statistics. If this is the first
edition of the workshop, describe how it differs from others on similar topics (e.g., by
including conference names and years).
• Organizer(s): Names, affiliations, emails, and web pages of the organizer(s). Provide a
brief description of the background of the organizer(s). Strong proposals normally include
organizers who bring differing perspectives on the topic and are actively connected to the
communities of potential participants. Please indicate the primary contact person and the
organizers who will attend the workshop. Also, please provide a list of other workshops
organized by workshop organizers in the past.
• Workshop program committee: Names and affiliation of the members of the (tentative)
workshop program committee that will evaluate the workshop submissions.
• Participants: Include a statement of how many participants you expect and how you plan
to invite participants for the workshop. If possible, include the names of at least 10 people
who have expressed interest in participating in the workshop or tutorial.
• Workshop activities: A brief description of the format regarding the mix of
events or activities, such as paper presentations, invited talks, panels, demonstrations,
teaching activities, hands-on practical exercises, and general discussion.
• Planned outcomes of the workshop: What are you hoping to achieve by the end of the
workshop? Please list here any planned publications or other outcomes expected.
• Length: Full-day or half-day.
Submission Platform
• All materials must be submitted electronically to PCS 2.0
http://new.precisionconference.com/~sigchi by the proposal submission deadline.
• In PCS 2.0, first click "Submissions" at the top of the page, from the dropdown menus for
society, conference, and track, select "SIGCHI", "IUI 2026", and then "IUI 2026 Workshops",
and press "Go".
We encourage both researchers and industry practitioners to submit workshop proposals.
To support diverse perspectives in the workshops, we strongly recommend including
organizers from varied institutions and backgrounds.
Furthermore, we welcome workshops with an innovative structure that can attract diverse
types of contributions and foster valuable interactions.
Prospective organizers are encouraged to contact the Workshop Chairs in
advance (workshops2026(a)iui.acm.org) to discuss ideas, receive feedback, or seek
assistance in preparing engaging proposals. Especially for workshop proposals featuring
innovative interactive formats, we are happy to help further develop and implement the
ideas.
Important Dates (AoE)
• Workshop Proposals: August 29, 2025 (extended and final!)
• Decision Notification: September 19, 2025
• Camera-ready Summaries: February 6, 2026
Workshop Chairs
Karthik Dinakar, Pienso, USA
Werner Geyer, IBM Research, USA
Patricia Kahr, University of Zurich, Switzerland
Antonela Tommasel, CONICET, Argentina
The Centre for Translation Studies (CTS) at the University of Surrey invites applications for a place on our stand-alone course "Introduction to Artificial Intelligence for Translators and Interpreters". This course introduces students to the fundamentals of Artificial Intelligence (AI) and its applications in the field of translation and interpreting. The course covers a wide range of topics, from the basic concepts of AI to more advanced areas and techniques, including machine learning, large language models (LLMs) and how to leverage them, and the customisation of automatic speech recognition (ASR) engines. Students will be taught different prompting techniques which allow them to interact with LLMs like ChatGPT, so they can develop advanced problem-solving skills.
Students will tackle AI-related tasks that are relevant in the fields of translation and interpreting, such as machine translation, customisation of ASR engines and the use of machine assistance in tasks requiring creativity skills (e.g. transcreation). They will also explore the ethical implications of AI and the potential impact of AI on the future of the language industry.
The module is offered in synchronous online mode and will run for 11 weeks, starting on 23rd September. The tentative time slot is Tuesdays, 4-6pm UK time. You can find more details about the module and how to register at https://www.surrey.ac.uk/cpd-and-short-courses/tram511-introduction-artific…
The full list of standalone courses we offer this year is available at https://www.surrey.ac.uk/centre-translation-studies/continuing-professional…
Should you have any questions, do not hesitate to get in touch.
---
Prof Constantin Orăsan
Professor of Language and Translation Technologies
Centre for Translation Studies<https://www.surrey.ac.uk/centre-translation-studies> | School of Literature and Languages<https://www.surrey.ac.uk/school-literature-languages>
Personal page: https://www.surrey.ac.uk/people/constantin-orasan
Office: 06LC03, Email: C.Orasan(a)surrey.ac.uk
Library and Learning Centre, University of Surrey, Guildford, Surrey, GU2 7XH, UK
TL;DR: SHROOM-CAP is an Indic-centric shared task co-located with CHOMPS-2025 to advance the SOTA in hallucination detection for scientific content generated with LLMs. We have annotated hallucinated content in 4* high-resource languages and 3* surprise low-resource Indic languages using top-tier LLMs. Participate in as many languages as you like by accurately detecting the presence of hallucinated content.
Stay informed by joining our Google group!
Full Invitation
We are excited to announce the SHROOM-CAP shared task on cross-lingual hallucination detection for scientific publication (link to website). We invite participants to detect whether or not there is hallucination in the outputs of instruction-tuned LLMs within a cross-lingual scientific context.
About: This shared task builds upon our previous iteration, SHROOM, with three key highlights: it is LLM-centered, it provides cross-lingual annotations, and it targets both hallucination and fluency prediction.
LLMs frequently produce "hallucinations," where models generate plausible but incorrect outputs, while the existing metrics prioritize fluency over correctness. This results in an issue of growing concern as these models are increasingly adopted by the public.
With SHROOM-CAP, we want to advance the state of the art in detecting hallucinated scientific content. This new iteration of the shared task is held in a cross-lingual and multi-model context: we provide data produced by a variety of open-weights LLMs in 4*+3* different high- and low-resource languages (English, French, Spanish, Hindi, and Indic languages to be revealed later).
Participants are invited to take part in any of the available languages and are expected to develop systems that can accurately identify hallucinations in generated scientific content. Additionally, participants will be invited to submit system description papers, with the option to present them in oral/poster format during the CHOMPS workshop (co-located with IJCNLP-AACL 2025, Mumbai, India). Participants who elect to write a system description paper will be asked to review their peers’ submissions (max 2 papers per author).
Key Dates:
All deadlines are “anywhere on Earth” (23:59 UTC-12).
- Dev set available by: 31.07.2025
- Test set available by: 05.10.2025
- Evaluation phase ends: 15.10.2025
- System description papers due: 25.10.2025 (TBC)
- Notification of acceptance: 05.11.2025 (TBC)
- Camera-ready due: 11.11.2025 (TBC)
- Proceedings due: 01.12.2025 (TBC)
- CHOMPS workshop: 23/24 December 2025 (co-located with IJCNLP-AACL 2025)
Evaluation Metrics: Participants will be ranked along two criteria: 1. factuality mistakes, measured via macro-F1 between the gold reference and predicted labels; 2. fluency mistakes, measured via macro-F1 between the gold reference and predicted labels, based on our annotations.
Rankings and submissions will be done separately per language: you are welcome to focus only on the languages you are interested in!
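For reference, the following is a minimal sketch of how a macro-F1 score of this kind could be computed in Python with scikit-learn. It assumes binary hallucination labels; the label scheme, data format, and official SHROOM-CAP scorer may differ.

# Minimal sketch (assumption: binary labels, 1 = hallucinated, 0 = not).
# The official SHROOM-CAP scorer and label scheme may differ.
from sklearn.metrics import f1_score

gold = [1, 0, 1, 1, 0]         # hypothetical gold reference annotations
predicted = [1, 0, 0, 1, 0]    # hypothetical system predictions

print("macro-F1:", f1_score(gold, predicted, average="macro"))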
How to Participate:
- Register: please register your team at https://forms.gle/hWR9jwTBjZQmFKAE7 and join our Google group: https://groups.google.com/g/shroomcap
- Submit results: use our platform to submit your results before 15.10.2025
- Submit your system description: system description papers should be submitted by 25.10.2025 (TBC; further details will be announced at a later date)
Want to be kept in the loop?
Join our Google group mailing list! We look forward to your participation and to the exciting research that will emerge from this task.
Best regards,
SHROOM-CAP organizers
We welcome you to the next Natural Language Processing and Vision (NLPV) seminars at the University of Exeter.
Talk 1
Scheduled: Thursday 21 Aug 2025, 16:00 to 17:00 (GMT+1)
Location: https://Universityofexeter.zoom.us/j/97587944439?pwd=h4rnPO0PafT9oRrrqQsezG… (Meeting ID: 975 8794 4439 Password: 064414)
Title: Trustworthy Optimization of Pre-Trained Models for Healthcare: Generalizability, Adaptability, and Security
Abstract: Pre-trained language models have opened new possibilities in healthcare, showing promise in mining scientific literature, analyzing large-scale clinical data, identifying patterns in emerging diseases, and automating workflows, positioning themselves as intelligent research assistants. However, general-purpose models, typically trained on web-scale corpora, often lack the clinical grounding necessary for reliable deployment in high-stakes domains like healthcare. To be effective, they must be adapted to meet domain-specific requirements. My PhD thesis addresses three core challenges in leveraging pre-trained models for healthcare: (i) the scarcity of labeled data for fine-tuning, (ii) the evolving nature of healthcare data, and (iii) the need to ensure transparency and traceability of AI-generated content. In this talk, I will focus on the third challenge: enabling traceability of content generated by large language models. I will begin with an overview of prior watermarking approaches and then present our proposed solution. We introduce a watermarking algorithm applied at inference time that perturbs the model’s logits to bias generation toward a subset of vocabulary tokens determined by a secret key. To ensure that watermarking does not compromise generation quality, we propose a multi-objective optimization (MOO) framework that employs lightweight networks to produce token-specific watermarking logits and splitting ratios, specifying how many tokens to bias and by how much. This approach effectively balances watermark detectability with semantic coherence. Experimental results show that our method significantly improves detectability and robustness against removal attacks while preserving the semantics of the generated text, outperforming existing watermarking techniques.
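As background for the talk, the following is a minimal sketch of the general inference-time "green-list" logit-bias idea that watermarking schemes of this kind build on; it is not the speaker's multi-objective method, and all names and values are hypothetical. A secret key, combined with the previous token, seeds a pseudo-random subset of the vocabulary, and those tokens' logits receive a small positive bias that detection can later test for.

# Sketch of generic green-list watermarking at inference time (not the
# speaker's MOO-based method); names and values are hypothetical.
import hashlib
import numpy as np

def greenlist_bias(logits, prev_token_id, secret_key, gamma=0.5, delta=2.0):
    # Seed a PRNG from the secret key and the previous token id.
    digest = hashlib.sha256(f"{secret_key}:{prev_token_id}".encode()).hexdigest()
    rng = np.random.default_rng(int(digest, 16) % (2**32))
    # Pick a fraction gamma of the vocabulary as the "green" list.
    vocab_size = logits.shape[0]
    green = rng.choice(vocab_size, size=int(gamma * vocab_size), replace=False)
    # Bias green-list logits by delta before sampling the next token.
    biased = logits.copy()
    biased[green] += delta
    return biased

# Example with a toy vocabulary of 50 tokens:
logits = np.random.randn(50)
watermarked = greenlist_bias(logits, prev_token_id=7, secret_key="s3cret")

Detection then checks whether generated tokens fall in the key-derived green lists more often than the chance rate gamma.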
Speaker's bio: Dr. Sai Ashish Somayajula is a Senior Applied Scientist in Generative AI at Oracle Cloud Infrastructure, where he develops large-scale foundation models for enterprise applications. He earned his PhD in Electrical and Computer Engineering from the University of California (UC), San Diego. His research focused on addressing key challenges in adapting and utilizing pre-trained models for healthcare. Specifically, his work spanned three core areas: (1) synthetic data generation using meta-learning-based feedback mechanisms, (2) continual learning for handling dynamic data streams without catastrophic forgetting, and (3) token-level watermarking techniques to ensure content provenance and security. His research has been published in premier venues, including the International Conference on Machine Learning (ICML), Annual Meeting of the Association for Computational Linguistics (ACL), Transactions of the Association for Computational Linguistics (TACL), Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Scientific Reports (Nature Portfolio), and Transactions of Machine Learning Research (TMLR). He is a recipient of the Jacobs School of Engineering Departmental Fellowship at UC San Diego. Ashish has collaborated with leading industrial research labs through internships at Apple and Tencent AI Lab. He holds a Bachelor's degree in Electrical Engineering with a minor in Computer Science from the Indian Institute of Technology, Hyderabad, where he was twice awarded the Academic Excellence Award, and a Master’s in Intelligent Systems and Robotics from UC San Diego.
Talk 2
Scheduled: Thursday 4 Sep 2025, 13:00 to 14:00 (GMT+1)
Location: https://Universityofexeter.zoom.us/j/95827730937?pwd=Te1wejfgr68A5lplwLQjxw…
(Meeting ID: 958 2773 0937 Password: 879296)
Title: Towards end-to-end tokenization and adaptive memory in foundation models
Abstract: Foundation models (FMs) process information as a sequence of internal representations; however, the length of this sequence is fixed and entirely determined by tokenization. This essentially decouples representation granularity from information content, which exacerbates the deployment costs of FMs and narrows their “horizons” in long sequences. What if, instead, we could free FMs from tokenizers by modelling bytes directly, while making them faster than current tokenizer-bound FMs? I argue that a recipe to achieve this goal already exists. In particular, I helped prototype how to: 1) dynamically pool representations in internal layers, progressively learning abstractions from raw data; 2) compress the KV cache of Transformers during generation without loss of performance; 3) predict multiple bytes per time step in an efficient yet expressive way; 4) retrofit existing tokenizer-bound FMs into byte-level FMs through cross-tokenizer distillation. By blending these ingredients, we may soon witness the emergence of efficient byte-level FMs.
Speaker's short bio (based on website): Edoardo Ponti is an assistant professor in Natural Language Processing at the University of Edinburgh and a visiting professor at NVIDIA. His research focuses on efficient architectures (see the NeurIPS 2024 tutorial on dynamic sparsity), modular deep learning (designing neural architectures that route information to specialised modules, e.g., sparse subnetworks), and computational typology (understanding how languages vary, across the world and its cultures, within a computational and mathematical framework). Previously, Edoardo was a visiting postdoctoral scholar at Stanford University and a postdoctoral fellow in computer science at Mila - Quebec AI Institute in Montreal. In 2021, Edoardo obtained a PhD from the University of Cambridge, St John’s College. Once upon a time, Edoardo studied typological and historical linguistics at the University of Pavia. Edoardo’s research has been featured in The Economist and Scientific American, among others. Edoardo received a Google Research Faculty Award and two Best Paper Awards, at EMNLP 2021 and RepL4NLP 2019. Edoardo is a board member of SIGTYP, the ACL special interest group for computational typology, a Scholar of the European Lab for Learning and Intelligent Systems (ELLIS), and part of the TACL journal editorial team.
We will update future talks at the website: https://sites.google.com/view/neurocognit-lang-viz-group/seminars
Joining our *Google group* for future seminar and research information: https://groups.google.com/g/neurocognition-language-and-vision-processing-g…
RANLP 2025 TUTORIALS (6-7 September)
Call for Participation
Website - https://ranlp.org/ranlp2025/index.php/tutorials/
RANLP 2025 belongs to a sequence of events with a similar name and continues the tradition of successful training events that have been held in Bulgaria since 1989.
RANLP 2025 plans 4 half-day tutorials, each with a duration of 185 minutes, distributed as follows: 45 min presentation + 20 min break + 45 min presentation + 30 min coffee break + 45 min presentation.
Tutorial Presenters
* Burcu Can Buglalilar (University of Stirling, UK)
* Salima Lamsiyah (University of Luxembourg, Luxembourg)
* Tharindu Ranasinghe and Damith Dola Mullage Premasiri (Lancaster University, UK)
* Anna Rogers and Max Müller-Eberstein (IT University of Copenhagen, Denmark)
Programme
6th September 2025, 9am
Tharindu Ranasinghe and Damith Premasiri: Legal NLP in the LLM era
This tutorial examines the transformation of Legal NLP in the era of large language models, beginning with key principles of task formulation and data preparation. We will discuss retrieval and judgment prediction in detail, exploring their methodologies, challenges, and applications in legal contexts. We conclude with a forward-looking discussion on the future of Legal AI and the ethical considerations surrounding its applications in the practice of law.
6th September 2025, 2pm
Burcu Can Buglalilar: From Large to Small: Building Affordable Language Models with Limited Resources
This tutorial aims to question the limitations and harms of Large Language Models, followed by a comprehensive review of Small Language Models, covering prominent examples, their key techniques, and their capabilities. It will also give an overview of even smaller ‘baby’ language models. Finally, the tutorial will conclude by presenting some recent studies in which we developed baby language models using a very small amount of data.
7th September 2025, 9am
Anna Rogers and Max Müller-Eberstein: Studying Generalization in the Age of Contamination
The tutorial will discuss the challenges of doing NLP research in the age of LLMs, when we can no longer be sure that the test data was not observed in training. We will cover the main approaches to studying generalization in various settings, and present a new framework for working with controlled test-train splits across linguistically annotated data at scale.
7th September 2025, 2pm
Salima Lamsiyah: AI Content in NLP: Trends, Detection, and Applications
This tutorial provides a comprehensive overview of AI-generated content in Natural Language Processing (NLP). It covers recent trends in text generation, methods for detecting AI-generated text, and practical applications of such content. The content includes an exploration of state-of-the-art models and techniques for text generation, approaches to identifying machine-generated text, a review of key benchmarks and datasets, and a discussion of open research challenges.
We are looking forward to your participation!
The organisers of RANLP 2025