December 2023 - Corpora

[2nd CfP]: Computational Approaches to Language Data Pseudonymization @ EACL 2024
by Elena Volodina 14 Jan '24

Second Call for Papers: CALD-pseudo workshop on Computational Approaches to Language Data Pseudonymization @ EACL 2024, March 21 or 22, 2024

Website: https://mormor-karl.github.io/events/CALD-pseudo/
Submission website: https://softconf.com/eacl2024/CALD-pseudo-2024/
Submission deadline: Monday, 18 December 2023

We invite submissions to the first edition of the CALD-pseudo workshop on Computational Approaches to Language Data Pseudonymization, to be held at EACL 2024 on March 21 or 22, 2024.

[Important Dates]
* December 18, 2023: paper submission deadline
* January 17, 2024: resubmission of already pre-reviewed ARR papers
* January 20, 2024: notification of acceptance
* January 30, 2024: camera-ready papers due
* March 21 or 22, 2024: workshop date (to be confirmed by EACL)

[Introduction]
Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared because of the personal and sensitive information it contains, e.g. names, political opinions and medical data. The General Data Protection Regulation (GDPR; EU Commission, 2016) suggests pseudonymization as a way to secure open access to research data, but we need to learn more about pseudonymization as an approach before adopting it for the manipulation of research data (Volodina et al., 2023). The main challenge is how to pseudonymize data effectively, so that individuals cannot be identified, while keeping the data usable for the research for which it was collected, in fields such as computational linguistics, linguistics and natural language processing.
[Topics of Interest]
The CALD-pseudo workshop invites a broad community of researchers in all relevant cross-disciplinary fields to jointly discuss challenges in pseudonymization, such as:
* automatic approaches to the detection and labelling of personal information in unstructured language data, including events and other context-dependent cues that can reveal a person;
* developing context-sensitive algorithms for the replacement of personal information in unstructured data;
* studies of the effects of pseudonymization on unstructured data, e.g. the applicability of pseudonymized data to the intended research questions, the readability of pseudonymized data, or the introduction of unwelcome biases through pseudonymization;
* the effectiveness of pseudonymization as a way of protecting writer identity;
* re-identification studies, e.g. adversarial learning techniques that attempt to breach the privacy protections of pseudonymized data;
* constructing datasets for automatic pseudonymization, including the methodological and ethical aspects involved;
* approaches to evaluating automatic pseudonymization, both in concealing private information and in preserving the semantics of the non-personal data;
* pseudonymization tools and software: evaluating the available tools for pseudonymization in different languages, including their ease of use, scalability and performance;
* and numerous other open questions.
[Submission Guidelines]
Authors are invited to submit original, unpublished research papers by December 18, 2023, in the following categories:
* Full papers (up to 8 pages) for substantial contributions
* Short papers (up to 4 pages) for ongoing or preliminary work

All submissions must be in PDF format, must follow the EACL 2024 guidelines described in the ARR CfP (https://aclrollingreview.org/cfp), and must use the official ACL style templates available at https://github.com/acl-org/acl-style-files

Direct submission deadline: December 18, 2023, at https://softconf.com/eacl2024/CALD-pseudo-2024/
Deadline for registration of ARR-reviewed papers: January 17, 2024. (Further instructions will follow.)

We also invite authors of papers on the workshop topics that have been accepted to Findings to contact the CALD-pseudo organizing committee about presenting them at the workshop.

[Invited Speakers]
We are happy to announce that the workshop will host two invited speakers:
* Anders Søgaard, University of Copenhagen, Denmark
* Ildikó Pilán, Norwegian Computing Center, Norway

[Workshop Organizers]
* Elena Volodina, University of Gothenburg, Sweden
* Therese Lindström Tiedemann, University of Helsinki, Finland
* Simon Dobnik, University of Gothenburg, Sweden
* Xuan-Son Vu, Umeå University, Sweden

[Program Committee]
A list of program committee members is available on the workshop website.

[Contact]
For inquiries, please contact mormor.karl(a)svenska.gu.se
ACL link to the call: https://www.aclweb.org/portal/content/computational-approaches-language-dat…
___________________
Elena Volodina, PhD, Docent
https://spraakbanken.gu.se/en/about/staff/elena
Life is like a mirror. Smile at it and it smiles back at you. Peace Pilgrim

2nd Call for Participation: SemRel SemEval Shared Task 1
by Nedjma OUSIDHOUM 08 Jan '24

Dear corpora-list members,

We are announcing the first SemEval shared task on Semantic Textual Relatedness (STR): automatically detecting the degree of semantic relatedness (closeness in meaning) between pairs of sentences.

The semantic relatedness of two language units has long been considered fundamental to understanding meaning (Halliday and Hasan, 1976; Miller and Charles, 1991), and automatically determining relatedness has many applications, such as evaluating sentence representation methods, question answering and summarization.

Two sentences are considered semantically similar when they stand in a paraphrase or entailment relation. Relatedness, on the other hand, is a much broader concept that accounts for all the commonalities between two sentences: whether they are on the same topic, express the same view or originate from the same time period, whether one elaborates on (or follows from) the other, etc. For instance, consider the following sentence pairs:
- Pair 1: a. There was a lemon tree next to the house. b. The boy enjoyed reading under the lemon tree.
- Pair 2: a. There was a lemon tree next to the house. b. The boy was an excellent football player.
Most people will agree that the sentences in pair 1 are more related than the sentences in pair 2.
In this task, new textual datasets will be provided for Afrikaans <https://en.wikipedia.org/wiki/Afrikaans>, Algerian Arabic <https://en.wikipedia.org/wiki/Algerian_Arabic>, Amharic <https://en.wikipedia.org/wiki/Amharic>, English, Hausa <https://en.wikipedia.org/wiki/Hausa_language>, Hindi <https://en.wikipedia.org/wiki/Hindi>, Indonesian <https://en.wikipedia.org/wiki/Indonesian_language>, Kinyarwanda <https://en.wikipedia.org/wiki/Kinyarwanda>, Marathi <https://en.wikipedia.org/wiki/Marathi_language>, Moroccan Arabic <https://en.wikipedia.org/wiki/Moroccan_Arabic>, Modern Standard Arabic <https://en.wikipedia.org/wiki/Modern_Standard_Arabic>, Punjabi <https://en.wikipedia.org/wiki/Punjabi_language>, Spanish <https://en.wikipedia.org/wiki/Spanish_language> and Telugu <https://en.wikipedia.org/wiki/Telugu_language>.

Data
Each instance in the training, development and test sets is a sentence pair labeled with a score representing the degree of semantic textual relatedness between the two sentences. Scores range from 0 (maximally unrelated) to 1 (maximally related). These gold-label scores were determined through manual annotation. Specifically, a comparative annotation approach was used, which avoids several known biases of traditional rating-scale annotation methods and led to high reliability of the final relatedness rankings. Further details about the task, the data annotation method, how STR differs from semantic textual similarity, applications of semantic textual relatedness, etc. can be found in this paper: https://aclanthology.org/2023.eacl-main.55.pdf

Tracks
Each team can provide submissions for one, two or all of the tracks below:

Track A: Supervised
Participants are to submit systems that have been trained using the labeled training datasets provided.
Participating teams are allowed to use any publicly available datasets (e.g., other relatedness and similarity datasets or datasets in any other languages). However, they must report any additional data they used, and ideally report how much each resource contributed to the final results.

Track B: Unsupervised
Participants are to submit systems that have been developed without the use of any labeled datasets pertaining to semantic relatedness or semantic similarity between units of text more than two words long, in any language. The use of unigram or bigram relatedness datasets (from any language) is permitted.

Track C: Cross-lingual
Participants are to submit systems that have been developed without the use of any labeled semantic similarity or semantic relatedness datasets in the target language, and with the use of labeled dataset(s) from at least one other language. Note: Using labeled data from another track is mandatory for submission to this track.

Deciding which track a submission should go to:
- If a submission uses labeled data in the target language: submit to Track A.
- If a submission does not use labeled data in the target language but uses labeled data from another language: submit to Track C.
- If a submission does not use labeled data in any language: submit to Track B.
** Here, 'labeled data' refers to labeled datasets pertaining to semantic relatedness or semantic similarity between units of text more than two words long.

Evaluation
The official evaluation metric for this task is the Spearman rank correlation coefficient, which captures how well the system-predicted rankings of test instances align with human judgments. The evaluation script for this shared task is available on our GitHub page <https://github.com/semantic-textual-relatedness/Semantic_Relatedness_SemEva…>.
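To illustrate what the metric rewards, here is a minimal, dependency-free sketch of Spearman's rank correlation. It is not the organizers' official evaluation script (linked above), and the function names `average_ranks` and `spearman` are hypothetical. It assumes gold and predicted scores are given as two parallel lists in the same instance order, and it gives tied scores the mean of the ranks they span, the usual convention (also used by SciPy's `spearmanr`).

```python
from collections.abc import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j (0-based) correspond to ranks i+1..j+1; average them.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(gold: Sequence[float], predicted: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the two rank vectors."""
    if len(gold) != len(predicted) or len(gold) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rg, rp = average_ranks(gold), average_ranks(predicted)
    n = len(rg)
    mean_g, mean_p = sum(rg) / n, sum(rp) / n
    cov = sum((a - mean_g) * (b - mean_p) for a, b in zip(rg, rp))
    var_g = sum((a - mean_g) ** 2 for a in rg)
    var_p = sum((b - mean_p) ** 2 for b in rp)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_g * var_p) ** 0.5


if __name__ == "__main__":
    gold = [0.9, 0.2, 0.55, 0.4]  # hypothetical gold relatedness scores
    pred = [0.8, 0.1, 0.6, 0.3]   # hypothetical system scores, same instance order
    print(spearman(gold, pred))   # prints 1.0: the two rankings agree exactly
```

Because only ranks enter the computation, any monotonic rescaling of a system's scores (for example, outputting values outside the 0 to 1 range) leaves rho unchanged; what matters is ordering the sentence pairs the way the annotators did.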
Helpful Links
- Competition website: https://codalab.lisn.upsaclay.fr/competitions/15704
- Task website: https://semantic-textual-relatedness.github.io
- Twitter/X: https://twitter.com/SemRel2024
- Contact the organisers: semrel-semeval-organisers(a)googlegroups.com
- Google group for participants: semrel-semeval-participants(a)googlegroups.com

Important Dates
- Training data ready: 11 September 2023
- Evaluation start: 10 January 2024
- Evaluation end: 31 January 2024
- System description paper due: February 2024
- SemEval workshop: Summer 2024 (co-located with NAACL 2024)

NB. We will organise a mentorship session in January and a system description writing tutorial in February for all participants, especially students and junior researchers.

References
- Shima Asaadi, Saif Mohammad and Svetlana Kiritchenko. 2019. Big BiRD: A Large, Fine-Grained, Bigram Relatedness Dataset for Examining Semantic Composition. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
- M. A. K. Halliday and R. Hasan. 1976. Cohesion in English. London: Longman.
- George A. Miller and Walter G. Charles. 1991. Contextual Correlates of Semantic Similarity. Language and Cognitive Processes, 6(1):1–28.
- Mohamed Abdalla, Krishnapriya Vishnubhotla and Saif Mohammad. 2023. What Makes Sentences Semantically Related? A Textual Relatedness Dataset and Empirical Study. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 782–796, Dubrovnik, Croatia. Association for Computational Linguistics.

Task Organizers
Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Krishnapriya Vishnubhotla, Vladimir Araujo, Meriem Beloucif, Idris Abdulmumin, Seid Muhie Yimam, Nirmal Surange, Christine De Kock, Sanchit Ahuja, Oumaima Hourrane, Manish Shrivastava, Alham Fikri Aji, Thamar Solorio, Saif M. Mohammad

Postdoc positions at the Alan Turing Institute (Deadline: 07/01/2024)
by Pranava Madhyastha 03 Jan '24

Dear all,

We are hiring for two postdoctoral positions at the Alan Turing Institute, both focussed on probabilistic program scaffolds for large language models. This is a collaborative project led by Dr. Pranava Madhyastha from City, University of London, together with Prof. Alessandra Russo from Imperial College London and Prof. Anthony Cohn from the University of Leeds.

Opportunity 1: LLM Inference Expert
The first position requires experience with controlling inference in LLMs and transformer-based sequence-to-sequence models. More details and the application link can be found here: https://cezanneondemand.intervieweb.it/turing/jobs/senior-research-associat…

Opportunity 2: Probabilistic Programming Specialist
The second position requires a solid background in probabilistic programming, logic programming or symbolic models for artificial intelligence. More details and the application link can be found here: https://cezanneondemand.intervieweb.it/turing/jobs/research-associate-proba…

As a postdoctoral researcher at the Alan Turing Institute, you will be part of a vibrant and collaborative research environment, surrounded by renowned experts and cutting-edge technologies. This position provides an excellent platform to advance your career and make lasting contributions to the field of artificial intelligence.

For any questions, please get in touch with me at pranava.madhyastha(a)city.ac.uk.

Kind regards,
Pranava

NTCIR-18 First Call for Task Proposals
by CHUNG-CHI CHEN 31 Dec '23

Dear colleagues,

(Apologies if you receive multiple copies of this email from different mailing lists.)

We are delighted to announce the call for task proposals for NTCIR-18. NTCIR (NII Testbeds and Community for Information Access Research) is a series of evaluation conferences that mainly focus on information access with East Asian languages and English. The first NTCIR conference (NTCIR-1) took place in August/September 1999, and the latest, NTCIR-17, was held in December 2023. Research teams from all over the world participate in one or more NTCIR tasks to advance the state of the art and to learn from one another's experiences.

We invite new task proposals within the expansive field of information access. Organizing an evaluation task entails identifying significant research challenges, addressing them strategically in collaboration with fellow researchers (including co-organizers and participants), developing the evaluation framework needed to advance the state of the art, and generating a meaningful impact on both the research community and future developments. Prospective applicants are urged to emphasize the real-world applicability of their proposed tasks by using authentic data, focusing on practical tasks and solving tangible problems. Additionally, they should confront challenges in evaluating information access technology, such as the large number of assessments needed for evaluation, ensuring privacy while using proprietary data, and conducting live tests with actual users.

*Task Proposal Submission Due: Feb 9, 2024 (Anywhere on Earth)*
*Submission link: https://easychair.org/conferences/?conf=ntcir18proposal*

Below are more details, and please feel free to contact us if you have any questions. Happy holidays, and happy new year.

Warm regards,
NTCIR-18 Program Committee Co-Chairs
Qingyao Ai, Chung-Chi Chen, and Shoko Wakamiya

Call for Participation: Survey on Language Documentation and NLP
by Luke Gessler 29 Dec '23

Dear colleagues,

Have you ever worked at the intersection of natural language processing and endangered language documentation, or are you curious about doing so? My colleagues and I at the University of Colorado Boulder are surveying NLP researchers and documentary linguists who have done, or are interested in, this kind of work. Our goal is to better understand how to make NLP systems more practically successful in language documentation settings.

If you have 15 minutes, we would be honored if you shared your experiences with us to help advance our understanding of NLP in language documentation. Please take whichever of our two surveys matches your background:
- NLP researchers <https://docs.google.com/forms/d/e/1FAIpQLSeCFdMrbWmRqz7OAYbhoJYKX5g2NHPooXo…>
- Documentary linguists <https://forms.gle/4pGhsbGQ36b58byn6>

We look forward to reading what you have to share!

Best regards,
Luke Gessler

Join OSACT 2024 workshop for Pioneering Research and Shared Tasks
by m.zakiali80＠gmail.com 28 Dec '23

Dear Corpora Members,

🌟 Exciting Announcement: OSACT 2024 Workshop 🌟

Calling all researchers in computational linguistics, NLP and IR specializing in the Arabic language!

Are you at the forefront of research on low-resource languages, particularly Arabic? Do you delve into the complexities of computational linguistics (CL), natural language processing (NLP) and information retrieval (IR) with a focus on Arabic? We invite you to explore and contribute to groundbreaking advances in machine translation, particularly in developing models that translate dialectal Arabic text into Modern Standard Arabic (MSA). Moreover, if your research aims to improve the integrity and dependability of Arabic large language models (LLMs) through new hallucination detection and mitigation strategies, this workshop is the perfect platform for you.

Join us at the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT6), a hub of innovation and scholarly exchange.

Featured Shared Tasks: Tackling Pressing Issues in Arabic LLMs and Dialect-to-MSA Machine Translation

Task 1: Arabic LLMs Hallucination Challenge
Address the critical issue of hallucinated content in Arabic language models. Engage in this vital conversation and present your solutions. Read details: https://osact-lrec.github.io/

Task 2: Dialect to MSA Machine Translation Challenge
Engage in the pivotal task of translating dialectal Arabic into MSA with innovative translation models. We invite you to apply your expertise to drive significant advances in language processing, fostering more effective and meaningful exchanges in the Arabic-speaking world. Read details: https://osact-lrec.github.io/

Event Details
Date: May 25, 2024
Location: Torino, Italy
In conjunction with the esteemed LREC-COLING 2024, the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation.
Don’t miss this opportunity to contribute to a pioneering field!

Key Dates
Paper submission deadline: February 25, 2024
Acceptance notification: March 25, 2024

Visit our website, OSACT 2024 (https://osact-lrec.github.io/), for more details and submission guidelines. Please send any questions to OSACT.WORKSHOP(a)gmail.com.

We look forward to your participation and to seeing you at LREC-COLING in May 2024!

The OSACT 2024 Workshop Organizing Committee

--
Mona Ali
MSc. PhD. Computer Science
Associate Professor
Northeastern University Vancouver Campus
410 W Georgia, 14th Floor
Vancouver, BC | Land of: xʷməθkʷəy̓əm, Sḵwx̱wú7mesh, and səlilwətaɬ

CNLP4DH First Call for Papers
by Nicolas Nicolas 28 Dec '23

*************************************************************************
CNLP4DH First Call for Papers

Throughout 2024, the Journal of Data Mining and Digital Humanities (JDMDH) is running a worldwide call for papers on the topic of Chinese Natural Language Processing for Digital Humanities (CNLP4DH).

As a reminder, JDMDH is an international journal managed by French national research institutions, and it is green open access (no charge for readers or authors).

This special issue is dedicated to natural language processing for digital humanities involving documents written in Chinese, including Modern, Ancient and dialectal Chinese. Work on Mandarin, the official national language and main common language, is accepted, and research on texts written in other languages, such as those of Tibet and Inner Mongolia, is also welcome.

Suitable topics include, but are not limited to:
- Text analysis and processing related to the humanities using computational methods
- Dataset creation and curation for NLP (e.g. digitization, datafication and data preservation)
- Research on cultural heritage collections, such as national archives and libraries, using NLP
- NLP for error detection, correction, normalization and denoising of data
- Generation and analysis of literary works such as poetry and novels
- Analysis and detection of text genres
- Word segmentation and part-of-speech tagging of Ancient Chinese
- Large language models (LLMs) for Chinese in digital humanities
- Cross-modal models (text-speech-video-image) for Chinese in digital humanities
- Visualization of text analytics
- Ontology models for natural language text
- Applications in Chinese literature, traditional Chinese medicine, learning Chinese as a second language, sentiment analysis in Chinese social media, Chinese cultural heritage, Chinese history and the Ancient Chinese language

Submission guidelines: https://jdmdh.episciences.org/page/submissions
Paper submission: https://jdmdh.episciences.org/submit
Website and more details: https://jdmdh.episciences.org/page/chinese-natural-language-processing-for-…

Guest Editors:
Dr. Wenhe FENG (Guangdong University of Foreign Studies, Laboratory of Language Engineering and Computing)
Dr. Bin LI (Nanjing Normal University, School of Chinese Language and Literature, Center of Linguistic Big Data and Computational Humanities)
Dr. Nicolas TURENNE (Guangdong University of Foreign Studies, School of Information Science and Technology)
Dr. Tong WEI (Beijing University, Digital Humanities Center)
*************************************************************************

First Call for Papers: RaPID-5@LREC-COLING 2024
by Dimitrios Kokkinakis 27 Dec '23

***********************************************************************************
First Call for Papers: The 5th Workshop on "Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments" (RaPID-5)

Workshop: co-located with LREC-COLING 2024 | Turin, Italy | May 21st, 2024

RaPID-5 serves as an interdisciplinary platform for researchers to exchange insights, methods and experiences related to collecting and processing data from individuals with mental, cognitive, neuropsychiatric or neurodegenerative impairments. The workshop focuses on creating, processing and applying such data resources from individuals at different stages and severity levels of these impairments. The ultimate goal of RaPID-5 is to facilitate the study of relationships among linguistic, paralinguistic and extra-linguistic observations, with applications ranging from aiding diagnosis to improving monitoring and identifying individuals at higher risk, thereby promoting multidisciplinary collaboration across the clinical, language technology, computational linguistics and computer science communities.

Submission deadline: Sun., 31st of March, 2024 (anywhere on earth)
Paper submission: https://softconf.com/lrec-coling2024/rapid2024/
Website and more details: https://spraakbanken.gu.se/en/rapid-2024
Contact: Dimitrios Kokkinakis
Contact email: dimitrios.kokkinakis(a)gu.se

Organizing committee:
* Kathleen C. Fraser, National Research Council, Canada
* Dimitrios Kokkinakis, University of Gothenburg, Sweden
* Kristina Lundholm Fors, Lund University, Sweden
* Charalambos K. Themistocleous, University of Oslo, Norway
* Athanasios Tsanas, The University of Edinburgh, UK
* Fredrik Öhman, University of Gothenburg and Sahlgrenska University Hospital, Sweden
************************************************************************************

Assistant professorship with tenure track in NLP at TU Vienna
by Pia Pachinger 27 Dec '23

CfP: Second International Workshop Towards Digital Language Equality (TDLE): Focusing on Sustainability at LREC-COLING 2024
by fgaspari＠sslmit.unibo.it 24 Dec '23

[Apologies for multiple postings]

CALL FOR PAPERS
Second International Workshop Towards Digital Language Equality (TDLE): Focusing on Sustainability
co-located with LREC-COLING 2024, May 2024, Turin (Italy)
See further details at https://european-language-equality.eu/tdle-2024/

1 Description and Aims of the Workshop
The key aim of this half-day workshop, co-located with LREC-COLING 2024 (https://lrec-coling-2024.org/) and to be held in Turin (Italy) in May 2024, is to discuss and promote the importance of sustainability in the design, development, creation, use, distribution and sharing of language data, resources, platforms, infrastructures, tools and technologies, with the intention of achieving Digital Language Equality (DLE). While some important work has recently addressed these crucial areas (e.g. Fort and Couillault, 2016; Hessenthaler et al., 2022; Ramesh et al., 2023; Castilho et al., forthcoming), the relevant contributions still seem unsystematic and relatively isolated. The workshop intends to provide an inclusive forum to encourage in-depth debate and facilitate collaborations that promote the sustainability of resources and technologies in any (combination of) languages, in support of multilingualism and of the overarching goal of DLE. The sustainability of language resources and technologies is key to enabling multilingualism and digital language equality in the age of Artificial Intelligence.

2 Topics of Interest
The second international Towards Digital Language Equality (TDLE) workshop focuses on sustainability in relation to the design, development, creation, use, distribution and sharing of language data, resources, platforms, infrastructures, tools and technologies, with a view to promoting the broader goal of Digital Language Equality (DLE).
The concept of DLE has been firmly established in relation to all languages of Europe (Rehm and Way, 2023) and has the potential to benefit other languages throughout the world as well, supporting the prosperity of their communities at a time of impressive, but as yet very unevenly distributed and severely imbalanced, progress in language-centric Artificial Intelligence (AI), e.g. through large language models (LLMs). The workshop places particular emphasis on multilingualism and on leveling up digital support for languages, domains and applications that have so far been underserved, and wishes to explore ways to develop policies and funding streams that work towards sustainability in connection with DLE, especially in support of regional, minority and territorial languages.

To this end, recognizing that the sustainability of Language Resources and Technologies (LRTs) is key to enabling multilingualism and DLE in the age of AI, we invite original contributions covering any (combination of) languages on topics including, but not limited to, the following:
- research on the factors affecting DLE and the sustainability of LRTs;
- best practices, case studies and validated guidelines related to the design, implementation and improvement of the sustainability of written, oral/spoken, signed and/or multimodal LRTs (including LLMs), particularly in support of DLE;
- how multilingual LLM technology can support DLE;
- retrospectively assessing the sustainability of legacy LRTs, and future-proofing new LRTs in the interest of DLE;
- analyzing the costs and benefits of foregrounding sustainability for LRTs;
- the role of metadata, accompanying documentation and licenses in demonstrating and improving the sustainability of LRTs;
- sustainability, fairness and accessibility (e.g. for users with physical or cognitive disabilities, or with limited computing resources and connectivity) of platforms and infrastructures hosting, distributing and sharing LRTs in the interest of DLE;
- how current inequality in access to data and computing is affecting DLE (in particular regarding LLMs);
- ecological sustainability and environmental fairness of developing and deploying state-of-the-art LRTs, e.g. LLMs, with regard to energy consumption, global warming and climate change;
- developing data- and parameter-efficient methods to train or adapt language models to new languages;
- how to evaluate, measure, compare and improve the sustainability of LRTs;
- establishing benchmarks and protocols to ensure the sustainability of LRTs;
- how to avoid the potential dangers of developing and using unfair and unsustainable LRTs, e.g. for malicious, ill-intentioned or harmful purposes;
- ethical, legal, cultural and/or socio-economic implications of (ignoring) the fairness and sustainability of LRTs;
- developing and implementing forward-looking policies to promote the fairness and long-term sustainability of LRTs to achieve DLE;
- education and training needs and experiences in relation to promoting the fairness and sustainability of LRTs, and ways to raise broad awareness of DLE and related topics, e.g. among the general public and policy- and decision-makers.

Given this wide-ranging and inclusive remit, the workshop intends to bring together developers, creators, vendors, distributors, brokers, users, evaluators and researchers of written, oral/spoken, signed and/or multimodal LRTs in any (combination of) languages.

3 Background and First TDLE Workshop Held in 2022
The 2024 edition of the workshop builds on the success of the first Towards Digital Language Equality (TDLE) workshop, which was held at LREC 2022 in Marseille (France) on 20 June 2022 and whose accepted papers were published in a dedicated volume of proceedings (Aldabe et al., 2022).
Following this well-received inaugural workshop, the second event in the series will be co-located with LREC-COLING 2024 in Turin (Italy) in May 2024 and will focus specifically on the highly relevant topic of the sustainability of LRTs in connection with multilingualism and DLE.

4 Submissions
Up-to-date information on the workshop, including materials for authors, guidelines, templates, the stylesheet and key dates, can be found on the dedicated website https://european-language-equality.eu/tdle-2024/. To contact the organizing committee of the workshop directly, you can email tdle2024.hitz(a)ehu.eus.

Papers submitted to the workshop should be fully anonymized for double-blind peer review, written in English, and prepared using the official LREC-COLING 2024 author's kit and submission stylesheet/template available at https://lrec-coling-2024.org/authors-kit/. Submissions should not exceed 8 pages, excluding references, and must be saved in unprotected PDF format. Papers should be submitted no later than 23 February 2024 through the START submission management system, available via the workshop website at https://european-language-equality.eu/tdle-2024/.

The workshop seeks original papers, i.e. it does not accept submissions that have been, or will be, published elsewhere. Simultaneous submissions are allowed; in these cases, the authors should clearly indicate in the manuscript to which other conference, workshop or venue they have submitted the paper for review. Each paper submitted to the workshop will receive three double-blind peer reviews. Papers accepted for presentation will be included in the workshop proceedings.

In light of the LREC-COLING 2024 Map and the "Share your LRs!" initiative, when submitting their papers through the START system authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.)
that have been used for the work described in the paper or are a new result of their research. Moreover, ELRA encourages all LREC-COLING authors to share the described LRs (data, tools, services, etc.) to enable their reuse and the replicability of experiments (including evaluation ones).

5 Key Dates

- Paper submission deadline: 23 February 2024
- Notification of acceptance: 19 March 2024
- Camera-ready papers due: 8 April 2024
- Half-day workshop date: 20, 21 or 25 May 2024 (TBC)

6 Workshop Organizers

- Itziar Aldabe (HiTZ Basque Center for Language Technology - Ixa, University of the Basque Country, Spain)
- Begoña Altuna (HiTZ Basque Center for Language Technology - Ixa, University of the Basque Country, Spain)
- Aritz Farwell (HiTZ Basque Center for Language Technology - Ixa, University of the Basque Country, Spain)
- Federico Gaspari (University of Naples "Federico II", Italy & ADAPT Centre, Dublin City University, Ireland - co-chair)
- Joss Moorkens (School of Applied Language & Intercultural Studies/ADAPT Centre, Dublin City University, Ireland - co-chair)
- Stelios Piperidis (Institute of Language and Speech Processing, Athena Research and Innovation Center in Information, Communication and Knowledge Technologies, Greece)
- Georg Rehm (Speech and Language Technology Lab, Deutsches Forschungszentrum für Künstliche Intelligenz, Germany)
- German Rigau (HiTZ Basque Center for Language Technology - Ixa, University of the Basque Country, Spain)

7 Program Committee

- Antonios Anastasopoulos (GMU, USA)
- Anya Belz (ADAPT, DCU, Ireland)
- Steven Bird (CDU, Australia)
- Fred Blain (Uni. Tilburg, Netherlands)
- Franco Cutugno (Uni. Naples "Federico II", Italy)
- Bessie Dendrinos (NKUA, Greece & ECSPM, Denmark)
- Félix do Carmo (Uni. Surrey, UK)
- Annika Grützner-Zahn (DFKI, Germany)
- Ana Guerberof-Arenas (Uni. Groningen, Netherlands)
- Davyth Hicks (ELEN, Belgium)
- Monja Jannet (ADAPT, DCU, Ireland)
- John Judge (ADAPT, DCU, Ireland)
- Dorothy Kenny (SALIS/CTTS/ADAPT, DCU, Ireland)
- Sabine Kirchmeier (EFNIL, Luxembourg)
- Teresa Lynn (MBZUAI, United Arab Emirates)
- Maite Melero (BSC, Spain)
- Helena Moniz (Uni. Lisbon, Portugal & EAMT)
- Johanna Monti (UniOR, Italy)
- Rachele Raus (UniBO, Italy)
- Wessel Reijers (Uni. Paderborn, Germany)
- Celia Rico Pérez (Universidad Complutense de Madrid, Spain)
- Dimitar Shterionov (TU, Netherlands)
- Carlos S. C. Teixeira (IOTA Localisation Services & Uni. Rovira i Virgili, Spain)
- Antonio Toral (Uni. Groningen, Netherlands)
- Vincent Vandeghinste (Instituut voor de Nederlandse Taal, Netherlands & KU Leuven, Belgium)

References

Itziar Aldabe, Begoña Altuna, Aritz Farwell and German Rigau, editors. 2022. Proceedings of the Workshop Towards Digital Language Equality (TDLE). European Language Resources Association, Marseille, France.

Sheila Castilho, Federico Gaspari, Joss Moorkens, Maja Popović and Antonio Toral, editors. Forthcoming. Journal of Specialised Translation. Special Issue n. 41 on "Translation Automation and Sustainability".

Karën Fort and Alain Couillault. 2016. "Yes, We Care! Results of the Ethics and Natural Language Processing Surveys". Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16). European Language Resources Association, Portorož, Slovenia. 1593-1600.

Marius Hessenthaler, Emma Strubell, Dirk Hovy and Anne Lauscher. 2022. "Bridging Fairness and Environmental Sustainability in Natural Language Processing". Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates. 7817-7836.

András Kornai. 2013. "Digital Language Death". PLoS ONE, 8(10):e77056.

Krithika Ramesh, Sunayana Sitaram and Monojit Choudhury. 2023. "Fairness in Language Models Beyond English: Gaps and Challenges". Findings of the Association for Computational Linguistics: EACL 2023. Association for Computational Linguistics, Dubrovnik, Croatia. 2106-2119.

Georg Rehm and Andy Way, editors. 2023. European Language Equality: A Strategic Agenda for Digital Language Equality. Berlin: Springer.

Links:
------
[1] https://european-language-equality.eu/tdle-2024/
[2] https://lrec-coling-2024.org/authors-kit/
