Corpora - ELRA lists

1 Postdoc and 2 PhD positions in NLP and ML for healthcare at Amsterdam UMC, University of Amsterdam
by Iacer Calixto 28 Feb '24

*** Apologies for cross-posting ***

Dear colleagues,

The NLP4Health Lab <https://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fnlp4healt…> in the Department of Medical Informatics at the Amsterdam UMC <https://www.amsterdamumc.org/en/research.htm> and University of Amsterdam <https://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.uva.n…> is hiring *one postdoctoral researcher* and *two PhD students* in *Responsible Natural Language Processing (NLP) and Machine Learning (ML) for Healthcare*. All positions are fully funded.

Do you have a strong background in NLP and ML, and a keen interest in large language models and healthcare? Please consider applying: we are accepting applications until March 15! Please check the details and apply via the links below:

- PhD positions: https://werkenbij.amsterdamumc.org/en/vacatures/research/2-phd-positions-in…
- Postdoc researcher position: https://werkenbij.amsterdamumc.org/en/vacatures/research/postdoctoral-resea…

The positions are funded by an NGF AiNed Fellowship Grant for the "CaRe-NLP: Human-Centric and Responsible NLP methods for Dutch healthcare" project. The overall goal of the project is to develop human-centric and responsible NLP and ML methods for healthcare in the Netherlands, Europe, and worldwide. We will design, build, and evaluate state-of-the-art large language models (LLMs) for healthcare data that include (combinations of) free-text clinical notes collected in primary, secondary, and intensive care settings, medical images, time-series measurements, medical knowledge graphs, and multimodal electronic health records (EHRs). Our methods aim to ensure privacy and fairness, prevent bias, cope with data scarcity, and be interpretable and explainable.

We collaborate with a network of clinicians across multiple specialties, and you will tackle relevant clinical problems with real-world impact. The project team is led by Dr. Iacer Calixto and Prof. Ameen Abu-Hanna, and we are housed at the Amsterdam UMC location University of Amsterdam in the beautiful city of Amsterdam.

(Please feel free to share with your students/communities!)

Have a great week,
Iacer

The ReproHum 2024 Survey of NLP and ML researchers' experience and views of reproducibility
by Thomson, Craig 28 Feb '24

Dear Corpora list members,

As part of the EPSRC UK ReproHum project (https://reprohum.github.io), we are conducting a survey of NLP and ML researchers' experience and views of reproducibility. We would like to hear from as many researchers as possible (NLP or ML), not just those who work on evaluation! If you completed a similar survey in 2022, you can still complete this one: we are interested in how your experience and views have changed between then and now.

We would be most grateful if you could spend 5-10 minutes taking part in the survey, which can be accessed via the link below:
https://forms.gle/RshrHcvAXxAEEFj59

With thanks, and apologies for cross-posting.

Craig Thomson
Research Fellow
Computing Science
University of Aberdeen

The University of Aberdeen is a charity registered in Scotland, No SC013683.

PAN 2024: Shared Tasks on Authorship Analysis, Computational Ethics, and Originality
by Matti Wiegmann 28 Feb '24

PAN 2024: Shared Tasks on Authorship Analysis, Computational Ethics, and Originality
Call for Participation

We'd like to invite you to participate in the following shared tasks at PAN 2024, held in conjunction with the CLEF conference in Grenoble, France.

1. Voight-Kampff Generative AI Authorship Verification. Given two texts, one authored by a human and one by a machine, pick out the human.
https://pan.webis.de/clef24/pan24-web/generated-content-analysis.html
2. Oppositional Thinking Analysis. Given an online message, determine whether it is a conspiracy theory or critical thinking.
https://pan.webis.de/clef24/pan24-web/oppositional-thinking-analysis.html
3. Multi-Author Writing Style Analysis. Given a document, determine at which positions the author changes.
https://pan.webis.de/clef24/pan24-web/style-change-detection.html
4. Multilingual Text Detoxification. Given a toxic piece of text, rewrite it in a non-toxic way while preserving as much of the main content as possible.
https://pan.webis.de/clef24/pan24-web/text-detoxification.html

Find out more at https://pan.webis.de/clef24/pan24-web

Important Dates
--------------------------
now: Training data released
May 05, 2024: Software submission
May 31, 2024: Participant paper submission
June 24, 2024: Peer review notification
July 08, 2024: Camera-ready participant papers submission
Sep 09-12, 2024: Conference

Links
--------------------------
PAN: https://pan.webis.de
Contact: pan(a)webis.de

We are looking forward to your submission!
The PAN team

CLEF-2024 CheckThat! Lab -- Call for Participation (training set available for all tasks)
by Piotr Przybyła 28 Feb '24

Dear colleague,

We invite you to participate in the 2024 edition of the CheckThat! Lab at CLEF 2024. This year, we feature six tasks (two follow-up and four new) that correspond to important components within and around the full fact-checking pipeline in multiple languages:

Task 1 Check-worthiness in tweets: identify claims that could be important to verify on social and mainstream media (the only task that has been organized in all editions of the lab). Available in Arabic, English, Dutch, and Spanish.

Task 2 Subjectivity in news articles: spot text that should be processed with specific strategies, benefiting the fact-checking pipeline. Available in Arabic, English, German, Italian, and Multilingual.

Task 3 Persuasion Techniques: identify text spans in which a persuasion technique is used to influence the reader. This task is offered in five languages: Arabic, Bulgarian, English, Portuguese, and Slovene.

Task 4 Detecting hero, villain, and victim from memes: given a meme and a list of entities, predict the role of each entity: hero, villain, victim, or other. Available in Arabic, English, and Code-mixed.

Task 5 Rumor Verification using Evidence from Authorities: retrieve evidence from trusted sources (authorities that have "real knowledge" on the matter) and determine whether the rumor is supported, refuted, or unverifiable according to the evidence. Available in Arabic and English.

Task 6 Robustness of Credibility Assessment with Adversarial Examples: discover small changes that could be applied to misinformation text, causing the provided classifiers to make wrong predictions. Available for news articles, tweets, propaganda techniques, and claims (including claims regarding COVID-19) in English.
Further information: https://checkthat.gitlab.io/
Datasets: https://gitlab.com/checkthat_lab/clef2024-checkthat-lab
Register and participate: https://clef2024-labs-registration.dei.unipd.it/registrationForm.php

Important Dates
---------------------
- November 2023: Lab registration opens
- January 2024: Release of the training materials
- 22 April 2024: Lab registration closes
- 2 May 2024: Beginning of the evaluation cycle (test sets release)
- 6 May 2024 (23:59 AOE): End of the evaluation cycle (run submission)
- 31 May 2024: Deadline for the submission of working notes
- 10 June 2024: Submission of Condensed Lab Overviews [LNCS]
- 21 June 2024: Camera Ready Copy of Condensed Lab Overviews [LNCS] due
- 24 June 2024: Notification of acceptance of working notes
- 8 July 2024: Deadline for submission of camera-ready working notes
- 22-26 July 2024: Preview of working notes
- 9-12 September 2024: CLEF 2024 Conference in Grenoble, France

Best regards,
The CLEF-2024 CheckThat! Lab Shared Task Organizers

ASONAM 2024: Call for Workshop Proposals
by Rajesh Sharma 28 Feb '24

The 16th International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2024)
September 02-05, 2024, Calabria, Italy
Conference Link: https://asonam.cpsc.ucalgary.ca/2024/
Workshop information: https://asonam.cpsc.ucalgary.ca/2024/CFW.php#key_dates

CALL FOR WORKSHOP PROPOSALS

The 16th International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2024), held on September 02-05, 2024 in Calabria, Italy, invites proposals for workshops to be held in conjunction with the main conference. ASONAM is an interdisciplinary venue that brings together practitioners and researchers from a variety of Social Network Analysis and Mining fields to promote collaboration and the exchange of ideas and practices. Workshops can be scheduled either for a full day (morning and afternoon) or for half a day.

Proposals should include the following information:
- The name of the workshop.
- The names and addresses of the organizers, and a designated contact person.
- A description of the workshop: abstract, objectives, relevance, and expected outcome.
- The names of program committee members and, if applicable, other potential applicants.
- A description of the plans for the workshop (e.g., program, keynotes, highlights).
- The expected number of attendees and the planned length of the workshop.
- A description of past versions of the workshop, including dates, organizers, submission and acceptance counts, attendance, sites, and any other relevant information.
Important dates:
- Workshop proposal deadline: March 20, 2024, 11:59 PM AoE
- Workshop acceptance notification: April 10, 2024, 11:59 PM AoE

When planning paper submission, reviewing, and final revisions in your proposal, please use the following deadlines:
- Workshop paper submission deadline: June 10, 2024, 11:59 PM AoE
- Workshop paper acceptance notification: July 10, 2024, 11:59 PM AoE
- Workshop paper camera-ready deadline: July 18, 2024, 11:59 PM AoE

Organizers of accepted proposals will be responsible for publicizing and running the workshop, including sending out calls for papers, reviewing submissions, producing the camera-ready workshop proceedings, and organizing the meeting days.

Submission Link: https://easychair.org/conferences/?conf=workshopsasonam2024

We look forward to your workshop proposals, which will help make ASONAM 2024 a success!

Kind regards,
Rajesh Sharma
Associate Professor, Head, Computational Social Science Lab
Institute of Computer Science, University of Tartu, Estonia
https://rajeshsharma.cs.ut.ee/

WojoodNER 2024 The 2nd Arabic Named Entity Recognition Shared Task
by Mustafa Jarrar 28 Feb '24

WojoodNER 2024: The 2nd Arabic Named Entity Recognition Shared Task at ArabicNLP’24
https://dlnlp.ai/st/wojood/

We invite you to participate in the second scientific competition on detecting named entities in Arabic texts. Participants will receive the new Wojood corpus (550 thousand words plus detailed named-entity types). The competition has three tasks, and you may participate in any of them; one of the tasks concerns the war on Gaza, and participants may use external data in it.

Dataset: Wojood-Fine <https://aclanthology.org/2023.arabicnlp-1.25/>
New version: Arabic Fine-Grained Entity Recognition (Wojood + subtypes of entity types).

Subtask-1 (Closed-Track Flat Fine-Grain NER): We provide the Wojood-Fine Flat train (70%) and development (10%) datasets. The final evaluation will be on the test set (20%). External data is not allowed ... (read more <https://dlnlp.ai/st/wojood/>).

Subtask-2 (Closed-Track Nested Fine-Grain NER): This subtask is similar to Subtask-1: we provide the Wojood-Fine Nested train (70%) and development (10%) datasets. The final evaluation will be on the test set (20%) ... (read more <https://dlnlp.ai/st/wojood/>).

Subtask-3 (Open-Track NER - Gaza War): This subtask lets participants reflect on the utility of NER in the context of real-world events, allows them to use external resources, and encourages them to use generative models in different ways (fine-tuning, zero-shot learning, in-context learning, etc.). The goal of focusing on generative models in this particular subtask is to help the Arabic NLP research community better understand the capabilities and performance gaps of LLMs in information extraction, an area that is currently understudied. We provide development and test data related to the current War on Gaza. This is motivated by the assumption that discourse about recent global events will involve mentions from a different data distribution. For this subtask, we include data from five different news domains related to the War on Gaza, but we keep the names of the domains hidden.
Participants will be given a development dataset (10K tokens, 2K from each of the five domains) and a test dataset (50K tokens, 10K from each domain). Both development and test sets are manually annotated with fine-grained named entities using the same annotation guidelines as in Subtask-1 and Subtask-2 (also described in Liqreina et al., 2023) ... (read more <https://dlnlp.ai/st/wojood/>).

BASELINES
Two baseline models trained on Wojood-Fine (flat and nested) are provided (see Liqreina et al., 2023 <https://aclanthology.org/2023.arabicnlp-1.25/>). The code used to produce these baselines is available on GitHub <https://github.com/SinaLab/ArabicNER>.

Subtask | Precision | Recall | Average Micro-F1
Flat Fine-Grain NER (Subtask 1) | 0.8870 | 0.8966 | 0.8917
Nested Fine-Grain NER (Subtask 2) | 0.9179 | 0.9279 | 0.9229

GOOGLE COLAB NOTEBOOKS
To let you experiment with the baseline, we authored four Google Colab notebooks that demonstrate how to train and evaluate our baseline models.
[1] Train Flat Fine-Grain NER <https://gist.github.com/mohammedkhalilia/72c3261734d7715094089bdf4de74b4a>: This notebook can be used to train our ArabicNER model on the flat fine-grain NER task using the sample Wojood_Fine data.
[2] Evaluate Flat Fine-Grain NER <https://gist.github.com/mohammedkhalilia/c807eb1ccb15416b187c32a362001665>: This notebook uses the trained model saved by the notebook above to perform evaluation on an unseen dataset.
[3] Train Nested Fine-Grain NER <https://gist.github.com/mohammedkhalilia/a4d83d4e43682d1efcdf299d41beb3da>: This notebook can be used to train our ArabicNER model on the nested fine-grain task using the sample Wojood data.
[4] Evaluate Nested Fine-Grain NER <https://gist.github.com/mohammedkhalilia/9134510aa2684464f57de7934c97138b>: This notebook uses the trained model saved by the notebook above to perform evaluation on an unseen dataset.
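The baseline results report precision, recall, and micro-F1. Micro-F1 is the harmonic mean of micro-averaged precision P and recall R, that is F1 = 2PR / (P + R). The short Python sketch below is illustrative only: it is not part of the organizers' released code, and the helper name `f1` is hypothetical. It recomputes F1 from the rounded P and R values reported for the two baselines. Because those inputs are rounded to four decimals, the result can differ from the reported F1 in the last digit (for Subtask 1 it comes out at about 0.8918 versus the reported 0.8917).

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision/recall values as reported for the WojoodNER baselines.
baselines = {
    "Flat Fine-Grain NER (Subtask 1)": (0.8870, 0.8966),
    "Nested Fine-Grain NER (Subtask 2)": (0.9179, 0.9279),
}

for name, (p, r) in baselines.items():
    print(f"{name}: recomputed F1 = {f1(p, r):.4f}")
```

An exact match with the official scores would require the underlying true-positive, false-positive, and false-negative counts, which the scoring script works from directly.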
REGISTRATION
Participants need to register via this form (NERSharedTask 2024) <https://docs.google.com/forms/d/1ISMILgQYfUug3XuDpxFmuPASXkWaduYOUc3xOZuGwq…>. Participating teams will be provided with common training and development datasets. No external manually labelled datasets are allowed. A blind test set will be used to evaluate the output of the participating teams. Each team is allowed a maximum of 3 submissions. All teams are required to report results on the development and test sets (after results are announced) in their write-ups.

FAQ
For any questions related to this task, please check our Frequently Asked Questions <https://docs.google.com/document/d/1W_13FRpP3NbDx_ALYJWA3-ESXPRVomOjNovUuYf…>

IMPORTANT DATES
- February 25, 2024: Shared task announcement.
- March 1, 2024: Release of training data, development sets, scoring script, and Codalab links.
- April 5, 2024: Registration deadline.
- April 26, 2024: Test set made available.
- May 3, 2024: Codalab test system submission deadline.
- May 10, 2024: Shared task system paper submissions due.
- June 17, 2024: Notification of acceptance.
- July 1, 2024: Camera-ready version.
- August 16, 2024: ArabicNLP 2024 conference in Thailand.

CONTACT
For any questions related to this task, please contact the organizers directly at the following email address: NERSharedtask(a)gmail.com

ORGANIZERS
- Mustafa Jarrar, Birzeit University
- Muhammad Abdul-Mageed, University of British Columbia & MBZUAI
- Mohammed Khalilia, Birzeit University
- Bashar Talafha, University of British Columbia
- AbdelRahim Elmadany, University of British Columbia
- Nagham Hamad, Birzeit University

--Mustafa
__________________________
Mustafa Jarrar, PhD
Professor of Artificial Intelligence
Chair, PhD Program in Computer Science
Birzeit University, Palestine
Whatsapp: +972599662258 | mjarrar(a)birzeit.edu
http://www.jarrar.info

Extended Deadline [March 10, 2024] The First Workshop on Visualization for Natural Language Processing (Vis4NLP)
by Tariq Yousef 28 Feb '24

Apologies for cross-posting

Extended Deadline: March 10, 2024

The First Workshop on Visualization for Natural Language Processing (Vis4NLP)
May 27th 2024, Odense, Denmark

The workshop will be co-located with EuroVis 2024 <https://www.eurovis.org/eurovis> in Odense, Denmark, and will take place in person on May 27. The workshop aims to create a dedicated space for interdisciplinary collaboration at the intersection of NLP and visualization. Vis4NLP serves as a platform where researchers, practitioners, and academics come together to tackle the evolving challenges and opportunities in NLP visualization.

Call for papers: http://vis4nlp.com/
Workshop date: May 27, 2024
Venue: Syddansk Universitet - University of Southern Denmark <http://sdu.dk/>

Important Dates
- Workshop paper due: March 10, 2024 (extended from March 3, 2024)
- Notification of acceptance: April 10, 2024
- Camera-ready papers due: April 20, 2024
- Workshop date: May 27, 2024

All submission deadlines are at 23:59 GMT on the date indicated.

Best regards,
*Tariq Yousef*
Assistant Professor of Data Science
Department of Mathematics and Computer Science
Faculty of Science
*University of Southern Denmark*

Final CfP: Games and NLP 2024 Workshop @ LREC-COLING
by Udo Kruschwitz 28 Feb '24

GAMES AND NLP 2024 @ LREC-COLING 2024
=====================================
Co-located with LREC-COLING in Turin, Italy
21st May 2024
https://gamesandnlp.com

*** Deadline extended: Mar 4th ***

Call for Papers
--------------------
The 10th Workshop on Games and Natural Language Processing (Games and NLP 2024), to be held at LREC-COLING 2024, will examine the use of games and gamification for Natural Language Processing (NLP) tasks, as well as how NLP research can advance player engagement and communication within games.

The Games and NLP workshop aims to promote and explore the possibilities for research and practical applications of games and gamification that have a core NLP aspect, either to generate resources and perform language tasks or as a game mechanic itself. This workshop investigates computational and theoretical aspects of natural language research that would be beneficial for designing and building novel game experiences, or for processing texts to conduct formal game studies. NLP can benefit from games by obtaining language resources (e.g., constructing a thesaurus or a parser through a crowdsourcing game), or by learning the linguistic characteristics of game users as compared to those of other domains.
Topics (including, but not limited to)
--------------------------------------------------
• Games for collecting data useful for NLP
• Gamification of NLP tasks
• Player motivation and experience
• Game design
• Novel uses of natural language processing or generation as a game mechanic
• Natural language in games as an alternative method of input for people with disabilities
• Processing NLP game data
• Analysis of large-scale game-related corpora
• Real-time sentiment analysis of player discourse or chat
• Evaluation of games for NLP
• Serious games for learning languages
• Player immersion in language-enabled mixed reality or physically embodied games
• Narrative plot or text generation of text-based interactive narrative systems
• Natural language understanding and generation of character dialogue
• Ethical and privacy concerns of ownership of text and audio chat in massively multiplayer online games

Submissions:
------------------
Papers should be submitted as PDF documents conforming to the formatting guidelines provided in the call for papers of the LREC-COLING conference (https://lrec-coling-2024.org/authors-kit/). Submissions are to be made via the Softconf/START Conference Manager at https://softconf.com/lrec-coling2024/gamesandnlp2024/

Important Dates
---------------------
• Submission Deadline: Mar 4th (*** extended ***)
• Notification of Acceptance: Mar 26th
• Camera Ready Deadline: Apr 1st
• Workshop: May 21st

Organisation Committee
--------------------------------
• Chris Madge, chair (Queen Mary University of London)
• Jon Chamberlain (University of Essex, UK)
• Karën Fort (Sorbonne Université, France)
• Udo Kruschwitz (University of Regensburg, Germany)
• Stephanie Lukin (U.S. Army Research Laboratory)

Programme Committee
-------------------------------
• Alice Millour (Sorbonne Université)
• Brent Harrison (University of Kentucky, US)
• Ian Horswill (Northwestern University)
• Jonathan Lessard (Université Concordia)
• Luisa Coheur (INESC-ID & Instituto Superior Técnico, University of Lisbon)
• Mariët Theune (University of Twente)
• Massimo Poesio (Queen Mary University, UK)
• Mathieu Lafourcade (LIRMM, France)
• Morteza Behrooz (University of California, Santa Cruz, US)
• Pedro Santos (INESC-ID & Instituto Superior Técnico, University of Lisbon)
• Richard Bartle (University of Essex, UK)
• Seth Cooper (Northeastern University, US)
• Valerio Basile (University of Turin, Italy)
• Fatima Althani (Queen Mary University, UK)

Final CFP and deadline extension: 6th Workshop on Open-Source Arabic Corpora and Processing Tools (Hybrid) with shared tasks @LREC-COLING 2024 in Turin (Italy)
by m.zakiali80＠gmail.com 28 Feb '24

**The 6th Workshop on Open-Source Arabic Corpora and Processing Tools (Hybrid) with shared tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation**

The workshop will be conducted in a *hybrid* format to ensure maximum participation, accommodating attendees both online and in person.

Submission deadline: extended to *March 1*, 2024

*Workshop site*: https://osact-lrec.github.io/

*Shared tasks:*
Task 1: Arabic LLMs Hallucination (contact Hamdy Mubarak), Link: https://sites.google.com/view/arabic-llms-hallucination
Task 2: Dialect to MSA Machine Translation (contact Kareem Darwish), Link: https://codalab.lisn.upsaclay.fr/competitions/17118

*Co-located with LREC-COLING 2024* https://lrec-coling-2024.org/
Turin, Italy, 20-25 May 2024

*Important Dates*
Submission deadline: extended to *March 1*, 2024
Notification of acceptance: March 25, 2024
Camera-ready papers due: March 30, 2024
Workshop date: May 25, 2024

*Workshop Description*
In the computational linguistics (CL), natural language processing (NLP), and information retrieval (IR) communities, Arabic is considered to be relatively resource-poor compared to English. This situation was thought to be the reason for the limited number of studies based on Arabic language resources. However, the past few years have witnessed the emergence of new, considerably large, and freely available Classical Arabic, Modern Standard Arabic (MSA), and dialectal corpora, and to a lesser extent Arabic processing tools.

This workshop follows in the footsteps of previous editions of OSACT to provide a forum for researchers to share and discuss their ongoing work. This workshop is timely given the continued rise in research projects focusing on Arabic language resources. The sixth workshop encourages researchers and practitioners of Arabic language technologies, including CL, NLP, and IR, to share and discuss their latest research efforts, corpora, and tools.
The workshop will also give special attention to Large Language Models (LLMs) and Generative AI, which are currently a prominent research topic. In addition to the general topics of CL, NLP, and IR, the workshop will place special emphasis on two shared tasks: Arabic LLMs Hallucination and Dialect to MSA Machine Translation.

*Submission Topics*

Language Resources:
- Pre-trained Arabic language models and their applications.
- Surveying and evaluating the design of available Arabic corpora and their associated processing tools.
- Making new annotated corpora available for NLP and IR applications such as named entity recognition, machine translation, sentiment analysis, text classification, and language learning.
- Evaluating the use of crowdsourcing platforms for Arabic data annotation.
- Open-source Arabic processing toolkits.

Tools and Technologies:
- Language education, e.g., L1 and L2.
- Language modeling and pre-trained models.
- Tokenization, normalization, word segmentation, morphological analysis, part-of-speech tagging, etc.
- Sentiment analysis, dialect identification, and text classification.
- Dialect translation.
- Fake news detection.
- Web and social media search and analytics.

Issues in the design, construction, and use of Arabic LRs (text, speech, sign, gesture, image, in single or multimodal/multimedia data):
- Guidelines, standards, best practices, and models for LRs interoperability.
- Methodologies and tools for LRs construction and annotation.
- Methodologies and tools for extraction and acquisition of knowledge.
- Ontologies, terminology, and knowledge representation.
- LRs and Semantic Web (including Linked Data, Knowledge Graphs, etc.).

*Submissions*
- Submission Instructions: https://lrec-coling-2024.org/authors-kit/
- Submission Link: https://softconf.com/lrec-coling2024/osact2024/

*Workshop organizers*
- Hend Al-Khalifa (King Saud University, KSA)
- Hamdy Mubarak (Qatar Computing Research Institute, Qatar)
- Kareem Darwish (aiXplain Inc., US)
- Tamer Elsayed (Qatar University, Qatar)
- Mona Ali (Northeastern University, Canada)

First CFP: LoResMT 2024 at ACL 2024
by Atul K. Ojha 27 Feb '24

Apologies for cross-posting.
---------------------------------------------------------------------------
The Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024)
https://www.loresmt.org/
@ ACL 2024 (August 11–16, 2024), Bangkok, Thailand

SUBMISSION
https://openreview.net/group?id=aclweb.org/ACL/2024/Workshop/LoResMT

TIMELINE
Paper submission due: May 17 (Friday), 2024, at 23:59 (Anywhere on Earth)
Notification of acceptance: June 17 (Monday), 2024
Camera-ready papers due: July 1 (Monday), 2024, at 23:59 (Anywhere on Earth)
Workshop date at ACL: August 15, 2024

SCOPE
Building on the success of past low-resource machine translation (MT) workshops at AMTA 2018 (https://amtaweb.org/), MT Summit 2019 (https://www.mtsummit2019.com), AACL-IJCNLP 2020 (http://aacl2020.org/), AMTA 2021, COLING 2022, and EACL 2023, we introduce the Seventh LoResMT Workshop at ACL 2024. The workshop provides a discussion forum for researchers working on MT systems and methods for low-resource and under-represented languages in general. We would like to help review the state of MT for low-resource languages and define the most important directions. We also solicit papers dedicated to supplementary NLP tools that are used in any language, and especially in low-resource languages. Overview papers on these NLP tools are very welcome. It will be beneficial if the evaluations of these tools in research papers include their impact on the quality of MT output.

TOPICS
We are highly interested in (1) original research papers, (2) review/opinion papers, and (3) online systems on the topics below; however, we welcome all novel ideas that cover research on low-resource languages.
- Neural machine translation (NMT) for low-resource languages
- Use of LLMs (large language models) for low-resource MT systems
- COVID-related corpora, their translations, and corresponding NLP/MT systems
- Work that presents online systems for practical use by native speakers
- Word tokenizers/de-tokenizers for specific languages
- Word/morpheme segmenters for specific languages
- Alignment/re-ordering tools for specific language pairs
- Use of morphology analyzers and/or morpheme segmenters in MT
- Multilingual/cross-lingual NLP tools for MT
- Corpora creation and curation technologies for low-resource languages
- Review of available parallel corpora for low-resource languages
- Research and review papers on MT methods for low-resource languages
- MT systems/methods (e.g. rule-based, SMT, NMT) for low-resource languages
- Pivot MT for low-resource languages
- Zero-shot MT for low-resource languages
- Fast building of MT systems for low-resource languages
- Re-usability of existing MT systems for low-resource languages
- Machine translation for language preservation

SUBMISSION INFORMATION
We are soliciting two types of submissions: (1) research, review, and position papers and (2) system demonstration papers. Research, review, and position papers should be at least four (4) and at most eight (8) pages long, plus unlimited pages for references. System demonstration papers are limited to four (4) pages. Submissions should be formatted according to the official ACL 2024 style templates. Accepted papers will be published online in the ACL 2024 proceedings and will be presented at the conference. Submissions must be anonymized and must be made using the provided submission system. Scientific papers that have been or will be submitted to other venues must be declared as such and must be withdrawn from the other venues if accepted and published at LoResMT. The review will be double-blind.
Authors of an accepted paper should present their paper in person at ACL 2024. Papers should be submitted in PDF to the LoResMT OpenReview site. We encourage authors to cite papers written in ANY language that are related to the topics, as long as both the original bibliographic items and their corresponding English translations are provided. Registration is handled by the main conference (https://2024.aclweb.org/).

ORGANIZING COMMITTEE (LISTED ALPHABETICALLY)
Atul Kr. Ojha, University of Galway & Panlingua Language Processing LLP
Chao-Hong Liu, Potamu Research Ltd
Ekaterina Vylomova, University of Melbourne, Australia
Jade Abbott, Retro Rabbit
Jonathan Washington, Swarthmore College
Nathaniel Oco, National University (Philippines)
Tommi A Pirinen, UiT The Arctic University of Norway, Tromsø
Valentin Malykh, Huawei Noah’s Ark lab and Kazan Federal University
Varvara Logacheva, Skolkovo Institute of Science and Technology
Xiaobing Zhao, Minzu University of China

PROGRAM COMMITTEE (LISTED ALPHABETICALLY)
Abigail Walsh, ADAPT Centre, Dublin City University, Ireland
Alberto Poncelas, Rakuten, Singapore
Alina Karakanta, Leiden University
Amirhossein Tebbifakhr, Fondazione Bruno Kessler
Anna Currey, Amazon Web Services
Aswarth Abhilash Dara, Amazon
Arturo Oncevay, University of Edinburgh
Atul Kr. Ojha, DSI, University of Galway & Panlingua Language Processing LLP
Barry Haddow, University of Edinburgh
Bogdan Babych, Heidelberg University
Chao-Hong Liu, Potamu Research Ltd
Constantine Lignos, Brandeis University, USA
Daan van Esch, Google
Diptesh Kanojia, University of Surrey, UK
Duygu Ataman, University of Zurich
Ekaterina Vylomova, University of Melbourne, Australia
Eleni Metheniti, CLLE-CNRS and IRIT-CNRS
Flammie Pirinen, UiT The Arctic University of Norway, Tromsø
Koel Dutta Chowdhury, Saarland University (Germany)
Jade Abbott, Retro Rabbit
Jasper Kyle Catapang, University of the Philippines
Jindřich Libovický, Charles University
John P. McCrae, DSI, University of Galway
Liangyou Li, Noah’s Ark Lab, Huawei Technologies
Majid Latifi, University of York, York, UK
Maria Art Antonette Clariño, University of the Philippines Los Baños
Mathias Müller, University of Zurich
Nathaniel Oco, De La Salle University (Philippines)
Rajdeep Sarkar, Yahoo
Rico Sennrich, University of Zurich
Saliha Muradoglu, The Australian National University
Sangjee Dondrub, Qinghai Normal University
Santanu Pal, WIPRO AI
Sardana Ivanova, University of Helsinki
Shantipriya Parida, Silo AI
Sunit Bhattacharya, Charles University
Surafel Melaku Lakew, Amazon AI
Wen Lai, Center for Information and Language Processing, LMU Munich
Valentin Malykh, Huawei Noah’s Ark lab and Kazan Federal University

CONTACT
Please email loresmt(a)googlegroups.com if you have any questions, comments, or suggestions.
