SIGUL January 2025

sigul@list.elra.info

6 participants
10 discussions

Third and Final CFP: 21st Workshop on Multiword Expressions (MWE 2025) @NAACL2025
by Atul K. Ojha 29 Jan '25

Apologies for cross-postings

********************************************************************************
Final Call for Papers
21st Workshop on Multiword Expressions (MWE 2025)
Organized, sponsored and endorsed by SIGLEX, the Special Interest Group on the Lexicon of the ACL
Full-day workshop collocated with NAACL 2025, Albuquerque, New Mexico, U.S.A., May 3 or 4, 2025
Hybrid (on-site & on-line)
Submission deadline: February 13, 2025
MWE 2025 website: https://multiword.org/mwe2025/
********************************************************************************

Multiword expressions (MWEs), i.e., word combinations that exhibit lexical, syntactic, semantic, pragmatic, and/or statistical idiosyncrasies (Baldwin and Kim, 2010), such as “by and large”, “hot dog”, “make a decision” and “break one's leg”, are still a pain in the neck for Natural Language Processing (NLP). The notion encompasses closely related phenomena: idioms, compounds, light-verb constructions, phrasal verbs, rhetorical figures, collocations, institutionalized phrases, etc. Given their irregular nature, MWEs often pose complex problems in linguistic modeling (e.g. annotation), NLP tasks (e.g. parsing), and end-user applications (e.g. natural language understanding and Machine Translation), hence still representing an open issue for computational linguistics (Constant et al., 2017).

For more than two decades, modelling and processing MWEs for NLP has been the topic of the MWE workshop, organised since 2003 by the MWE section <https://multiword.org/> of ACL-SIGLEX <http://www.siglex.org/> in conjunction with major NLP conferences. Impressive progress has been made in the field, but our understanding of MWEs still requires much research considering their need and usefulness in NLP applications. This is also relevant to domain-specific NLP pipelines that need to tackle terminologies most often realised as MWEs.
Following previous years, for this 21st edition of the workshop, we identified the following topics on which contributions are particularly encouraged: - MWE processing to enhance end-user applications. MWEs gained particular attention in end-user applications, including Machine Translation (MT) (Zaninello and Birch, 2020), simplification (Kochmar et al., 2020), language learning and assessment (Paquot et al., 2020), social media mining (Pelosi et al., 2017), and abusive language detection (Zampieri et al. 2020). We believe that it is crucial to extend and deepen these first attempts to integrate and evaluate MWE technology in these and further end-user applications. - MWE processing and identification in the general language, as well as in specialized languages and domains: Multiword terminology extraction from domain-specific corpora (Lossio-Ventura et al, 2014) is of particular importance to various applications, such as MT (Semmar and Laib, 2017), or for the identification and monitoring of neologisms and technical jargon (Chatzitheodorou and Kappatos, 2021). - MWE processing in low-resource languages: The PARSEME shared tasks (2017 <https://multiword.sourceforge.net/PHITE.php?sitesig=CONF&page=CONF_05_MWE_2…>, 2018 <https://multiword.sourceforge.net/PHITE.php?sitesig=CONF&page=CONF_04_LAW-M…>, 2020 <https://multiword.sourceforge.net/PHITE.php?sitesig=CONF&page=CONF_02_MWE-L…>) among others, have fostered significant progress in MWE identification, providing datasets that include low-resource languages, evaluation measures, and tools that now allow fully integrating MWE identification into end-user applications. There are continuous efforts in this direction (Diaz Hernandez, 2024) and a few of them have also explored methods for the automatic interpretation of MWEs (Bhatia et al., 2018), and their processing in low-resource languages (Eder et al., 2021). 
Resource creation and sharing should be pursued in parallel with the development of multilingual benchmarks for MWE identification (Savary et al., 2023). - MWE identification and interpretation in LLMs: Most current MWE processing is limited to their identification and detection using pre-trained language models, but we still lack an understanding of how MWEs are represented and dealt with therein (Garcia et al., 2021) and of how to better model the compositionality of MWEs from semantics (Phelps et al., 2024). Now that NLP has shifted towards end-to-end neural models like BERT, capable of solving complex tasks with little or no intermediary linguistic symbols, questions arise about the extent to which MWEs should be implicitly or explicitly modelled (Shwartz and Dagan, 2019). - New and enhanced representation of MWEs in language resources and computational models of compositionality as gold standards for formative intrinsic evaluation. Through this workshop, we will bring together researchers in various NLP subfields and encourage them to submit their MWE-related research. We also intend to consolidate the converging results of previous joint workshops LAW-MWE-CxG 2018 <http://multiword.sourceforge.net/lawmwecxg2018/>, MWE-WN 2019 <http://multiword.sourceforge.net/mwewn2019/> and MWE-LEX 2020 <http://multiword.sourceforge.net/mwelex2020/>, the joint MWE-WOAH panel in 2021 <https://multiword.org/mwe2021/#program>, the MWE-SIGUL 2022 joint session <https://multiword.org/mwe2022/>, and MWE-UD 2024 <https://multiword.org/mweud2024/>, extending our scope to MWEs in e-lexicons and WordNets, MWE annotation, as well as grammatical constructions. 
Correspondingly, we call for papers on research related (but not limited) to MWEs and constructions in: - Computationally-applicable theoretical work in psycholinguistics and corpus linguistics; - Annotation (expert, crowdsourcing, automatic) and representation in resources such as corpora, treebanks, e-lexicons, WordNets, constructions (also for low-resource languages); - Processing in syntactic and semantic frameworks (e.g. CCG, CxG, HPSG, LFG, TAG, UD, etc.); - Discovery and identification methods, including for specialized languages and domains such as clinical or biomedical NLP; - Interpretation of MWEs and understanding of text containing them; - Language acquisition, language learning, and non-standard language (e.g. tweets, speech); - Evaluation of annotation and processing techniques; - Retrospective comparative analyses from the PARSEME shared tasks; - Processing for end-user applications (e.g. MT, NLU, summarisation, language learning, etc.); - Implicit and explicit representation in pre-trained language models and end-user applications; - Evaluation and probing of pre-trained language models; - Resources and tools (e.g. lexicons, identifiers) and their integration into end-user applications; - Multiword terminology extraction; - Adaptation and transfer of annotations and related resources to new languages and domains including low-resource ones. Submission formats: The workshop invites two types of submissions: - archival submissions that present substantially original research in both long paper format (8 pages + references) and short paper format (4 pages + references). - non-archival submissions of abstracts describing relevant research presented/published elsewhere which will not be included in the MWE proceedings. Paper submission and templates Papers should be submitted via the workshop's submission page <https://openreview.net/group?id=aclweb.org/NAACL/2025/Workshop/MWE> ( https://openreview.net/group?id=aclweb.org/NAACL/2025/Workshop/MWE). 
Please choose the appropriate submission format (archival/non-archival). Archival papers with existing reviews will also be accepted through the ACL Rolling Review. Submissions must follow the ACL stylesheet <https://github.com/acl-org/acl-style-files>.

Important Dates
Paper Submission Deadline: February 13, 2025
Notification of acceptance: March 8, 2025
Camera-ready papers due: March 17, 2025
Workshop: May 3 or 4, 2025
All deadlines are at 23:59 UTC-12 (Anywhere on Earth).

Organizing Committee
Verginica Barbu Mititelu, Voula Giouli, Grazina Korvel, A. Seza Doğruöz, Alexandre Rademaker, Atul Kr. Ojha, Mathieu Constant

Anti-harassment policy
The workshop follows the ACL anti-harassment policy <https://www.aclweb.org/adminwiki/index.php?title=Anti-Harassment_Policy>.

Contact
For any inquiries regarding the workshop, please send an email to the Organizing Committee at mwe2025workshop(a)gmail.com.

Third and final CFP: LoResMT 2025 at NAACL 2025
by Atul K. Ojha 29 Jan '25

Apologies for cross-posting.

---------------------------------------------------------------------------
*The Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025)*
https://www.loresmt.org/
@ NAACL 2025 (May 3–4, 2025)
Albuquerque, New Mexico, U.S.A.

*SUBMISSION*
https://openreview.net/group?id=aclweb.org/NAACL/2025/Workshop/LoResMT

*TIMELINE*
Paper submission due: February 13, 2025 (Anywhere on Earth)
Pre-reviewed (ARR) submission deadline: February 27, 2025
Notification of acceptance: March 8, 2025
Camera-ready papers due: March 17, 2025 (Anywhere on Earth)
Pre-recorded video due (hard deadline): April 8, 2025
Workshop dates at NAACL 2025: May 3–4, 2025

*SCOPE*
Based on the success of past low-resource machine translation (MT) workshops at AMTA 2018, MT Summit 2019, AACL-IJCNLP 2020, AMTA 2021, COLING 2022, EACL 2023, and ACL 2024, we introduce the LoResMT 2025 workshop at NAACL 2025. The workshop provides a discussion panel for researchers working on MT systems/methods for low-resource and under-represented languages in general. We would like to help review/overview the state of MT for low-resource languages and define the most important directions. We also solicit papers dedicated to supplementary NLP tools that are used in any language and especially in low-resource languages. Overview papers of these NLP tools are very welcome. It will be beneficial if the evaluations of these tools in research papers include their impact on the quality of MT output.

*TOPICS*
We are highly interested in (1) original research papers, (2) review/opinion papers, and (3) online systems on the topics below; however, we welcome all novel ideas that cover research on low-resource languages. 
- Neural machine translation (NMT) for low-resource languages - Use of LLMs (large language models) for low-resource MT systems - COVID-related corpora, their translations and corresponding NLP/MT systems - Work that presents online systems for practical use by native speakers - Word tokenizers/de-tokenizers for specific languages - Word/morpheme segmenters for specific languages - Alignment/Re-ordering tools for specific language pairs - Use of morphology analyzers and/or morpheme segmenters in MT - Multilingual/cross-lingual NLP tools for MT - Corpora creation and curation technologies for low-resource languages - Review of available parallel corpora for low-resource languages - Research and review papers on MT methods for low-resource languages - MT systems/methods (e.g. rule-based, SMT, NMT) for low-resource languages - Pivot MT for low-resource languages - Zero-shot MT for low-resource languages - Fast building of MT systems for low-resource languages - Re-usability of existing MT systems for low-resource languages - Machine translation for language preservation *SUBMISSION INFORMATION* We are soliciting two types of submissions: (1) research, review, and position papers and (2) system demonstration papers. For research, review and position papers, the length of each paper should be at least four (4) and not exceed eight (8) pages, plus unlimited pages for references. For system demonstration papers, the limit is four (4) pages. Submissions should be formatted according to the official ACL style templates (Overleaf). Please refer to the NAACL submission guideline for further information <https://2025.naacl.org/calls/papers/#paper-submission-details>. Accepted papers will be published at ACL Anthology in the NAACL 2025 and will be presented at the conference. Submissions must be anonymized and should be done using the provided submission system. 
Scientific papers that have been or will be submitted to other venues must be declared as such and must be withdrawn from the other venues if accepted and published at LoResMT. The review will be double-blind. Authors of an accepted paper should present their paper in person at NAACL 2025. Papers should be submitted in PDF to the LoResMT Open Review <https://openreview.net/group?id=aclweb.org/NAACL/2025/Workshop/LoResMT>. We would like to encourage authors to cite papers written in ANY language that are related to the topics, as long as both original bibliographic items and their corresponding English translations are provided. Registration is handled by the main conference (https://2025.naacl.org/).

*ORGANIZING COMMITTEE (LISTED ALPHABETICALLY)*
Atul Kr. Ojha, University of Galway
Chao-Hong Liu, Potamu Research Ltd
Ekaterina Vylomova, University of Melbourne, Australia
Jonathan Washington, Swarthmore College
Nathaniel Oco, National University (Philippines)
Flammie Pirinen, UiT The Arctic University of Norway, Tromsø
Xiaobing Zhao, Minzu University of China

*PROGRAM COMMITTEE (LISTED ALPHABETICALLY)*
Abigail Walsh, ADAPT Centre, Dublin City University, Ireland
Alberto Poncelas, Rakuten, Singapore
Ali Hatami, University of Galway
Alina Karakanta, Fondazione Bruno Kessler (FBK), University of Trento
Anna Currey, AWS AI Labs
Aswarth Abhilash Dara, Walmart Global Technology
Atul Kr. Ojha, University of Galway & Panlingua Language Processing LLP
Bogdan Babych, Heidelberg University
Chao-Hong Liu, Potamu Research Ltd
Constantine Lignos, Brandeis University, USA
Daan van Esch, Google
Dana Moukheiber, Massachusetts Institute of Technology
Ekaterina Vylomova, University of Melbourne, Australia
Eleni Metheniti, CLLE-CNRS and IRIT-CNRS
Flammie Pirinen, UiT Norgga árktalaš universitehta
Gaurav Negi, University of Galway
Jinliang Lu, Institute of Automation, Chinese Academy of Sciences
John Philip McCrae, University of Galway
Jonathan Washington, Swarthmore College
Koel Dutta Chowdhury, Saarland University
Majid Latifi, UPC University
Maria Art Antonette Clariño, University of the Philippines Los Baños
Milind Agarwal, George Mason University
Mathias Müller, University of Zurich
Nathaniel Oco, De La Salle University
Pavel Rychlý, Masaryk University and Lexical Computing
Pengwei Li, Meta
Rashid Ahmad, International Institute of Information Technology, Hyderabad
Rico Sennrich, University of Zurich
Santanu Pal, Wipro
Sangjee Dondrub, Qinghai Normal University
Sardana Ivanova, University of Helsinki
Sourabrata Mukherjee, Charles University
Thepchai Supnithi, National Electronics and Computer Technology Center
Timothee Mickus, University of Helsinki
Valentin Malykh, Huawei Noah’s Ark lab and Kazan Federal University
Wen Lai, LMU Munich
Xuebo Liu, Harbin Institute of Technology, Shenzhen
Yalemisew Abgaz, Dublin City University
Yasmin Moslem, Bering Lab
Zhanibek Kozhirbayev, National Laboratory Astana, Nazarbayev University

*CONTACT*
Please email loresmt(a)googlegroups.com if you have any questions/comments/suggestions.

Second CFP: 21st Workshop on Multiword Expressions (MWE 2025) @NAACL2025
by Atul K. Ojha 22 Jan '25

Apologies for cross-postings

********************************************************************************
Second Call for Papers
21st Workshop on Multiword Expressions (MWE 2025)
Organized, sponsored and endorsed by SIGLEX, the Special Interest Group on the Lexicon of the ACL
Full-day workshop collocated with NAACL 2025, Albuquerque, New Mexico, U.S.A., May 3 or 4, 2025
Hybrid (on-site & on-line)
Submission deadline: January 30, 2025
MWE 2025 website: https://multiword.org/mwe2025/
********************************************************************************

Multiword expressions (MWEs), i.e., word combinations that exhibit lexical, syntactic, semantic, pragmatic, and/or statistical idiosyncrasies (Baldwin and Kim, 2010), such as “by and large”, “hot dog”, “make a decision” and “break one's leg”, are still a pain in the neck for Natural Language Processing (NLP). The notion encompasses closely related phenomena: idioms, compounds, light-verb constructions, phrasal verbs, rhetorical figures, collocations, institutionalized phrases, etc. Given their irregular nature, MWEs often pose complex problems in linguistic modeling (e.g. annotation), NLP tasks (e.g. parsing), and end-user applications (e.g. natural language understanding and Machine Translation), hence still representing an open issue for computational linguistics (Constant et al., 2017).

For more than two decades, modelling and processing MWEs for NLP has been the topic of the MWE workshop, organised since 2003 by the MWE section <https://multiword.org/> of ACL-SIGLEX <http://www.siglex.org/> in conjunction with major NLP conferences. Impressive progress has been made in the field, but our understanding of MWEs still requires much research considering their need and usefulness in NLP applications. This is also relevant to domain-specific NLP pipelines that need to tackle terminologies most often realised as MWEs.
Following previous years, for this 21st edition of the workshop, we identified the following topics on which contributions are particularly encouraged: - MWE processing to enhance end-user applications. MWEs gained particular attention in end-user applications, including Machine Translation (MT) (Zaninello and Birch, 2020), simplification (Kochmar et al., 2020), language learning and assessment (Paquot et al., 2020), social media mining (Pelosi et al., 2017), and abusive language detection (Zampieri et al. 2020). We believe that it is crucial to extend and deepen these first attempts to integrate and evaluate MWE technology in these and further end-user applications. - MWE processing and identification in the general language, as well as in specialized languages and domains: Multiword terminology extraction from domain-specific corpora (Lossio-Ventura et al, 2014) is of particular importance to various applications, such as MT (Semmar and Laib, 2017), or for the identification and monitoring of neologisms and technical jargon (Chatzitheodorou and Kappatos, 2021). - MWE processing in low-resource languages: The PARSEME shared tasks (2017 <https://multiword.sourceforge.net/PHITE.php?sitesig=CONF&page=CONF_05_MWE_2…>, 2018 <https://multiword.sourceforge.net/PHITE.php?sitesig=CONF&page=CONF_04_LAW-M…>, 2020 <https://multiword.sourceforge.net/PHITE.php?sitesig=CONF&page=CONF_02_MWE-L…>) among others, have fostered significant progress in MWE identification, providing datasets that include low-resource languages, evaluation measures, and tools that now allow fully integrating MWE identification into end-user applications. There are continuous efforts in this direction (Diaz Hernandez, 2024) and a few of them have also explored methods for the automatic interpretation of MWEs (Bhatia et al., 2018), and their processing in low-resource languages (Eder et al., 2021). 
Resource creation and sharing should be pursued in parallel with the development of multilingual benchmarks for MWE identification (Savary et al., 2023). - MWE identification and interpretation in LLMs: Most current MWE processing is limited to their identification and detection using pre-trained language models, but we still lack an understanding of how MWEs are represented and dealt with therein (Garcia et al., 2021) and of how to better model the compositionality of MWEs from semantics (Phelps et al., 2024). Now that NLP has shifted towards end-to-end neural models like BERT, capable of solving complex tasks with little or no intermediary linguistic symbols, questions arise about the extent to which MWEs should be implicitly or explicitly modelled (Shwartz and Dagan, 2019). - New and enhanced representation of MWEs in language resources and computational models of compositionality as gold standards for formative intrinsic evaluation. Through this workshop, we will bring together researchers in various NLP subfields and encourage them to submit their MWE-related research. We also intend to consolidate the converging results of previous joint workshops LAW-MWE-CxG 2018 <http://multiword.sourceforge.net/lawmwecxg2018/>, MWE-WN 2019 <http://multiword.sourceforge.net/mwewn2019/> and MWE-LEX 2020 <http://multiword.sourceforge.net/mwelex2020/>, the joint MWE-WOAH panel in 2021 <https://multiword.org/mwe2021/#program>, the MWE-SIGUL 2022 joint session <https://multiword.org/mwe2022/>, and MWE-UD 2024 <https://multiword.org/mweud2024/>, extending our scope to MWEs in e-lexicons and WordNets, MWE annotation, as well as grammatical constructions. 
Correspondingly, we call for papers on research related (but not limited) to MWEs and constructions in: - Computationally-applicable theoretical work in psycholinguistics and corpus linguistics; - Annotation (expert, crowdsourcing, automatic) and representation in resources such as corpora, treebanks, e-lexicons, WordNets, constructions (also for low-resource languages); - Processing in syntactic and semantic frameworks (e.g. CCG, CxG, HPSG, LFG, TAG, UD, etc.); - Discovery and identification methods, including for specialized languages and domains such as clinical or biomedical NLP; - Interpretation of MWEs and understanding of text containing them; - Language acquisition, language learning, and non-standard language (e.g. tweets, speech); - Evaluation of annotation and processing techniques; - Retrospective comparative analyses from the PARSEME shared tasks; - Processing for end-user applications (e.g. MT, NLU, summarisation, language learning, etc.); - Implicit and explicit representation in pre-trained language models and end-user applications; - Evaluation and probing of pre-trained language models; - Resources and tools (e.g. lexicons, identifiers) and their integration into end-user applications; - Multiword terminology extraction; - Adaptation and transfer of annotations and related resources to new languages and domains including low-resource ones. Submission formats: The workshop invites two types of submissions: - archival submissions that present substantially original research in both long paper format (8 pages + references) and short paper format (4 pages + references). - non-archival submissions of abstracts describing relevant research presented/published elsewhere which will not be included in the MWE proceedings. Paper submission and templates Papers should be submitted via the workshop's submission page <https://openreview.net/group?id=aclweb.org/NAACL/2025/Workshop/MWE> ( https://openreview.net/group?id=aclweb.org/NAACL/2025/Workshop/MWE). 
Please choose the appropriate submission format (archival/non-archival). Archival papers with existing reviews will also be accepted through the ACL Rolling Review. Submissions must follow the ACL stylesheet <https://github.com/acl-org/acl-style-files>.

Important Dates
Paper Submission Deadline: January 30, 2025
Notification of acceptance: March 1, 2025
Camera-ready papers due: March 10, 2025
Workshop: May 3 or 4, 2025
All deadlines are at 23:59 UTC-12 (Anywhere on Earth).

Organizing Committee
Verginica Barbu Mititelu, Voula Giouli, Grazina Korvel, A. Seza Doğruöz, Alexandre Rademaker, Atul Kr. Ojha, Mathieu Constant

Anti-harassment policy
The workshop follows the ACL anti-harassment policy <https://www.aclweb.org/adminwiki/index.php?title=Anti-Harassment_Policy>.

Contact
For any inquiries regarding the workshop, please send an email to the Organizing Committee at mwe2025workshop(a)gmail.com.

Several PhD positions in computational linguistics and other language subjects, Uppsala University
by Sara Stymne 22 Jan '25

We offer several fully funded four-year PhD positions at the Language Faculty at Uppsala University.

One position is in Computational Linguistics, with a specialization in Nordic Languages. This position requires knowledge of a Scandinavian language and will be carried out as part of the research project "Language change and non-fictional texts – a large-scale investigation of Late Modern Swedish (1800–1950)", led by Sara Stymne and David Håkansson.

One PhD position in computational linguistics with a specialization in Scandinavian languages at the Department of Linguistics and Philology, UFV-PA 2024/4415 <https://uu.varbi.com/en/what:job/jobID:781989/>

Several positions are focused on projects related to linguistic diversity and are open to students in Computational Linguistics, Linguistics, as well as other language subjects.

Five PhD positions on the theme of linguistic diversity at the Department of Linguistics and Philology, UFV-PA 2024/4412 <https://uu.varbi.com/en/what:job/jobID:781937/>

One PhD position on the theme of linguistic diversity within any research environment at the faculty, UFV-PA 2025/18 <https://uu.varbi.com/en/what:job/jobID:785710/>

There are also positions in several other language subjects. https://www.uu.se/en/about-uu/join-us/jobs-and-vacancies/job-details?query=…

Application for all positions closes on March 3.

Best,
Sara

E-mailing Uppsala University means that we will process your personal data. For more information on how this is performed, please read here: http://www.uu.se/en/about-uu/data-protection-policy

Conference LT4All 2025 - Unesco Hq Paris - Feb. 24-26 2025 / on-site participation registration
by Claudia Soria 21 Jan '25

Dear colleagues,

FYI: although on-site attendance is limited to 400 because of the size of the conference room, registration for on-site attendance is still open.

Contact: lt4all2025-contact(a)ml.naist.ac.jp
Further details can be found at https://www.lt4all2025.eu/

LT4All 2.0
Advancing Humanism through Language Technologies
24-26 February 2025, UNESCO Headquarters, Paris, France

❖ Language Technologies (LT), nurtured in research laboratories for half a century, are now spreading widely across numerous applications. However, the situation varies significantly among the more than 7,500 languages spoken worldwide.
❖ The first LT4All (LT4All 1.0) conference in 2019 highlighted the critical role of multilingualism in cutting-edge technology. This spurred significant initiatives by various research institutions and major technological companies toward developing language technologies for a wider range of languages.
❖ Despite significant progress, many communities are still being left behind. The critical issue lies not just in creating language technologies for numerous languages but also in collaborating with communities to develop the solutions they need.

Organized within the framework of the International Decade of Indigenous Languages (IDIL 2022-2032) and to commemorate the Silver Jubilee of International Mother Language Day 2025, the second edition of LT4All (LT4All 2.0) aims to further the agenda of language technologies with a focus on community empowerment. The goal is to harness technology not only to advance itself but also to support and enhance individuals' capabilities.

The conference is organized by the international Language Resources Association (ELRA) and its Special Interest Group on Under-resourced languages (SIGUL), a joint SIG of ELRA and of the International Speech Communication Association (ISCA), in partnership with UNESCO.
Further details can be found at https://www.lt4all2025.eu/
Contact: lt4all2025-contact(a)ml.naist.ac.jp

For the LT4ALL organization committee,
Claudia Soria

CNR, ISTITUTO DI LINGUISTICA COMPUTAZIONALE "ANTONIO ZAMPOLLI"
claudia.soria(a)ilc.cnr.it
Tel. 0503153166
Via Giuseppe Moruzzi, 1, 56124 – Pisa
www.ilc.cnr.it
www.cnr.it

First CFP: International Asian Language Processing (IALP 2025)
by Sarah Flora S. Juan 21 Jan '25

Dear list members,

On behalf of the organizing committee, I would like to invite you and your colleagues to submit papers or poster abstracts to the 29th International Conference on Asian Language Processing (IALP). IALP 2025 will be jointly organized by the Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak (UNIMAS), and the Chinese and Oriental Languages Information Processing Society (COLIPS). The conference will take place from 4–6 August 2025 at the Borneo Cultures Museum, Kuching, Sarawak in Malaysia.

The International Conference on Asian Language Processing (IALP) is the flagship event of COLIPS, uniquely focused on advancing research in Asian language processing. As a recurring conference series, IALP brings together researchers from diverse linguistic disciplines to foster the development of science and technology in all areas of Asian language processing. By providing a collaborative platform, the conference facilitates knowledge exchange and the exploration of the latest innovations in the field. All accepted papers will be submitted for potential inclusion in IEEE proceedings, subject to fulfilling IEEE’s quality standards.

We welcome research papers on techniques, methodologies, and approaches including, but not limited to, the following:

A. Speech:
• Spoken language processing
• Spoken language understanding
• Spoken language generation
• Spoken language translation
• Speech recognition and synthesis
• Rich transcription and spoken information retrieval
• Multimodal representations and processing
• Speaker diarization and speech enhancement
• Speaker recognition and anti-spoofing
• Trustworthy speech technology

B. Natural Language Processing (NLP):
• Dialogue and interactive systems
• Evaluation methods and user studies
• Information extraction, retrieval, and text mining
• Interpretability and analysis of models for NLP
• Language modeling and statistical methods for NLP
• Machine learning for Natural Language Processing
• Machine translation and multilingual processing
• NLP in vertical domains, such as biomedical, chemical, and legal text
• NLP on noisy unstructured text, such as email, blogs, and SMS
• Natural language applications, tools, and resources
• Question answering
• Sentiment analysis, stylistic analysis, and argument mining
• Tagging, chunking, and parsing
• Text entailment, paraphrasing, generation
• Large language models
• Text and speech resource development

C. Linguistics:
• Asian language input, output, coding, etc.
• Computational linguistics and mathematical linguistics
• Discourse and pragmatics
• Language learning, teaching, and computer-aided language learning
• Lexical semantics, sentence-level semantics, and textual inference
• Linguistic theories, cognitive modeling, and psycholinguistics
• Phonology, morphology, and word segmentation
• Special hardware and software for Asian language computing

To submit your paper, please use the following link: https://cmt3.research.microsoft.com/User/Login?ReturnUrl=%2FIALP2025. You can also visit the conference website at www.ialp2025.org for detailed submission instructions. The deadline for full paper and poster abstract submissions is *31 March 2025*.

We are excited to receive your submissions and welcome you to IALP 2025. Should you have any questions or concerns, please do not hesitate to contact us at ialp(a)unimas.my.

Thank you, and we hope to see you at the Borneo Cultures Museum!
*Sarah Samson Juan* Senior Lecturer/Deputy Dean of Industry and Community Engagement Faculty of Computer Science and Information Technology Universiti Malaysia Sarawak Kota Samarahan 94300 Sarawak, MALAYSIA Email: sjsflora(a)unimas.my / sarah.f.juan(a)gmail.com Website: https://www.fcsit.unimas.my/ Expert: https://expert.unimas.my/profile/1319

2nd Reminder: Call for participation in Web Survey on Data Bottlenecks in Supervised NLP
by Romberg, Julia 13 Jan '25

++ 2nd reminder to participate in our web survey on data annotation bottlenecks and active learning; apologies for cross-posting ++ Dear list members, We invite you to participate in our web survey exploring how recent advancements in NLP, such as LLMs, have changed the need for labeled data in Supervised Machine Learning. Survey details: * Topic: Web survey on Data Annotation and Active Learning * Target group: Researchers and practitioners alike in the fields of NLP, Supervised Machine Learning, and Active Learning in particular (knowledge of Active Learning is not required) * Duration: 5-15 minutes * Deadline for participation: January 26, 2025 * Survey link: https://bildungsportal.sachsen.de/umfragen/limesurvey/index.php/538271 Why should I invest my time in this survey? * Make an impact: Participate in a community effort and help to gain a better understanding of the current state of, and open issues in, methods used to overcome a lack of labeled data. * Gain insights: Receive a report with key findings to incorporate these insights into research and development of new methods and technologies. Thank you for considering participating in our survey! If you have any questions or require additional information, please don't hesitate to contact us directly at activelearningsurvey2024(a)gmail.com. If you know colleagues or peers who might be interested, we'd be grateful if you could forward this survey to them as well. Best regards, Julia Romberg (GESIS - Leibniz Institute for the Social Sciences, Germany) Christopher Schröder (Institut für Angewandte Informatik e. V., Germany) Julius Gonsior (TUD Dresden University of Technology) ------------------------------------------------------------------------ Leibniz Institute for the Social Sciences Julia Romberg Computational Social Science, Team Data Science Methods +49(221)47694-742

Call for Participation: The First Workshop on Language Models for Low-Resource Languages (LoResLM 2025@COLING)
by Ranasinghe, Tharindu 13 Jan '25

Neural language models have revolutionised natural language processing (NLP) and have provided state-of-the-art results for many tasks. However, their effectiveness is largely dependent on the pre-training resources. Therefore, language models (LMs) often struggle with low-resource languages in both training and evaluation. Recently, there has been a growing trend in developing and adopting LMs for low-resource languages. LoResLM aims to provide a forum for researchers to share and discuss their ongoing work on LMs for low-resource languages. LoResLM 2025 will be a physical workshop co-located with COLING 2025, Abu Dhabi on 20th January 2025. We are pleased to share the programme of LoResLM 2025 with you. Please visit https://loreslm.github.io/program for the full programme. To register for the workshop, please visit https://coling2025.org/registration/ We are looking forward to welcoming you at LoResLM 2025 in Abu Dhabi. The workshop is supported in part by CLARIN-UK, funded by the Arts and Humanities Research Council as part of the Infrastructure for Digital Arts and Humanities programme. >> Keynote Speaker Jose Camacho-Collados, Cardiff University. 
Title - "Multilinguality and Cultural Awareness in Language Models" >> Organising Committee Hansi Hettiarachchi, Lancaster University, UK Tharindu Ranasinghe, Lancaster University, UK Paul Rayson, Lancaster University, UK Ruslan Mitkov, Lancaster University, UK Mohamed Gaber, Birmingham City University, UK Damith Premasiri, Lancaster University, UK Fiona Anting Tan, National University of Singapore, Singapore Lasitha Uyangodage, University of Münster, Germany >> Programme Committee Gábor Bella - IMT Atlantique, France Samuel Cahyawijaya - The Hong Kong University of Science and Technology, Hong Kong Burcu Can - University of Stirling, UK Çağrı Çöltekin - University of Tübingen, Germany Raj Dabre - National Institute of Information and Communications Technology, Japan Vera Danilova - Uppsala University, Sweden Debashish Das - Birmingham City University, UK Ona de Gibert - University of Helsinki, Finland Alphaeus Dmonte - George Mason University, USA Bonaventure F. P. Dossou - McGill University, Canada Daan van Esch - Google Ignatius Ezeani - Lancaster University, UK Anna Furtado - University of Galway, Ireland Amal Htait - Aston University, UK Ali Hürriyetoğlu - Wageningen University & Research, Netherlands Danka Jokic - University of Belgrade, Serbia Diptesh Kanojia - University of Surrey, UK Daisy Lal - Lancaster University, UK Colin Leong - University of Dayton, USA Veronika Lipp - Hungarian Research Centre for Linguistics, Hungary Muhidin Mohamed - Aston University, UK Farhad Nooralahzadeh - University of Zurich, Switzerland Rrubaa Panchendrarajan - Queen Mary University of London, UK Nadeesha Pathirana - Aston University, UK Alistair Plum - University of Luxembourg, Luxembourg Nishat Raihan - George Mason University, USA Omid Rohanian - University of Oxford, UK Sandaru Seneviratne - Australian National University, Australia Ravi Shekhar - University of Essex, UK Archchana Sindhujan - University of Surrey, UK Claytone Sikasote - University of Cape Town, South 
Africa Marjana Prifti Skenduli - University of New York Tirana, Albania Uthayasanker Thayasivam - University of Moratuwa, Sri Lanka Taro Watanabe - Nara Institute of Science and Technology, Japan Edlira Vakaj - Birmingham City University, UK John Vidler - Lancaster University, UK Phil Weber - Aston University, UK Bryan Wilie - Hong Kong University of Science & Technology, Hong Kong Artūrs Znotiņš - University of Latvia, Latvia URL - https://loreslm.github.io/ Twitter - https://x.com/LoResLM2025 LinkedIn - https://www.linkedin.com/company/loreslm/

Second Call for Participation- IWSLT 2025
by Atul K. Ojha 13 Jan '25

Apologies for cross-posting. ---------------------------------------- *The International Conference on Spoken Language Translation* *ACL – 22nd IWSLT 2025 – Second Call for Participation* *31 July-1 August 2025 - Vienna, Austria* http://iwslt.org The International Conference on Spoken Language Translation (IWSLT) <https://iwslt.org/> is the premier annual conference for all aspects of Spoken Language Translation. Every year, the conference organises and sponsors open evaluation campaigns around key challenges in simultaneous and consecutive translation, under real-time/low latency or offline conditions and under low-resource or multilingual constraints. System descriptions and results from participants’ systems and scientific papers related to key algorithmic advances and best practices are presented. IWSLT is the venue of SIGSLT <https://iwslt.org/sigslt/>, the Special Interest Group on Spoken Language Translation of ACL <https://www.aclweb.org/portal/>, ISCA <https://www.isca-speech.org/> and ELRA <https://www.elra.info/>. With a track record of 21 years, IWSLT benchmarks and proceedings serve as a reference for all researchers and practitioners working on speech translation and related fields. The 22nd edition of IWSLT will be run as a hybrid ELRA <https://www.elra.info/>/ACL <https://www.aclweb.org/portal/> event, co-located with ACL 2025 <https://2025.aclweb.org/> from 31 July to 1 August 2025.
*Important Dates* *January 1, 2025*: Release of shared task training and dev data *March 15, 2025*: Scientific paper submission deadline *April 1-15, 2025*: Evaluation period *April 21, 2025*: System description paper submission deadline *May 15, 2025*: Notification of acceptance *June 1, 2025*: Camera-ready deadline (all papers) *July 31-August 1, 2025*: IWSLT conference *Evaluation* IWSLT 2025 features shared tasks <https://iwslt.org/2025/#shared-tasks> that address the following focus areas: - High-resource ST: Offline track, Simultaneous track, Subtitling track - Low-resource ST: Low-resource and Indic (multilingual) tracks - Instruction-following Speech Processing track: Technical domain ST, ASR, Summarization, and QA Training and development data for each shared task will be prepared and released by the respective organisers (for further information on this initiative, please refer to the IWSLT website <https://iwslt.org/2025/>). Participants will receive instructions about how to submit their runs. In addition, participants have the opportunity to present their work through a system paper that will be published in the ACL Proceedings. *Conference* IWSLT also invites submissions of scientific papers to be published in the ACL Proceedings and presented in either oral or poster format. The conference selects high-quality, original contributions on theoretical and practical issues of spoken language translation research, technologies and applications. Submissions will be accepted directly through the IWSLT submission site (to be announced on the website <https://iwslt.org/2025/>). We will also accept commitments of submissions with reviews from the ACL Rolling Review. Additionally, to foster cross-pollination of ideas, the conference also invites the presentation of papers on speech translation recently published elsewhere.
Please note that this is for non-archival presentation of papers relevant to speech translation already published in other venues (e.g., Findings for the *ACL, speech, NLP or MT conferences). Submissions for this category will be accepted through a dedicated form (to be announced on the website <https://iwslt.org/2025/>). Papers will be checked for relevance to IWSLT, and assigned either oral or poster presentation slots if selected. *Contact* Please email iwslt-evaluation-campaign(a)googlegroups.com if you have any questions related to the shared tasks. Thanks, Marine, Marcello, Alex, Jan, Sebastian, Elizabeth, Atul (IWSLT organisers)

First CFP: LoResMT 2025 at NAACL 2025
by Atul K. Ojha 13 Jan '25

Apologies for cross-posting. --------------------------------------------------------------------------- *The Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025)* *https://www.loresmt.org/* *@ NAACL 2025 (May 3–4, 2025)* *Albuquerque, New Mexico, U.S.A.* *SUBMISSION* *https://openreview.net/group?id=aclweb.org/NAACL/2025/Workshop/LoResMT* *TIMELINE* *Paper submission due:* January 30, 2025 (Anywhere on Earth) *Pre-reviewed (ARR) submission deadline:* February 20, 2025 *Notification of acceptance:* March 1, 2025 *Camera-ready papers due:* March 10, 2025 (Anywhere on Earth) *Pre-recorded video due (hard deadline):* April 8, 2025 *Workshop dates at NAACL 2025:* May 3–4, 2025 *SCOPE* Building on the success of past low-resource machine translation (MT) workshops at AMTA 2018, MT Summit 2019, AACL-IJCNLP 2020, AMTA 2021, COLING 2022, EACL 2023, and ACL 2024, we introduce the LoResMT 2025 workshop at NAACL 2025. The workshop provides a discussion panel for researchers working on MT systems/methods for low-resource and under-represented languages in general. We would like to help review the state of MT for low-resource languages and define the most important directions. We also solicit papers dedicated to supplementary NLP tools that are used in any language and especially in low-resource languages. Overview papers of these NLP tools are very welcome. It will be beneficial if the evaluations of these tools in research papers include their impact on the quality of MT output. *TOPICS* We are highly interested in (1) original research papers, (2) review/opinion papers, and (3) online systems on the topics below; however, we welcome all novel ideas that cover research on low-resource languages.
- Neural machine translation (NMT) for low-resource languages - Use of LLMs (large language models) for low-resource MT systems - COVID-related corpora, their translations and corresponding NLP/MT systems - Work that presents online systems for practical use by native speakers - Word tokenizers/de-tokenizers for specific languages - Word/morpheme segmenters for specific languages - Alignment/Re-ordering tools for specific language pairs - Use of morphology analyzers and/or morpheme segmenters in MT - Multilingual/cross-lingual NLP tools for MT - Corpora creation and curation technologies for low-resource languages - Review of available parallel corpora for low-resource languages - Research and review papers on MT methods for low-resource languages - MT systems/methods (e.g. rule-based, SMT, NMT) for low-resource languages - Pivot MT for low-resource languages - Zero-shot MT for low-resource languages - Fast building of MT systems for low-resource languages - Re-usability of existing MT systems for low-resource languages - Machine translation for language preservation *SUBMISSION INFORMATION* We are soliciting two types of submissions: (1) research, review, and position papers and (2) system demonstration papers. Research, review, and position papers should be at least four (4) and at most eight (8) pages long, plus unlimited pages for references. For system demonstration papers, the limit is four (4) pages. Submissions should be formatted according to the official ACL style templates (Overleaf). Please refer to the NAACL submission guidelines for further information <https://2025.naacl.org/calls/papers/#paper-submission-details>. Accepted papers will be published in the ACL Anthology as part of NAACL 2025 and will be presented at the conference. Submissions must be anonymized and should be made using the provided submission system.
Scientific papers that have been or will be submitted to other venues must be declared as such and must be withdrawn from the other venues if accepted and published at LoResMT. The review will be double-blind. Authors of an accepted paper should present their paper in person at NAACL 2025. Papers should be submitted in PDF to the LoResMT Open Review <https://openreview.net/group?id=aclweb.org/NAACL/2025/Workshop/LoResMT>. We would like to encourage authors to cite papers written in ANY language that are related to the topics, as long as both original bibliographic items and their corresponding English translations are provided. Registration is handled by the main conference (https://2025.naacl.org/). *ORGANIZING COMMITTEE (LISTED ALPHABETICALLY)* Atul Kr. Ojha, University of Galway Chao-Hong Liu, Potamu Research Ltd Ekaterina Vylomova, University of Melbourne, Australia Jade Abbott, Retro Rabbit Jonathan Washington, Swarthmore College Nathaniel Oco, National University (Philippines) Tommi A Pirinen, UiT The Arctic University of Norway, Tromsø Valentin Malykh, Huawei Noah’s Ark lab and Kazan Federal University Varvara Logacheva, Skolkovo Institute of Science and Technology Xiaobing Zhao, Minzu University of China *PROGRAM COMMITTEE (LISTED ALPHABETICALLY)* Abigail Walsh, ADAPT Centre, Dublin City University, Ireland Alberto Poncelas, Rakuten, Singapore Ali Hatami, University of Galway Alina Karakanta, Fondazione Bruno Kessler (FBK), University of Trento Anna Currey, AWS AI Labs Aswarth Abhilash Dara, Walmart Global Technology Atul Kr. 
Ojha, University of Galway & Panlingua Language Processing LLP Bogdan Babych, Heidelberg University Chao-hong Liu, Potamu Research Ltd Constantine Lignos, Brandeis University, USA Daan van Esch, Google Dana Moukheiber, Massachusetts Institute of Technology Ekaterina Vylomova, University of Melbourne, Australia Eleni Metheniti, CLLE-CNRS and IRIT-CNRS Flammie Pirinen, UiT Norgga árktalaš universitehta Gaurav Negi, University of Galway Jinliang Lu, Institute of Automation, Chinese Academy of Sciences John Philip McCrae, University of Galway Jonathan Washington, Swarthmore College Koel Dutta Chowdhury, Saarland University Majid Latifi, UPC University Maria Art Antonette Clariño, University of the Philippines Los Baños Milind Agarwal, George Mason University Mathias Müller, University of Zurich Nathaniel Oco, De La Salle University Pavel Rychlý, Masaryk University and Lexical Computing Pengwei Li, Meta Rashid Ahmad, International Institute of Information Technology, Hyderabad Rico Sennrich, University of Zurich Santanu Pal, Wipro Sangjee Dondrub, Qinghai Normal University Sardana Ivanova, University of Helsinki Sourabrata Mukherjee, Charles University Thepchai Supnithi, National Electronics and Computer Technology Center Timothee Mickus, University of Helsinki Valentin Malykh, Huawei Noah’s Ark lab and Kazan Federal University Wen Lai, LMU Munich Xuebo Liu, Harbin Institute of Technology, Shenzhen Yalemisew Abgaz, Dublin City University Yasmin Moslem, Bering Lab Zhanibek Kozhirbayev, National Laboratory Astana, Nazarbayev University *CONTACT* Please email loresmt(a)googlegroups.com if you have any questions/comments/suggestions.
