SIGUL December 2024

sigul@list.elra.info

6 participants
11 discussions

First CFP: LoResMT 2025 at NAACL 2025
by Atul K. Ojha 13 Jan '25

Apologies for cross-posting.

---------------------------------------------------------------------------

*The Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025)*
*https://www.loresmt.org/*
*@ NAACL 2025 (May 3–4, 2025)*
*Albuquerque, New Mexico, U.S.A.*

*SUBMISSION*
https://openreview.net/group?id=aclweb.org/NAACL/2025/Workshop/LoResMT

*TIMELINE*
*Paper submission due:* January 30, 2025 (Anywhere on Earth)
*Pre-reviewed (ARR) submission deadline:* February 20, 2025
*Notification of acceptance:* March 1, 2025
*Camera-ready papers due:* March 10, 2025 (Anywhere on Earth)
*Pre-recorded video due (hard deadline):* April 8, 2025
*Workshop dates at NAACL 2025:* May 3–4, 2025

*SCOPE*
Based on the success of past low-resource machine translation (MT) workshops at AMTA 2018, MT Summit 2019, AACL-IJCNLP 2020, AMTA 2021, COLING 2022, EACL 2023, and ACL 2024, we introduce the LoResMT 2025 workshop at NAACL 2025. The workshop provides a discussion panel for researchers working on MT systems/methods for low-resource and under-represented languages in general. We would like to help review the state of MT for low-resource languages and define the most important directions. We also solicit papers dedicated to supplementary NLP tools that are used in any language and especially in low-resource languages. Overview papers of these NLP tools are very welcome. It will be beneficial if the evaluations of these tools in research papers include their impact on the quality of MT output.

*TOPICS*
We are highly interested in (1) original research papers, (2) review/opinion papers, and (3) online systems on the topics below; however, we welcome all novel ideas that cover research on low-resource languages.
- Neural machine translation (NMT) for low-resource languages
- Use of LLMs (large language models) for low-resource MT systems
- COVID-related corpora, their translations and corresponding NLP/MT systems
- Work that presents online systems for practical use by native speakers
- Word tokenizers/de-tokenizers for specific languages
- Word/morpheme segmenters for specific languages
- Alignment/Re-ordering tools for specific language pairs
- Use of morphology analyzers and/or morpheme segmenters in MT
- Multilingual/cross-lingual NLP tools for MT
- Corpora creation and curation technologies for low-resource languages
- Review of available parallel corpora for low-resource languages
- Research and review papers on MT methods for low-resource languages
- MT systems/methods (e.g. rule-based, SMT, NMT) for low-resource languages
- Pivot MT for low-resource languages
- Zero-shot MT for low-resource languages
- Fast building of MT systems for low-resource languages
- Re-usability of existing MT systems for low-resource languages
- Machine translation for language preservation

*SUBMISSION INFORMATION*
We are soliciting two types of submissions: (1) research, review, and position papers and (2) system demonstration papers. For research, review, and position papers, the length of each paper should be at least four (4) and not exceed eight (8) pages, plus unlimited pages for references. For system demonstration papers, the limit is four (4) pages. Submissions should be formatted according to the official ACL style templates (Overleaf). Please refer to the NAACL submission guidelines for further information <https://2025.naacl.org/calls/papers/#paper-submission-details>. Accepted papers will be published in the ACL Anthology as part of NAACL 2025 and will be presented at the conference. Submissions must be anonymized and must be made through the provided submission system.
Scientific papers that have been or will be submitted to other venues must be declared as such and must be withdrawn from the other venues if accepted and published at LoResMT. The review will be double-blind. Authors of an accepted paper should present their paper in person at NAACL 2025. Papers should be submitted in PDF to the LoResMT OpenReview page <https://openreview.net/group?id=aclweb.org/NAACL/2025/Workshop/LoResMT>. We would like to encourage authors to cite papers written in ANY language that are related to the topics, as long as both original bibliographic items and their corresponding English translations are provided. Registration is handled by the main conference (https://2025.naacl.org/).

*ORGANIZING COMMITTEE (LISTED ALPHABETICALLY)*
Atul Kr. Ojha, University of Galway
Chao-Hong Liu, Potamu Research Ltd
Ekaterina Vylomova, University of Melbourne, Australia
Jade Abbott, Retro Rabbit
Jonathan Washington, Swarthmore College
Nathaniel Oco, National University (Philippines)
Tommi A Pirinen, UiT The Arctic University of Norway, Tromsø
Valentin Malykh, Huawei Noah’s Ark lab and Kazan Federal University
Varvara Logacheva, Skolkovo Institute of Science and Technology
Xiaobing Zhao, Minzu University of China

*PROGRAM COMMITTEE (LISTED ALPHABETICALLY)*
Abigail Walsh, ADAPT Centre, Dublin City University, Ireland
Alberto Poncelas, Rakuten, Singapore
Ali Hatami, University of Galway
Alina Karakanta, Fondazione Bruno Kessler (FBK), University of Trento
Anna Currey, AWS AI Labs
Aswarth Abhilash Dara, Walmart Global Technology
Atul Kr. Ojha, University of Galway & Panlingua Language Processing LLP
Bogdan Babych, Heidelberg University
Chao-Hong Liu, Potamu Research Ltd
Constantine Lignos, Brandeis University, USA
Daan van Esch, Google
Dana Moukheiber, Massachusetts Institute of Technology
Ekaterina Vylomova, University of Melbourne, Australia
Eleni Metheniti, CLLE-CNRS and IRIT-CNRS
Flammie Pirinen, UiT Norgga árktalaš universitehta
Gaurav Negi, University of Galway
Jinliang Lu, Institute of Automation, Chinese Academy of Sciences
John Philip McCrae, University of Galway
Jonathan Washington, Swarthmore College
Koel Dutta Chowdhury, Saarland University
Majid Latifi, UPC University
Maria Art Antonette Clariño, University of the Philippines Los Baños
Milind Agarwal, George Mason University
Mathias Müller, University of Zurich
Nathaniel Oco, De La Salle University
Pavel Rychlý, Masaryk University and Lexical Computing
Pengwei Li, Meta
Rashid Ahmad, International Institute of Information Technology, Hyderabad
Rico Sennrich, University of Zurich
Santanu Pal, Wipro
Sangjee Dondrub, Qinghai Normal University
Sardana Ivanova, University of Helsinki
Sourabrata Mukherjee, Charles University
Thepchai Supnithi, National Electronics and Computer Technology Center
Timothee Mickus, University of Helsinki
Valentin Malykh, Huawei Noah’s Ark lab and Kazan Federal University
Wen Lai, LMU Munich
Xuebo Liu, Harbin Institute of Technology, Shenzhen
Yalemisew Abgaz, Dublin City University
Yasmin Moslem, Bering Lab
Zhanibek Kozhirbayev, National Laboratory Astana, Nazarbayev University

*CONTACT*
Please email loresmt(a)googlegroups.com if you have any questions/comments/suggestions.

Reminder: Call for participation in Web Survey on Data Bottlenecks in Supervised NLP
by Romberg, Julia 30 Dec '24

++ 1st reminder to participate in our web survey on data annotation bottlenecks and active learning; apologies for cross-posting ++

Dear list members,

We invite you to participate in our web survey exploring how recent advancements in NLP, such as LLMs, have changed the need for labeled data in Supervised Machine Learning.

Survey details:
* Topic: Web survey on Data Annotation and Active Learning
* Target group: Researchers and practitioners alike in the fields of NLP, Supervised Machine Learning, and Active Learning in particular (knowledge of Active Learning is not required)
* Duration: 5-15 minutes
* Deadline for participation: January 12, 2025
* Survey link: https://bildungsportal.sachsen.de/umfragen/limesurvey/index.php/538271

Why should I invest my time in this survey?
* Make an impact: Participate in a community effort and help to gain a better understanding of the current state and open issues of methods that are used to overcome a lack of labeled data.
* Gain insights: Receive a report with key findings to incorporate these insights into research and development of new methods and technologies.

Thank you for considering participating in our survey! If you have any questions or require additional information, please don't hesitate to contact us directly at activelearningsurvey2024(a)gmail.com. If you know colleagues or peers who might be interested, we'd be grateful if you could forward this survey to them as well.

Best regards,
Julia Romberg (GESIS - Leibniz Institute for the Social Sciences, Germany)
Christopher Schröder (Institut für Angewandte Informatik e. V., Germany)
Julius Gonsior (TUD Dresden University of Technology)

------------------------------------------------------------------------
Leibniz Institute for the Social Sciences
Julia Romberg
Computational Social Science, Team Data Science Methods
+49(221)47694-742

Call for Papers : Recent Advances in Natural Language Processing (RANLP) 2025
by Ranasinghe, Tharindu 28 Dec '24

RECENT ADVANCES IN NATURAL LANGUAGE PROCESSING
Varna, Bulgaria
http://ranlp.org/ranlp2025/

Summer School on Deep Learning and LLMs for NLP: 3-5 September 2025 (Wednesday-Friday)
Tutorials: 6-7 September 2025 (Saturday-Sunday)
Main Conference: 8-10 September 2025 (Monday-Wednesday)
Workshops and shared tasks: 11-13 September 2025 (Thursday-Saturday)

The biennial RANLP (Recent Advances in Natural Language Processing) conference is one of the most competitive and influential NLP conferences. The event grew out of the International Summer Schools "Contemporary topics in Computational Linguistics", which were organised for many years as international training events.

Previous RANLP conferences (1995, 1997, 2001, 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, 2021 and 2023) featured keynote talks by leading experts in NLP as well as presentations/papers of high quality, rigorously reviewed by Programme Committee experts. Since 2009, the papers accepted at RANLP and the associated workshops have been included in the ACL Anthology. The RANLP proceedings are indexed by SCOPUS and DBLP. The proceedings have their own Scopus SJR, which was 0.299 in 2023.

The conference will be preceded by a Summer School on Deep Learning and Large Language Models (LLMs) for Natural Language Processing (NLP) as well as tutorials on current topics of particular interest and cutting-edge technologies. RANLP-2025 will be followed by specialised workshops as well as shared tasks covering timely NLP topics. A Student Research Workshop will be held in parallel with the main conference. The Student Research Workshops (now in their 9th edition) have become active discussion fora for young researchers.

TOPICS
We invite papers reporting recent advances in all aspects of Natural Language Processing and particularly encourage submissions related to (and the employment of) the latest NLP methods including Large Language Models/Generative AI.
Contributions from a broad range of areas will be welcome including, but not limited to, the following topics: phonetics, phonology, morphology; syntax, semantics, discourse, pragmatics, dialogue, lexicon; complexity; mathematical, statistical, machine learning and deep learning models; language resources and corpora; crowdsourcing for creation of linguistic resources; electronic dictionaries, terminologies and ontologies; sublanguages and controlled languages; linked data; POS tagging; parsing; semantic role labelling; word-sense disambiguation; multiword expressions and computational phraseology; textual entailment; anaphora resolution; temporal processing; language generation; speech recognition; text-to-speech synthesis; multilingual NLP; machine translation, translation memory systems and computer-aided translation tools; text simplification and readability estimation; knowledge acquisition; information retrieval; text categorisation; information extraction; text summarisation; terminology extraction; question answering; opinion mining and sentiment analysis; fact checking and fake news; stance recognition; hate speech and aggression detection; author profiling; dialogue systems; chatbots and conversational agents; irony and sarcasm detection; negation and speculation detection; computer-aided language learning; multimodal systems; language and vision; NLP for biomedical texts; NLP for educational applications; NLP for healthcare; NLP for financial purposes; NLP for legal texts; NLP for the Semantic Web; theoretical and application-orientated papers related to NLP.

CHAIR OF THE PROGRAMME COMMITTEE
Ruslan Mitkov (University of Lancaster)

CHAIR OF THE ORGANISING COMMITTEE
Galia Angelova (Bulgarian Academy of Sciences)

The Programme Committee (PC) members are distinguished NLP experts from all over the world. The list of PC members will be announced on the conference website in due time.
Keynote speakers, tutorial presenters, and summer school lecturers and tutors will be announced in the upcoming calls for papers.

WORKSHOPS and SHARED TASKS
The RANLP 2025 workshops and shared tasks will be held on 11-13 September 2025. The calls for workshop and shared task proposals have already been published.

SUBMISSION OF PAPERS, POSTERS, DEMOS
Submissions will be managed through the conference management software START. For further instructions, please follow the submission information on the conference website at https://ranlp.org/ranlp2025/.

The reviewing process will be anonymous. Double submission is acceptable, but authors will be asked to declare it at the time of submission. Submissions will be reviewed by at least three members of the Programme Committee. Authors of accepted papers will receive guidelines on how to produce camera-ready versions of their papers for inclusion in the proceedings. All RANLP papers are assigned DOI numbers. The full conference proceedings will be uploaded to the ACL Anthology.

RANLP-2025 aims to provide early notification of acceptance to authors and presenters who need a visa to enter Bulgaria. We invite early submissions of authors’ names and paper abstracts, in order to plan quick reviewing. Access to the conference management software will be available from 1 April 2025.
IMPORTANT DATES
Call for Shared Tasks proposals: September 2024
Shared Tasks selection notification: 4 November 2024
Shared Tasks sample data and task website ready: 15 November 2024
Shared Tasks training data ready: 15 December 2024
Call for workshop proposals: 24 December 2024
Deadline for submission of workshop proposals: 15 March 2025
Workshop selection: 22 March 2025
Conference abstracts submission: April 2025
Conference papers submission: early/mid May 2025 (please check exact dates on RANLP 2025 website)
Conference papers acceptance notification: 28 June 2025
Camera-ready versions of the conference papers: 31 July 2025
Workshop paper submission deadline (suggested): 30 June 2025
Workshop paper acceptance notification (suggested): 28 July 2025
Workshop paper camera-ready versions (suggested): 20 August 2025
Workshop camera-ready proceedings ready (suggested): 31 August 2025
RANLP Summer School on Deep Learning in NLP: 3-5 September 2025
RANLP tutorials: 6-7 September 2025 (Saturday-Sunday)
RANLP conference: 8-10 September 2025 (Monday-Wednesday)
RANLP workshops and Shared Tasks presentations: 11-13 September 2025 (Thursday-Saturday)

VENUE
RANLP 2025 will be held at the conference facilities of Hotel “Cherno More” (http://www.chernomorebg.com) in Varna, the largest city on the Bulgarian Black Sea Coast. The event venue is centrally located at the entrance of the Sea Garden and offers excellent conference facilities. The city is a major tourist destination with flights to/from the Varna International Airport. It is also known for its Archaeological Museum, which features the oldest gold treasure in the world (https://en.wikipedia.org/wiki/Varna_Necropolis). The conference organisers plan to organise an excursion to Provadia, the oldest salt-production and urban centre in Europe (5600 - 4350 BC, https://provadia-solnitsata.com/en/), which is located 50 km from Varna.
THE TEAM BEHIND RANLP-25
Galia Angelova, Bulgarian Academy of Sciences, Bulgaria (Chair Organising Committee)
Ruslan Mitkov, University of Lancaster, UK (Chair Programme Committee)
Nikolai Nikolov, Bulgarian Association for Computational Linguistics, Bulgaria
Tharindu Ranasinghe, Lancaster University, UK (Workshops Chair and Shared tasks Co-Chair)
Saad Ezzini, Lancaster University, UK (Sponsorship Chair and Shared tasks Co-Chair)
Maria Kunilovskaya, Saarland University, Germany (Publication Chair)
Preslav Nakov, MBZUAI, Abu Dhabi, UAE
Ivelina Nikolova, Bulgarian Academy of Sciences, Bulgaria
Kiril Simov, Bulgarian Academy of Sciences, Bulgaria (Workshops Co-Chair)
Petya Osenova, Bulgarian Academy of Sciences, Bulgaria (Workshops Co-Chair)

Call for Workshop Proposals: RANLP-2025: 15th Conference on Recent Advances in Natural Language Processing
by Ranasinghe, Tharindu 28 Dec '24

Call for Workshop Proposals
================================================
RANLP-2025: 15th Conference on Recent Advances in Natural Language Processing

Summer School DLinNLP: 3-5 September 2025 (Wednesday-Friday)
Tutorials: 6-7 September 2025 (Saturday-Sunday)
Main conference: 8-10 September 2025 (Monday-Wednesday)
Workshops and Shared Tasks: 11-13 September 2025 (Thursday-Saturday)

Varna, Bulgaria
https://ranlp.org/ranlp2025/
================================================

Following the workshops held in conjunction with the Conferences "Recent Advances in Natural Language Processing" RANLP-2005, RANLP-2007, RANLP-2009, RANLP-2011, RANLP-2013, RANLP-2015, RANLP-2017, RANLP-2019, RANLP-2021 and RANLP-2023, we are pleased to announce a call for workshop proposals for RANLP-2025.

RANLP-2025 invites workshop proposals on any topic of interest to the Natural Language Processing (NLP) community, ranging from fundamental research issues to more applied industrial or commercial aspects. We encourage workshops related to (or discussing the employment of) the latest NLP methods including Large Language Models/Generative AI.

Workshops can vary in length from half a day to one or two full days and can also feature demo sessions. The format of each workshop (face-to-face or hybrid) can be determined by its organisers, on condition that onsite sessions are held in Varna for the whole workshop duration so that other RANLP participants can take part in the event.

Accepted workshops will receive one free registration to RANLP-2025 (full registration including the summer school, tutorials, all workshops, main conference, reception, conference dinner).

VENUE
The workshops will take place in Hotel "Cherno More", Varna, the main RANLP-2025 conference venue. If more than 5 workshops are selected, the RANLP-2025 organisers will provide conference halls in some of the neighbouring hotels or universities in downtown Varna.
IMPORTANT DATES
Workshop proposals due: 15 March 2025
Workshop selection: 22 March 2025
Workshop website due: 5 April 2025
Workshop paper submission deadline (suggested): 30 June 2025, immediately after RANLP notification
Workshop paper acceptance notification (suggested): 28 July 2025
Workshop paper camera-ready versions (suggested): 20 August 2025
Workshop camera-ready proceedings ready (suggested): 31 August 2025
Workshops: 11-13 September 2025

REQUIREMENTS
Proposals should be no longer than five pages and should contain the following:
1. Title and brief technical description of the workshop, specifying the goals and the technical issues that it will focus on;
2. Brief description of the target audience, including estimates of the numbers of submissions and attendees (a tentative list of potential contributors would be useful);
3. List of related workshops/events held in the last three years or to be held in 2025;
4. Tentative workshop program committee;
5. Names and contact information (web page, email address) of the proposed organising committee;
6. Description of the experience of the proposed organisers in the workshop topics and in organising workshops or related events.

The workshop Organising Committee is responsible for the following:
* Setting up and maintaining the workshop website;
* Disseminating call for papers/participation;
* Organising paper submission, review process, authors notification, and collecting audio/visual presentation requirements;
* Verifying the camera-ready copies, providing electronic conference proceedings which are to be generated with the conference management system START;
* In case of hybrid workshops, organising an onsite workshop component and chairing the live sessions in Varna.

Workshop invited speakers: If the workshop organisers intend to host an invited talk, it is recommended that they invite somebody from the main conference keynote speakers or participants.
If the workshop organisers decide to invite another speaker, it is very likely that the workshop organisers will have to secure financial support for this speaker.

The RANLP-2025 Organising Committee is responsible for the following:
* Providing a link to the workshop web page;
* Publishing the workshop proceedings with ISBN numbers, and registering DOI numbers for all accepted papers;
* Providing the workshop venue;
* Organising registration, audio/visual support, coffee breaks, registration facilities, Internet access.

WORKSHOP PROPOSAL SUBMISSION
Workshop proposals in PDF format should be e-mailed to Tharindu Ranasinghe <t.ranasinghe[at]lancaster[dot]ac[dot]uk>, Kiril Simov <kivs[at]bultreebank[dot]org>, Petya Osenova <petya[at]bultreebank[dot]org> and cc'ed to <workshops2025(a)ranlp.org>.

EVALUATION
Submitted proposals will be reviewed with respect to the following criteria:
* Relevance, importance, and timeliness of the topics;
* Completeness, clarity, and quality of the workshop proposal;
* Experience of the organisers in the proposed topics;
* Viability of the workshop.

THE TEAM BEHIND RANLP-25
Galia Angelova, Bulgarian Academy of Sciences, Bulgaria (Chair Organising Committee)
Ruslan Mitkov, University of Lancaster, UK (Chair Programme Committee)
Nikolai Nikolov, Bulgarian Association for Computational Linguistics, Bulgaria
Tharindu Ranasinghe, Lancaster University, UK (Workshops Chair and Shared tasks Co-Chair)
Saad Ezzini, Lancaster University, UK (Sponsorship Chair and Shared tasks Co-Chair)
Maria Kunilovskaya, Saarland University, Germany (Publication Chair)
Preslav Nakov, MBZUAI, Abu Dhabi, UAE
Ivelina Nikolova, Bulgarian Academy of Sciences, Bulgaria
Kiril Simov, Bulgarian Academy of Sciences, Bulgaria (Workshops Co-Chair)
Petya Osenova, Bulgarian Academy of Sciences, Bulgaria (Workshops Co-Chair)

First CFP: 21st Workshop on Multiword Expressions (MWE 2025) @NAACL2025
by Atul K. Ojha 23 Dec '24

[Apologies for cross-postings]

********************************************************************************
First Call for Papers
21st Workshop on Multiword Expressions (MWE 2025)
Organized, sponsored and endorsed by SIGLEX, the Special Interest Group on the Lexicon of the ACL
Full-day workshop collocated with NAACL 2025, Albuquerque, New Mexico, U.S.A., May 3 or 4, 2025
Hybrid (on-site & on-line)
Submission deadline: January 30, 2025
MWE 2025 website: https://multiword.org/mwe2025/
********************************************************************************

Multiword expressions (MWEs), i.e., word combinations that exhibit lexical, syntactic, semantic, pragmatic, and/or statistical idiosyncrasies (Baldwin and Kim, 2010), such as “by and large”, “hot dog”, “make a decision” and “break one's leg”, are still a pain in the neck for Natural Language Processing (NLP). The notion encompasses closely related phenomena: idioms, compounds, light-verb constructions, phrasal verbs, rhetorical figures, collocations, institutionalized phrases, etc. Given their irregular nature, MWEs often pose complex problems in linguistic modeling (e.g. annotation), NLP tasks (e.g. parsing), and end-user applications (e.g. natural language understanding and Machine Translation), and hence still represent an open issue for computational linguistics (Constant et al., 2017).

For more than two decades, modelling and processing MWEs for NLP has been the topic of the MWE workshop, organised since 2003 by the MWE section <https://multiword.org/> of ACL-SIGLEX <http://www.siglex.org/> in conjunction with major NLP conferences. Impressive progress has been made in the field, but our understanding of MWEs still requires much research considering their need and usefulness in NLP applications. This is also relevant to domain-specific NLP pipelines that need to tackle terminologies most often realised as MWEs.
Following previous years, for this 21st edition of the workshop, we identified the following topics on which contributions are particularly encouraged:

- MWE processing to enhance end-user applications: MWEs have gained particular attention in end-user applications, including Machine Translation (MT) (Zaninello and Birch, 2020), simplification (Kochmar et al., 2020), language learning and assessment (Paquot et al., 2020), social media mining (Pelosi et al., 2017), and abusive language detection (Zampieri et al., 2020). We believe that it is crucial to extend and deepen these first attempts to integrate and evaluate MWE technology in these and further end-user applications.

- MWE processing and identification in the general language, as well as in specialized languages and domains: Multiword terminology extraction from domain-specific corpora (Lossio-Ventura et al., 2014) is of particular importance to various applications, such as MT (Semmar and Laib, 2017), or for the identification and monitoring of neologisms and technical jargon (Chatzitheodorou and Kappatos, 2021).

- MWE processing in low-resource languages: The PARSEME shared tasks (2017 <https://multiword.sourceforge.net/PHITE.php?sitesig=CONF&page=CONF_05_MWE_2…>, 2018 <https://multiword.sourceforge.net/PHITE.php?sitesig=CONF&page=CONF_04_LAW-M…>, 2020 <https://multiword.sourceforge.net/PHITE.php?sitesig=CONF&page=CONF_02_MWE-L…>), among others, have fostered significant progress in MWE identification, providing datasets that include low-resource languages, evaluation measures, and tools that now allow fully integrating MWE identification into end-user applications. There are continuous efforts in this direction (Diaz Hernandez, 2024), and a few of them have also explored methods for the automatic interpretation of MWEs (Bhatia et al., 2018) and their processing in low-resource languages (Eder et al., 2021). Resource creation and sharing should be pursued in parallel with the development of multilingual benchmarks for MWE identification (Savary et al., 2023).

- MWE identification and interpretation in LLMs: Most current MWE processing is limited to their identification and detection using pre-trained language models, but we still lack understanding about how MWEs are represented and dealt with therein (Garcia et al., 2021) and how to better model the compositionality of MWEs from semantics (Phelps et al., 2024). Now that NLP has shifted towards end-to-end neural models like BERT, capable of solving complex tasks with little or no intermediary linguistic symbols, questions arise about the extent to which MWEs should be implicitly or explicitly modelled (Shwartz and Dagan, 2019).

- New and enhanced representation of MWEs in language resources and computational models of compositionality as gold standards for formative intrinsic evaluation.

Through this workshop, we will bring together researchers in various NLP subfields and encourage them to submit their MWE-related research. We also intend to consolidate the converging results of previous joint workshops LAW-MWE-CxG 2018 <http://multiword.sourceforge.net/lawmwecxg2018/>, MWE-WN 2019 <http://multiword.sourceforge.net/mwewn2019/> and MWE-LEX 2020 <http://multiword.sourceforge.net/mwelex2020/>, the joint MWE-WOAH panel in 2021 <https://multiword.org/mwe2021/#program>, the MWE-SIGUL 2022 joint session <https://multiword.org/mwe2022/>, and MWE-UD 2024 <https://multiword.org/mweud2024/>, extending our scope to MWEs in e-lexicons and WordNets, MWE annotation, and grammatical constructions.
Correspondingly, we call for papers on research related (but not limited) to MWEs and constructions in:

- Computationally-applicable theoretical work in psycholinguistics and corpus linguistics;
- Annotation (expert, crowdsourcing, automatic) and representation in resources such as corpora, treebanks, e-lexicons, WordNets, constructions (also for low-resource languages);
- Processing in syntactic and semantic frameworks (e.g. CCG, CxG, HPSG, LFG, TAG, UD, etc.);
- Discovery and identification methods, including for specialized languages and domains such as clinical or biomedical NLP;
- Interpretation of MWEs and understanding of text containing them;
- Language acquisition, language learning, and non-standard language (e.g. tweets, speech);
- Evaluation of annotation and processing techniques;
- Retrospective comparative analyses from the PARSEME shared tasks;
- Processing for end-user applications (e.g. MT, NLU, summarisation, language learning, etc.);
- Implicit and explicit representation in pre-trained language models and end-user applications;
- Evaluation and probing of pre-trained language models;
- Resources and tools (e.g. lexicons, identifiers) and their integration into end-user applications;
- Multiword terminology extraction;
- Adaptation and transfer of annotations and related resources to new languages and domains including low-resource ones.

Submission formats:
The workshop invites two types of submissions:
- archival submissions that present substantially original research in both long paper format (8 pages + references) and short paper format (4 pages + references).
- non-archival submissions of abstracts describing relevant research presented/published elsewhere which will not be included in the MWE proceedings.

Paper submission and templates
Papers should be submitted via the workshop's submission page (https://openreview.net/group?id=aclweb.org/NAACL/2025/Workshop/MWE).
Please choose the appropriate submission format (archival/non-archival). Archival papers with existing reviews will also be accepted through the ACL Rolling Review. Submissions must follow the ACL stylesheet <https://github.com/acl-org/acl-style-files>.

Important Dates
Paper Submission Deadline: January 30, 2025
Notification of acceptance: March 1, 2025
Camera-ready papers due: March 10, 2025
Workshop: May 3 or 4, 2025
All deadlines are at 23:59 UTC-12 (Anywhere on Earth).

Organizing Committee
Verginica Barbu Mititelu, Voula Giouli, Grazina Korvel, A. Seza Doğruöz, Alexandre Rademaker, Atul Kr. Ojha, Mathieu Constant

Anti-harassment policy
The workshop follows the ACL anti-harassment policy <https://www.aclweb.org/adminwiki/index.php?title=Anti-Harassment_Policy>.

Contact
For any inquiries regarding the workshop, please send an email to the Organizing Committee at mweworkshop2023(a)googlegroups.com.

Long-form Recordings events in Paris, June 2025
by Alex CRISTIA 20 Dec '24

Dear colleagues,

(apologies for cross-posting)

Long-form (also called daylong) recordings (LFR) are increasingly used in a range of fields, including to document language input and outcomes in under-described populations (e.g., Casillas et al., 2020) and to assess potential effects of early childhood interventions (e.g., Weber et al., 2017).

We are happy to announce two events related to long-form recordings that will take place in person at PSL University/Ecole Normale Supérieure in Paris.

The LFR Interdisciplinary Summit (lfris2025.sciencesconf.org), on June 19-20, 2025, will explore cutting-edge innovations in long-form recordings with talks by leading researchers. You can find more information about this event here <https://lfris2025.sciencesconf.org/?forward-action=index&forward-controller…>. Registration for that event will open in March and close in May.

Today, we want to especially draw your attention to the LFRAZ Summer School (Long-form Recordings from A to Z; lfraz2025.sciencesconf.org), which will take place June 16-19, 2025. This hands-on summer school aims to provide attendees who are new to the method with all the tools they need to collect and analyze LFRs. The mornings will feature lectures and roundtables with leading experts, while afternoons will provide opportunities for individual and group projects, as well as office hours for tailored support.

Here's what attendees can hope for:
- Comprehensive Training: From data collection to modeling, you’ll gain practical skills to integrate long-form recordings into your research.
- Networking Opportunities: The event brings together researchers from diverse fields, including linguistics, anthropology, economics, and developmental science.
- Automatic Speech Annotations: Learn to use open-source tools and hardware for analyzing speech data in culturally diverse contexts.
We are offering a limited number of travel and accommodation grants for individuals working outside North America and Europe. To learn more about the school, visit https://lfraz2025.sciencesconf.org/. To apply, fill out the form available here, which takes roughly <https://docs.google.com/forms/d/e/1FAIpQLSdbnxhRibXKazWQSnkEzjo0ICI9G_4whBB…>15 minutes to complete. We recommend preparing one's answers in advance. To see the full list of questions, see here <https://drive.google.com/file/d/17km0_R7O4-49icR7hanxGiM5q0nkIoC5/view?usp=…>. The application deadline is the 15th of January. If you can't make it to Paris in person, we recommend that you still apply, since we believe similar schools (Global LFRAZ) will be organized in person and/or online, so we can keep you posted on those. Also, if you are interested in being part of the Global LFRAZ <https://lfraz2025.sciencesconf.org/page/global_lfraz?lang=en>, more information on that is found here <https://lfraz2025.sciencesconf.org/page/global_lfraz?lang=en>. Please share this information with interested parties! --------------------------------------------------------------- Alex (Alejandrina) Cristia Researcher, CNRS Laboratoire de Sciences Cognitives et Psycholinguistique 29, rue d'Ulm, 75005, Paris, FRANCE My site: www.acristia.org --------------------------------------------------------------- If you donate, ask me about effective charities <https://effectivealtruism.us8.list-manage.com/track/click?u=52b028e7f799cca…>. / Si vous faites des dons, posez-moi des questions sur le don efficace <https://www.altruismeefficacefrance.org/donner-efficacement>.


Call for participation in Web Survey on Data Bottlenecks in Supervised NLP
by Romberg, Julia 19 Dec '24


Dear list members,

We invite you to participate in our web survey exploring how recent advancements in NLP, such as LLMs, have changed the need for labeled data in Supervised Machine Learning.

Survey details:

* Topic: Web survey on Data Annotation and Active Learning
* Target group: Researchers and practitioners alike in the fields of NLP, Supervised Machine Learning, and Active Learning in particular (not required).
* Duration: ~15 minutes
* Deadline for participation: January 12, 2025
* Survey link: https://bildungsportal.sachsen.de/umfragen/limesurvey/index.php/538271

Why should I invest my time in this survey?

* Make an impact: Participate in a community effort and help to gain a better understanding of the current state of, and open issues with, methods used to overcome a lack of labeled data.
* Gain insights: Receive a report with key findings to incorporate these insights into research and development of new methods and technologies.

Thank you for considering participating in our survey! If you have any questions or require additional information, please don't hesitate to contact us directly at activelearningsurvey2024(a)gmail.com. If you know colleagues or peers who might be interested, we'd be grateful if you could forward this survey to them as well.

Best regards,

Julia Romberg (GESIS - Leibniz Institute for the Social Sciences, Germany)
Christopher Schröder (Institut für Angewandte Informatik e. V., Germany)
Julius Gonsior (TUD Dresden University of Technology)

------------------------------------------------------------------------
Leibniz Institute for the Social Sciences
Julia Romberg
Computational Social Science, Team Data Science Methods
+49(221)47694-742


Call for Participation: The First Workshop on Language Models for Low-Resource Languages (LoResLM 2025@COLING)
by Ranasinghe, Tharindu 18 Dec '24


Neural language models have revolutionised natural language processing (NLP) and have provided state-of-the-art results for many tasks. However, their effectiveness is largely dependent on the pre-training resources. Therefore, language models (LMs) often struggle with low-resource languages in both training and evaluation. Recently, there has been a growing trend in developing and adopting LMs for low-resource languages. LoResLM aims to provide a forum for researchers to share and discuss their ongoing work on LMs for low-resource languages.

LoResLM 2025 will be a physical workshop co-located with COLING 2025 in Abu Dhabi on 20th January 2025.

We are pleased to share the programme of LoResLM 2025 with you. Please visit https://loreslm.github.io/program for the full programme.

To register for the workshop, please visit https://coling2025.org/registration/

We are looking forward to welcoming you at LoResLM 2025 in Abu Dhabi.

The workshop is supported in part by CLARIN-UK, funded by the Arts and Humanities Research Council as part of the Infrastructure for Digital Arts and Humanities programme.

>> Keynote Speaker

Jose Camacho-Collados, Cardiff University.

>> Organising Committee

Hansi Hettiarachchi, Lancaster University, UK
Tharindu Ranasinghe, Lancaster University, UK
Paul Rayson, Lancaster University, UK
Ruslan Mitkov, Lancaster University, UK
Mohamed Gaber, Birmingham City University, UK
Damith Premasiri, Lancaster University, UK
Fiona Anting Tan, National University of Singapore, Singapore
Lasitha Uyangodage, University of Münster, Germany

>> Programme Committee

Gábor Bella - IMT Atlantique, France
Samuel Cahyawijaya - The Hong Kong University of Science and Technology, Hong Kong
Burcu Can - University of Stirling, UK
Çağrı Çöltekin - University of Tübingen, Germany
Raj Dabre - National Institute of Information and Communications Technology, Japan
Vera Danilova - Uppsala University, Sweden
Debashish Das - Birmingham City University, UK
Ona de Gibert - University of Helsinki, Finland
Alphaeus Dmonte - George Mason University, USA
Bonaventure F. P. Dossou - McGill University, Canada
Daan van Esch - Google
Ignatius Ezeani - Lancaster University, UK
Anna Furtado - University of Galway, Ireland
Amal Htait - Aston University, UK
Ali Hürriyetoğlu - Wageningen University & Research, Netherlands
Danka Jokic - University of Belgrade, Serbia
Diptesh Kanojia - University of Surrey, UK
Daisy Lal - Lancaster University, UK
Colin Leong - University of Dayton, USA
Veronika Lipp - Hungarian Research Centre for Linguistics, Hungary
Muhidin Mohamed - Aston University, UK
Farhad Nooralahzadeh - University of Zurich, Switzerland
Rrubaa Panchendrarajan - Queen Mary University of London, UK
Nadeesha Pathirana - Aston University, UK
Alistair Plum - University of Luxembourg, Luxembourg
Nishat Raihan - George Mason University, USA
Omid Rohanian - University of Oxford, UK
Sandaru Seneviratne - Australian National University, Australia
Ravi Shekhar - University of Essex, UK
Archchana Sindhujan - University of Surrey, UK
Claytone Sikasote - University of Cape Town, South Africa
Marjana Prifti Skenduli - University of New York Tirana, Albania
Uthayasanker Thayasivam - University of Moratuwa, Sri Lanka
Taro Watanabe - Nara Institute of Science and Technology, Japan
John Vidler - Lancaster University, UK
Phil Weber - Aston University, UK
Bryan Wilie - Hong Kong University of Science & Technology, Hong Kong
Artūrs Znotiņš - University of Latvia, Latvia

URL - https://loreslm.github.io/
Twitter - https://x.com/LoResLM2025

Dr Tharindu Ranasinghe
School of Computing and Communications | Lancaster University
Contact me on Teams <https://teams.microsoft.com/l/chat/0/0?users=t.ranasinghe@lancaster.ac.uk>
www.lancaster.ac.uk


Fwd: ELAR Online Training Series in Language Documentation and Archiving
by Maite Melero 16 Dec '24


FYI

=================================

Dear colleagues,

ELAR is excited to share the news that the *Endangered Languages Documentation Programme* is offering an online training series in Language Documentation and Archiving from March 6 to June 12, 2025. Applications to participate in the training series are due 30 January 2025. Please see the call below for more information.

Please help this call reach a broader audience by sharing it with your students, colleagues, and others who may be interested in the training.

Best wishes,
The ELAR Team

---------------------------------------------------------------------------------------------------------

Online Training Series in Language Documentation and Archiving
6 March – 12 June 2025

The Endangered Languages Documentation Programme (ELDP) is offering a series of online trainings in Language Documentation and Archiving from *March 6 to June 12, 2025*.

Training participants will meet weekly on Thursdays, live via Zoom, for a webinar and discussion session. They will be expected to complete readings, hands-on practice, and online assessments between sessions. Live attendance at all sessions and the completion of all assignments are required.

Below are the topics that will be covered in the training series:

· Linguistic diversity and language endangerment
· Language Documentation theory & methods
· Understanding archival collections
· Compiling a documentary collection
· Audio and video recording methods
· Transcription, translation, and annotation with ELAN
· Lexicography and dictionary creation with FieldWorks Language Explorer (FLEx)
· Metadata creation and managing data
· Project planning and design
· Grant writing for language documentation projects

The online sessions will take place from 9:00 to 11:00 CET. Readings, hands-on practice, and homework assignments will be made available via a free course website. The language of instruction is English.

The training series has 25 spots available. Applicants planning to work with endangered and under-documented languages (see Hammarström 2019 <https://elararchive.org/blog/2019/12/17/which-language-should-i-document-so…>), especially Papuan languages, are strongly encouraged to apply.

Applicants should meet the criteria listed below:

· Have plans to document an endangered and under-documented language
· Be able to attend all webinar sessions and complete readings and assignments
· Have a sufficient level of spoken and written English to be able to complete assignments
· Have regular access to a Windows computer and a reliable internet connection

To apply and for more information, please go here <https://www.eldp.net/en/our+trainings/online+training+series/>. The deadline is January 30th, 2025.

---------------------------------------------------------------------------------------------------------

--
Interested in keeping up with ELAR? Subscribe to our new mailing list <https://www.listserv.dfn.de/sympa/subscribe/elar-news>!

*Endangered Languages Archive*
Berlin-Brandenburg Academy of Sciences and Humanities
Jägerstraße 22/23
10117 Berlin, Germany

Website: https://elararchive.org/
Facebook: https://www.facebook.com/elararchive/
Instagram: https://www.instagram.com/elararchive/
Twitter: @ELARarchive <https://www.twitter.com/elararchive/>
Blog: https://elararchive.org/blog
Vimeo: https://vimeo.com/user64477333/albums


CfP deadline extension: Nordic-Baltic Responsible Evaluation and Alignment of Language Models (NB-REAL 2025)
by Annika Simonsen - HI 13 Dec '24


*** Apologies for possible cross-posting ***

CALL FOR PAPERS DEADLINE EXTENSION

We are pleased to announce that the submission deadline for the 1st Workshop on Nordic-Baltic Responsible Evaluation and Alignment of Language Models (NB-REAL) has been extended from December 16th to December 23rd, 2024. The workshop will be held on March 2, 2025, as part of the NoDaLiDa/Baltic-HLT 2025 conference in Tallinn, Estonia.

About the Workshop

This half-day workshop focuses on the responsible evaluation and alignment of Large Language Models (LLMs) for Nordic and Baltic languages. Our goal is to bring together researchers, practitioners, and stakeholders to address the unique challenges and opportunities in this rapidly evolving field.

Topics of Interest

We welcome submissions on topics including, but not limited to:

- Ethical benchmarks for evaluating LLMs in Nordic and Baltic languages
- Methods for creating culturally sensitive and inclusive evaluation datasets
- Responsible techniques for generating or collecting alignment data
- Challenges and solutions in ethical LLM alignment for less-resourced languages
- Case studies on responsible LLM evaluation or alignment projects
- Ethical considerations in LLM evaluation and alignment
- Comparative studies of LLM performance and fairness in Nordic and Baltic languages
- Innovative approaches to leveraging limited language resources in evaluation or alignment of language models

Important Dates

Paper Submission Deadline: December 23, 2024 (extended from December 16, 2024)
Notification of Acceptance: January 13, 2025
Camera-Ready Deadline: February 3, 2025
Workshop Date: March 2, 2025

Workshop Format

NB-REAL 2025 will be a half-day workshop held on March 2, 2025 (pre-conference). It will be a hybrid event with both on-site and online participation available.

Submission

Submissions can be long papers (8 pages) or short papers (4 pages). All submissions must follow the NoDaLiDa template, available in both LaTeX and MS Word. The templates are available at the official conference website, see https://www.nodalida-bhlt2025.eu/call-for-papers#h.v2k63awq0fpe. All submissions will undergo peer review by the program committee.

To submit your paper, please visit the NB-REAL 2025 Workshop page on OpenReview <https://openreview.net/group?id=NoDaLiDa/Baltic-HLT/2025/Workshop/NB-REAL#t…>

Organizers

Hafsteinn Einarsson, Associate Professor in Computer Science, University of Iceland (hafsteinne(a)hi.is)
Annika Simonsen, PhD Student, University of Iceland (annika(a)hi.is)
Dan Saattrup Nielsen, Senior AI Specialist, Alexandra Institute (dan.nielsen(a)alexandra.dk)

For more information, please visit our website: https://nbreal.xyz/

We look forward to your contributions and to seeing you at NB-REAL 2025!
