- Corpora - ELRA lists

Final CfP: 9th SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
by Anna Kazantseva 15 Jan '25

LaTeCH-CLfL 2025: The 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

to be held on May 3rd or 4th, 2025, in conjunction with NAACL 2025 <https://2025.naacl.org/> in Albuquerque, NM.

https://sighum.wordpress.com/latech-clfl-2025/

Second Call for Papers (with apologies for cross-posting)

Organisers: Diego Alves, Yuri Bizzoni, Stefania Degaetano-Ortlieb, Anna Kazantseva, Janis Pagel, Stan Szpakowicz

LaTeCH-CLfL 2025 is the ninth in a series of meetings for NLP researchers who work with data from the broadly understood arts, humanities and social sciences, and for specialists in those disciplines who apply NLP techniques in their work. The workshop continues a long tradition of annual meetings. The SIGHUM Workshops on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH) ran ten times in 2007-2016. The five Workshops on Computational Linguistics for Literature (CLfL) took place in 2012-2016. The first eight joint workshops (LaTeCH-CLfL) were held in 2017-2024.

Topics and content

In the Humanities, Social Sciences, Cultural Heritage and literary communities, there is increasing interest in, and demand for, NLP methods for semantic and structural annotation, intelligent linking, discovery, querying, cleaning and visualization of both primary and secondary data. This is true even of primarily non-textual collections, given that text is also the pervasive medium for metadata. Such applications pose new challenges for NLP research: noisy, non-standard textual or multi-modal input, historical languages, vague research concepts, multilingual parts within one document, and so on. Digital resources often have insufficient coverage; resource-intensive methods require (semi-)automatic processing tools and domain adaptation, or intense manual effort (e.g., annotation).
Literary texts bring their own problems, because navigating this form of creative expression requires more than the typical information-seeking tools. Examples of advanced tasks include the study of literature of a certain period, author or sub-genre, recognition of certain literary devices, or quantitative analysis of poetry. NLP methods applied in this context not only need to achieve high performance, but are often applied as a first step in research or scholarly workflow. That is why it is crucial to interpret model results properly; model interpretability might be more important than raw performance scores, depending on the context. More generally, there is a growing interest in computational models whose results can be used or interpreted in meaningful ways. It is, therefore, of mutual benefit that NLP experts, data specialists and Digital Humanities researchers who work in and across their domains get involved in the Computational Linguistics community and present their fundamental or applied research results. It has already been demonstrated how cross-disciplinary exchange not only supports work in the Humanities, Social Sciences, and Cultural Heritage communities but also promotes work in the Computational Linguistics community to build richer and more effective tools and models. 
Topics of interest include, but are not limited to, the following:
• adaptation of NLP tools to Cultural Heritage, Social Sciences, Humanities and literature;
• automatic error detection and cleaning of textual data;
• complex annotation schemas, tools and interfaces;
• creation (fully- or semi-automatic) of semantic resources;
• creation and analysis of social networks of literary characters;
• discourse and narrative analysis/modelling, notably in literature;
• emotion analysis for the humanities and for literature;
• generation of literary narrative, dialogue or poetry;
• identification and analysis of literary genres;
• interpretability of large language model output for DH-related tasks (explainable AI);
• linking and retrieving information from different sources, media, and domains;
• low-resource and historical language processing;
• modelling dialogue literary style for generation;
• modelling of information and knowledge in the Humanities, Social Sciences, and Cultural Heritage;
• profiling and authorship attribution;
• search for scientific and/or scholarly literature;
• work with linguistic variation and non-standard or historical use of language.

Information for authors

We invite papers on original, unpublished work in the topic areas of the workshop. In addition to long papers, we will consider short papers and system descriptions (demos). We also welcome position papers.
• Long papers, presenting completed work, may consist of up to eight (8) pages of content plus additional pages of references (just two if possible -:). The final camera-ready versions of accepted long papers will be given one additional page of content (up to 9 pages) so that reviewers’ comments can be taken into account.
• A short paper / demo presents work in progress, or the description of a system, and may consist of up to four (4) pages of content plus additional pages of references (one if you can). Upon acceptance, short papers will be given five (5) content pages in the proceedings.
• A position paper, clearly marked as such, should not exceed eight (8) pages including references.

All submissions are to follow the *ACL paper styles (for LaTeX / Overleaf and MS Word) available at https://github.com/acl-org/acl-style-files. Papers should be submitted electronically, only in PDF, via the LaTeCH-CLfL 2025 submission website on the SoftConf pages (we will publish the link as soon as we have it).

Reviewing will be double-blind. Please do not include the authors’ names and affiliations, or any references to Web sites, project names, acknowledgements and so on: anything that immediately reveals the authors’ identity. Self-references should be kept to a reasonable minimum, and anonymous citations cannot be used.

Submission link: https://softconf.com/naacl2025/LaTeCH-CLfL2025/

Important dates (tentative)
Workshop paper due: January 30, 2025
Notification of acceptance: March 1, 2025
Camera-ready papers due: March 10, 2025
Workshop date: May 3rd or 4th, 2025

More on the organizers
Diego Alves, Language Science and Technology, Saarland University
Yuri Bizzoni, Center for Humanities Computing / School for Communication and Culture, Århus University
Stefania Degaetano-Ortlieb, Language Science and Technology, Saarland University
Anna Kazantseva, National Research Council Canada
Janis Pagel, Department of Digital Humanities, University of Cologne
Stan Szpakowicz, School of Electrical Engineering and Computer Science, University of Ottawa

Contact: latech-clfl(a)googlegroups.com <mailto:latech-clfl@googlegroups.com>

Call for Participation: Slav-NLP Shared Task: Analysis of Persuasion Techniques — in Parliamentary Debates and Social Media, for Slavic Languages
by Roman Yangarber 15 Jan '25

Call for Participation

Shared Task: Detection and Classification of Persuasion Techniques in Parliamentary Debates and Social Media, for Slavic Languages

Co-located with the Slav-NLP 2025 Workshop <http://bsnlp.cs.helsinki.fi/>, at ACL 2025

http://bsnlp.cs.helsinki.fi/shared-task.html

TASK DESCRIPTION:

The task focuses on detection and classification of Persuasion Techniques in 5 Slavic languages (Bulgarian, Polish, Croatian, Slovene and Russian) in two types of texts: (a) parliamentary debates on hotly contested topics, and (b) social media posts related to the spread of disinformation.

The task has two subtasks:
1. Subtask 1: Detection. Given a text and a list of fragment offsets, determine for each fragment whether it contains one or more persuasion techniques from a given taxonomy of persuasion techniques.
2. Subtask 2: Classification. Given a text and a list of fragment offsets, determine for each fragment which persuasion techniques are employed therein.

We use a rich taxonomy with 25 persuasion techniques: Name-calling or labelling, Guilt by association, Casting doubt, Appeal to hypocrisy, Questioning the reputation, Flag waving, Appeal to authority, Appeal to popularity, Appeal to fear and prejudice, Appeal to values, Strawman, Whataboutism, Red herring, Appeal to pity, Causal oversimplification, False dilemma or no choice, Consequential oversimplification, False equivalence, Slogans, Conversation killer, Appeal to time, Loaded language, Obfuscation-Intentional vagueness-confusion, Exaggeration or minimization, Repetition.

Subtask 1 is a binary classification task, whereas Subtask 2 is a multi-class multi-label classification task. The text fragments correspond to paragraphs.

For information about training and test data, guidelines, and participation, please see the Shared Task Home Page <http://bsnlp.cs.helsinki.fi/shared-task.html>.

IMPORTANT: Participants may join both subtasks or only one. It is not mandatory to submit responses for all languages. At most 5 system responses per language are allowed.

Important Dates
* Registration deadline: 20 April 2025
* Release of test data to registered participants: 22 April 2025
* Submission of system responses: 26 April 2025
* Results announced to participants: 29 April 2025
* Submission of shared task papers (optional): 11 May 2025

Questions and contact: bsnlp(a)cs.helsinki.fi <mailto:bsnlp@cs.helsinki.fi>

--
Roman Yangarber
Professor, University of Helsinki, Finland
Digital Humanities
INEQ: Helsinki Inequality Initiative <https://helsinki.fi/en/ineq-helsinki-inequality-initiative>, Linguistic Inequalities and Translation Technologies
e-Learning & language learning
Language Learning Lab
Unioninkatu 40, Metsätalo A214
revitaAI.github.io <https://revitaai.github.io>
helsinki.fi/language-learning-lab <https://www.helsinki.fi/language-learning-lab>
mobile: +358 50 41 51 71 3

January 2025 Newsletter - LDC
by Penn LDC 15 Jan '25

In this newsletter: Renew your LDC membership today New publications: Iraqi Arabic - English Lexical Database<https://catalog.ldc.upenn.edu/LDC2025L01> LORELEI Hungarian Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2025T01> ________________________________ Renew your LDC membership today The importance of curated resources for language-related education, research, and technology development drives LDC's mission to create them, to accept data contributions from researchers across the globe, and to broadly share such resources through the LDC Catalog. LDC members enjoy no-cost access to new corpora released annually, as well as the ability to license legacy data sets from among our 960+ holdings at reduced fees. Ensure that your data needs continue to be met by renewing your LDC membership or by joining the Consortium today. Now through March 3, 2025, 2024 members receive a 10% discount on 2025 membership, and new or returning organizations receive a 5% discount. Membership remains the most economical way to access current and past LDC releases. Consult Join LDC<https://www.ldc.upenn.edu/members/join-ldc> for more details on membership options and benefits. ________________________________ New publications: Iraqi Arabic - English Lexical Database<https://catalog.ldc.upenn.edu/LDC2025L01> was developed by LDC. It has six interrelated tables presenting over 67,000 Iraqi Arabic words as orthographic forms in Arabic script and pronunciation forms in IPA format, along with more than 120,000 English tokens. This release is the result of a collaboration with Georgetown University Press <https://press.georgetown.edu/> to enhance and update three dialectal Arabic dictionaries -- Iraqi, Moroccan, and Syrian -- originally published in the 1960s. The Georgetown Dictionary of Iraqi Arabic<https://press.georgetown.edu/Book/The-Georgetown-Dictionary-of-Iraqi-Arabic> was published in 2013. 
That work was based on, and expanded, two dictionaries, A Dictionary of Iraqi Arabic: English-Arabic (Clarity, Stowasser, and Wolfe, eds., 2003) and A Dictionary of Iraqi Arabic: Arabic-English (Woodhead and Beene, eds., 2003). The several enhancements developed by LDC in the updated and enhanced dictionary and the lexical database included facilitating comparisons across Arabic dialects and Modern Standard Arabic by providing Arabic script spellings and IPA pronunciations to Iraqi words and phrases; promoting ease of use by language learners and researchers by developing reasonable orthographic conventions for applying the Arabic alphabet to the dialect; and facilitating a user's understanding of morphological and lexical relations by adding information on the linguistic structures of Iraqi Arabic. The documentation accompanying this release includes instructions for combining into one database the tables in this corpus with the tables in Moroccan Arabic - English Lexical Database LDC2023L01.<https://catalog.ldc.upenn.edu/LDC2023L01> 2025 members can access this corpus through their LDC accounts provided they have submitted a completed copy of the special license agreement. Non-members may license this data for a fee. * LORELEI Hungarian Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2025T01> is comprised of over 686 million words of Hungarian monolingual text, 165,000 words of which were translated into English, 2.3 million words of found Hungarian-English parallel text, and 87,000 Hungarian words translated from English data. Approximately 72,500 words were annotated for named entities and over 25,000 words were annotated for full entity (including nominals and pronouns), entity linking and situation frames (identifying entities, needs and issues); over 17,000 words have simple semantic annotation; and close to 10,000 words were annotated for noun phrase chunking. 
Data was collected from discussion forum, news, reference, social network, and weblogs. The LORELEI (Low Resource Languages for Emergent Incidents) program was concerned with building human language technology for low resource languages in the context of emergent situations. Representative languages were selected to provide broad typological coverage. The knowledge base for entity linking annotation is available separately as LORELEI Entity Detection and Linking Knowledge Base (LDC2020T10)<https://catalog.ldc.upenn.edu/LDC2020T10>. 2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee. To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance. Membership Coordinator Linguistic Data Consortium<ldc.upenn.edu> University of Pennsylvania T: +1-215-573-1275 E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu> M: 3600 Market St. Suite 810 Philadelphia, PA 19104

EUROCALL 2025: CfP
by Passarotti Marco Carlo (marco.passarotti) 15 Jan '25

The call for papers for EUROCALL 2025 is out. See: https://eurocall2025.com/call-for-papers/

EUROCALL is the European Association for Computer Assisted Language Learning. The conference will be held in Milan at Università Cattolica on 27-30 August 2025.

IMPORTANT DATES
01 December 2024: first call for papers
mid December 2024: submission opens
03 February 2025: submission of abstracts closes
21 February 2025: deadline to sign up as reviewer of abstracts on OpenConf
w/c 24th February / 3rd March 2025: reviews assigned
31 March 2025: deadline for completion of all reviews
14 April 2025: notification to authors
15 April - 15 June 2025: early bird registration
16 June 2025 - 16 July 2025: ordinary conference registration
27-30 August 2025: EUROCALL 2025

Best,
Marco

Prof. Marco C. Passarotti
Computational Linguistics
Index Thomisticus Treebank https://itreebank.marginalia.it/
ERC Grantee, P.I. LiLa https://lila-erc.eu/ (Grant Agreement No. 769994)
CIRCSE Research Centre https://centridiricerca.unicatt.it/circse_index.html
Università Cattolica del Sacro Cuore
Largo Gemelli, 1
20123 Milan, Italy
marco.passarotti(a)unicatt.it
tel. +39-02-72342380

Postdoc in TrustLLM project at Linköping University
by Marcel Bollmann 15 Jan '25

The NLP group at Linköping University<https://liu-nlp.ai/>, Sweden, is looking for a Postdoc in Natural Language Processing within the EU-funded TrustLLM project on developing open, trustworthy, and factual large language models. The position is full-time (100%) for a fixed term of two years, with the potential of an extension to a total of three years, and comes without teaching obligation. Starting date is by agreement, but ideally as soon as possible. Research areas include language adaptation and modularisation of LLMs, tokenization for multilingual LLMs, as well as evaluation of relevant qualities (e.g. trustworthiness, factuality) in multilingual LLMs. For more information about this position and how to apply, see: https://liu-nlp.ai/postdoc-trustllm-2025/ The application deadline is 2025-02-05. Please do not hesitate to contact me for details and discussion! Best regards, Marcel -- Marcel Bollmann, Dr. phil. Associate Professor in Natural Language Processing Department of Computer and Information Science, Linköping University, Sweden www: https://marcel.bollmann.me/

[CFP] ACL 2025 Call for Papers (second)
by ACL Announcements 15 Jan '25

ACL 2025 Call for Papers Main Conference ACL 2025 Website: https://2025.aclweb.org/ Submission Deadline: February 15, 2025 Conference Dates: July 27 to August 1, 2025 Location: Vienna, Austria Special Theme: “Generalization of NLP Models” Contact: Roberto Navigli (General Chair) Wanxiang Che, Joyce Nabende, Mohammad Taher Pilehvar, Ekaterina Shutova (Program Chairs) Overview ACL 2025 invites the submission of long and short papers featuring substantial, original, and unpublished research in all aspects of Computational Linguistics and Natural Language Processing. ACL 2025 has a goal of a diverse technical program—in addition to traditional research results, papers may contribute negative findings, survey an area, announce the creation of a new resource, argue a position, report novel linguistic insights derived using existing computational techniques, and reproduce, or fail to reproduce, previous results. As in recent years, some of the presentations at the conference will be of papers accepted by the Transactions of the ACL (TACL) and by the Computational Linguistics (CL) journals. Papers submitted to ACL 2025, but not selected for the main conference, will also automatically be considered for publication in the Findings of the Association of Computational Linguistics. Paper Submission Information Papers may be submitted to the ARR 2025 February cycle. Papers that have received reviews and a meta-review from ARR (whether from the ARR 2025 February cycle or an earlier ARR cycle) may be committed to ACL 2025 via the conference commitment site (TBA). Submission Topics ACL 2025 aims to have a broad technical program. 
Relevant topics for the conference include, but are not limited to, the following areas (in alphabetical order): Computational Social Science and Cultural Analytics Dialogue and Interactive Systems Discourse and Pragmatics Efficient/Low-Resource Methods for NLP Ethics, Bias, and Fairness Generation Human-centered NLP Information Extraction Information Retrieval and Text Mining Interpretability and Analysis of Models for NLP Language Modeling Linguistic theories, Cognitive Modeling and Psycholinguistics Machine Learning for NLP Machine Translation Multilinguality and Language Diversity Multimodality and Language Grounding to Vision, Robotics and Beyond NLP Applications Phonology, Morphology and Word Segmentation Question Answering Resources and Evaluation Semantics: Lexical and Sentence-Level Sentiment Analysis, Stylistic Analysis, and Argument Mining Speech recognition, text-to-speech and spoken language understanding Summarization Syntax: Tagging, Chunking and Parsing Special Theme: Generalization of NLP Models ACL 2025 Theme Track: Generalization of NLP Models Following the success of the ACL 2020-2024 Theme tracks, we are happy to announce that ACL 2025 will have a new theme with the goal of reflecting and stimulating discussion about the current state of development of the field of NLP. Generalization is crucial for ensuring that models behave robustly, reliably, and fairly when making predictions on data different from their training data. Achieving good generalization is critically important for models used in real-world applications, as they should emulate human-like behavior. Humans are known for their ability to generalize well, and models should aspire to this standard. The theme track invites empirical and theoretical research and position and survey papers reflecting on the Generalization of NLP Models. 
The possible topics of discussion include (but are not limited to) the following: How can we enhance the generalization of NLP models across various dimensions—compositional, structural, cross-task, cross-lingual, cross-domain, and robustness? What factors affect the generalization of NLP models? What are the most effective methods for evaluating the generalization capabilities of NLP models? While Large Language Models (LLMs) significantly enhance the generalization of NLP models, what are the key limitations of LLMs in this regard? The theme track submissions can be either long or short. We anticipate having a special session for this theme at the conference and a Thematic Paper Award in addition to other categories of awards. Two-Stage Review: Submission to ARR, Commitment to ACL 2025 ACL 2025 will use ACL Rolling Review (ARR) as a reviewing system, but final decisions will be made by the conference. Both submissions of articles for review and commitment of reviewed articles to the conference will be performed via the Open Review platform. Specifically, authors will follow a two-step process: Authors submit articles to ARR, where submissions receive reviews and meta-reviews from ARR reviewers and area chairs; Authors commit their reviewed articles to a publication venue (e.g., ACL 2025), where Senior Area Chairs and Program Chairs make acceptance decisions from the ARR reviews and meta-reviews. ACL 2025 has chosen this approach in coordination with *CL 2024 conferences, which are adopting the same procedure and a coordinated submission plan to allow maximum flexibility during their submission periods for the authors. At each cycle, after a paper has been fully reviewed, authors have the option to commit their paper to a conference or revise and resubmit for another round of reviews. The reviewing process will continue to be double-blind. Reviewers will not see authors, nor will authors see reviewers, and reviews on ARR will not be made publicly visible. 
However, authors will be given the option through ARR to make their anonymized submitted articles publicly visible. Mandatory Reviewing Workload As the pace of research in the field continues to increase, we need to strengthen the commitment to reviewing for each paper submission. During the ARR submission process, authors will be required to specify which co-authors are committing to cover reviewing in this reviewing cycle. Please see the new ARR policy regarding reviewing workload here. As this is an ARR-wide policy for all *CL conferences, questions or clarifications should be addressed to ARR directly. Important Dates: Submission deadline (all papers are submitted to ARR): February 15, 2025 ARR reviews & meta-reviews available to authors of the February cycle: April 15, 2025 Commitment deadline for ACL 2025: April 20, 2025 Notification of acceptance: May 15, 2025 Withdrawal deadline: May 30, 2025 Camera-ready papers due: May 30, 2025 Tutorials: July 27, 2025 Conference: July 28 - 30, 2025 Workshops: July 31 - August 1, 2025 Note: All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”). Paper Submission Details Both long and short paper submissions should follow all of the ARR submission requirements at https://aclrollingreview.org/cfp, including: Long Papers (8 pages) and Short Papers (4 pages): Instructions for Two-Way Anonymized Review: Authorship Citation and Comparison Multiple Submission Policy, Resubmission Policy, and Withdrawal Policy Ethics Policy including the responsible NLP research checklist Limitations Paper Submission and Templates Optional Supplementary Materials Final versions of accepted papers will be given one additional page of content (up to 9 pages for long papers, up to 5 pages for short papers) to address reviewers’ comments. Following the ACL and ARR policies, there is no anonymity period requirement. At the time of submission to ARR, authors will be asked to select a preferred venue (e.g., ACL 2025). 
This is used only to calculate acceptance rates. Authors who selected ACL 2025 as a preferred venue when submitting to ARR may choose not to commit to ACL 2025 after receiving their reviews, and authors who selected a preferred venue other than ACL 2025 when submitting to ARR are still welcome to commit to ACL 2025. Presentation at the Conference All accepted papers must be presented at the conference to appear in the proceedings. The conference will include both in-person and virtual presentation options. Papers without at least one presenting author registered by the early registration deadline may be subject to desk rejection. Long and short papers will be presented orally or as posters as determined by the program committee. While short papers will be distinguished from long papers in the proceedings, there will be no distinction in the proceedings between papers presented orally and papers presented as posters.

Call for participation: MultiLexNorm 2: Multilingual Lexical Normalization
by Rob van der Goot 14 Jan '25

Dear all,

Today, the data freeze of the MultiLexNorm 2 shared task is in effect. As defined in the previous iteration of the task, lexical normalization is:

The task of transforming an utterance into its standard form, word by word, including both one-to-many (1-n) and many-to-one (n-1) replacements.

This time, the focus is on non-Indo-European languages. We have managed to obtain (new) datasets for Thai, Vietnamese, Indonesian, Japanese, and Korean.

More information can be found at: https://noisy-text.github.io/2025/multi-lexnorm.html#

Deadlines:
Data available: Nov 15, 2024
Data freeze: Jan 14, 2025
Test data: Jan 25, 2025
Final Evaluation: Feb 07, 2025
Paper deadline: Feb 25, 2025
Paper reviewed: Mar 01, 2025
Camera ready: Mar 10, 2025
Workshop: May 03, 2025 (TBD)

Best,
The organizers
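
For readers new to the task, the definition of lexical normalization quoted in the MultiLexNorm 2 call can be illustrated with a minimal sketch. It is not the organizers' baseline or data format: the toy lexicon, the `normalize` function, and the English example words are all hypothetical, chosen only to show what 1-1, 1-n, and n-1 replacements look like.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (an assumption, not shared-task data).
# Keys are tuples of raw tokens; values are lists of normalized tokens.
LEXICON: Dict[Tuple[str, ...], List[str]] = {
    ("u",): ["you"],               # 1-1 replacement
    ("gonna",): ["going", "to"],   # 1-n: one raw token becomes two
    ("some", "one"): ["someone"],  # n-1: two raw tokens become one
}
MAX_SPAN = max(len(key) for key in LEXICON)


def normalize(tokens: List[str]) -> List[str]:
    """Greedy longest-match lookup of raw token spans in the toy lexicon."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so n-1 merges win over 1-1.
        for span in range(min(MAX_SPAN, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + span])
            if key in LEXICON:
                out.extend(LEXICON[key])
                i += span
                break
        else:
            # No lexicon entry covers this token: keep it unchanged.
            out.append(tokens[i])
            i += 1
    return out


print(normalize("u gonna meet some one".split()))
```

A dictionary lookup like this ignores context entirely; it is meant only to make the replacement types concrete.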

Call for Participation: CHiPSAL-COLING-2025
by Kengatharaiyar Sarveswaran 13 Jan '25

*Call for Participation*

*First workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)*
*Co-located with the 31st International Conference on Computational Linguistics (COLING 2025)*
*Virtual*
*January 19, 2025 8.30 AM - 3.00 PM (GMT +4)*

Accepted papers - https://sites.google.com/view/chipsal/accepted-papers
Workshop program - https://sites.google.com/view/chipsal/workshop-program
Workshop Website - https://sites.google.com/view/chipsal/

Please join us! We are excited to engage with the research community in advancing NLP for South Asian languages and fostering meaningful collaborations.

*Why CHiPSAL?*
South Asia, with over 1.97 billion people, is one of the most linguistically diverse regions globally, home to 700+ languages and 25+ major scripts. This region is rich in cultural and linguistic heritage but faces significant challenges in natural language processing (NLP). These include encoding and orthographic issues, resource constraints, linguistic complexities, dialectal diversity, and more. CHiPSAL addresses these challenges and advances NLP research for South Asian languages while fostering collaborations across linguistic, technical, and cultural domains.

*Organizing Chairs:*
Kengatharaiyer Sarveswaran, University of Jaffna, Jaffna, Sri Lanka
Ashwini Vaidya, Indian Institute of Technology, Delhi, India
Bal Krishna Bal, Kathmandu University, Kathmandu, Nepal
Sana Shams, University of Engineering and Technology, Lahore, Pakistan
Surendrabikram Thapa, Virginia Tech, USA

*Program Committee Members (alphabetical order):*
A M Abirami, Thiagarajar College of Engineering, India. Abhai Pratap Singh, Amazon, USA. Akaash Vishal Hazarika, Splunk, USA. Aloka Fernando, University of Moratuwa, Sri Lanka. Aman Shakya, Institute of Engineering, Pulchowk, Tribhuvan University, Nepal. Anitha Dhakshina Moorthy, Thiagarajar College of Engineering, India.
Ann Sinthusha Anton Vijeevaraj, University of Vavuniya, Sri Lanka. Annette Hautli-Janisz, University of Passau, Germany. Ashwini Vaidya, IIT Delhi, India. Bal Krishna Bal, Kathmandu University, Nepal. Balaram Prasain, Tribhuvan University, Nepal. Bareera Sadia, Al-Khawarizmi Institute of Computer Science, UET, Lahore Pakistan. Brinda Gurusamy, Cisco, USA. Buddhika Karunarathne, University of Moratuwa, Sri Lanka. Eugene Y A Charles, University of Jaffna, Sri Lanka. Farah Adeeba, University of Engineering and Technology, KSK, Pakistan. Farhan Jafri, Jamia Millia Islamia, India. Gihan Dias, University of Moratuwa, Sri Lanka. H N D Thilini, University of Colombo School of Computing, Sri Lanka. Hariram Veeramani, UCLA, USA. Hassan Sajjad, Dalhousie University, Canada. Jayeeta Putatunda, Fitch Ratings, USA. Kengatharaiyer Sarveswaran, University of Jaffna, Sri Lanka. Krishna Chalise, Tribhuvan University, Nepal. Kritesh Rauniyar, IIMS College, Nepal. Lekhnath Pathak, Tribhuvan University, Nepal. Lynnette Hui Xian Ng, CMU, USA. Mahak Shah, Columbia University, USA. Manjunath Chandrashekaraiah, Astera Labs, USA. Menan Velayuthan, University of Moratuwa, Sri Lanka. Munief Tahir, Al-Khawarizmi Institute of Computer Science, UET, Lahore Pakistan. Parameswari Krishnamurthy,IIIT Hyderabad, India. Paritosh Katre, PayPal, USA. Prakash Poudyal, Kathmandu University, Nepal. Preetish Kakkar, Adobe, USA. Qurat-ul-Ain Akram, University of Engineering and Technology, KSK, Pakistan. Randil Pushpananda, University of Colombo School of Computing, Sri Lanka. Sahar Rauf, Al-Khawarizmi Institute of Computer Science, UET, Lahore Pakistan. Sana Shams, Al-Khawarizmi Institute of Computer Science, UET, Lahore Pakistan. Shuvam Shiwakoti, Virginia Tech, USA. Siddhant Bikram Shah, Northeastern University, USA. Sinnathamby Mahesan, University of Jaffna, Sri Lanka. Suganya Ramamoorthy, Vellore Institute of Technology University, India. Surabhi Adhikari, Columbia University, USA. 
Surangika Ranathunga, Massey University, New Zealand. Surendrabikram Thapa, Virginia Tech, USA. Tafseer Ahmed, Alexa Translations, Canada. Toqeer Ehsan, Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates. Usman Naseem, Macquarie University, Australia. Uthayasanker Thayasivam, University of Moratuwa, Sri Lanka. Vijayrajsinh Gohil, New York University, USA. *Volunteers (alphabetical order):* Ahrane Mahaganapathy, University of Jaffna, Sri Lanka. Menan Velayuthan, University of Moratuwa, Sri Lanka. Suthakar Sivashanth, University of Jaffna, Sri Lanka. Thank you -- *Dr Kengatharaiyer Sarveswaran (Sarves)* Senior Lecturer (Grade-I) in Computer Science Department of Computer Science Faculty of Science University of Jaffna Sri Lanka sarves.github.io

Call for participation: WACL4 at COLING’2025 - Registration and Programme
by Amal Haddad 13 Jan '25

The 4th Workshop on Arabic Corpus Linguistics (WACL-4) [1] WACL4 AT COLING’2025 WITH FOCUS ON ARABIC DIALECTS The field of Arabic language research using corpora and corpus methods has experienced significant growth and development in recent years. What once were isolated efforts have now transformed into a vibrant and expansive area of study, advancing rapidly across multiple dimensions in both corpus and computational linguistics. Building upon the success of previous editions--WACL-1 in 2011, WACL-2 in 2013 in conjunction with the Corpus Linguistics Conference at Lancaster University, and WACL-3 in 2019 at the Corpus Linguistics 2019 conference at Cardiff University--we are excited to announce the fourth edition of the Workshop on Arabic Corpus Linguistics (WACL-4). The primary objectives of WACL-4 are to highlight the latest developments in the creation, annotation, and application of Arabic corpora, including the introduction of new corpora and advancements in annotation techniques, while fostering collaboration among researchers from diverse institutions and regions to stimulate joint research projects and interdisciplinary initiatives. This edition will place a special emphasis on the study of Arabic dialects, including non-standard and regional varieties, to broaden the understanding of Arabic in its various manifestations and support research on under-resourced linguistic varieties. Additionally, WACL-4 aims to encourage the development and refinement of Natural Language Processing (NLP) systems and tools tailored for Arabic, integrating corpora into NLP workflows, creating new computational tools, and evaluating existing systems to improve their efficacy in processing Arabic text. The workshop will be held online on January 20th, 2025 in conjunction with the 31st edition of COLING in 2025 in Abu Dhabi (UAE). We are pleased to share the programme of WACL4 2025 with you. 
Please visit: https://drive.google.com/file/d/1SSNC1r4dx023cb_FuQWhWa8d3Si4fvvp/view?usp=… To register for the workshop, please visit https://coling2025.org/registration/ We are looking forward to welcoming you at WACL4 at COLING'2025 Kind regards, WACL4 Organising Committee -- Amal Haddad Haddad (She/her) Facultad de Traducción e Interpretación Universidad de Granada |https://www.ugr.es/personal/amal-haddad-haddad Lexicon Research Group |http://lexicon.ugr.es/haddad Co-Convenor, BAAL SIG 'Humans, Machines, Language'|https://r.jyu.fi/humala Event Coordinator, BAAL SIG 'Language, Learning and Teaching' Links: ------ [1] https://wp.lancs.ac.uk/wacl4

Reminder: Call for participation in Web Survey on Data Bottlenecks in Supervised NLP
by Romberg, Julia 13 Jan '25

++ 1st reminder to participate in our web survey on data annotation bottlenecks and active learning; apologies for cross-posting ++

Dear list members,

We invite you to participate in our web survey exploring how recent advancements in NLP, such as LLMs, have changed the need for labeled data in Supervised Machine Learning.

Survey details:
* Topic: Web survey on Data Annotation and Active Learning
* Target group: Researchers and practitioners alike in the fields of NLP, Supervised Machine Learning, and Active Learning in particular (knowledge of Active Learning is not required)
* Duration: 5-15 minutes
* Deadline for participation: January 12, 2025
* Survey link: https://bildungsportal.sachsen.de/umfragen/limesurvey/index.php/538271

Why should I invest my time in this survey?
* Make an impact: Participate in a community effort and help build a better understanding of the current state of, and open issues with, the methods used to overcome a lack of labeled data.
* Gain insights: Receive a report with key findings that you can incorporate into research and development of new methods and technologies.

Thank you for considering participating in our survey! If you have any questions or require additional information, please don't hesitate to contact us directly at activelearningsurvey2024(a)gmail.com. If you know colleagues or peers who might be interested, we'd be grateful if you could forward this survey to them as well.

Best regards,
Julia Romberg (GESIS - Leibniz Institute for the Social Sciences, Germany)
Christopher Schröder (Institut für Angewandte Informatik e. V., Germany)
Julius Gonsior (TUD Dresden University of Technology)

------------------------------------------------------------------------
Leibniz Institute for the Social Sciences Julia Romberg Computational Social Science, Team Data Science Methods +49(221)47694-742

PhD studentship on analysis of multimodal videos/transcripts of children's interactions
by Colin Bannard 13 Jan '25

Dear colleagues Please forward this email to anyone you think might be interested in a PhD studentship focused on discovering drivers of child language development from videos/multimodal transcripts of early child-parent/educator interactions: https://www.findaphd.com/phds/project/identifying-drivers-of-language-devel… The student will be based at the University of Manchester in the UK, but will spend at least 12 months at the University of Melbourne, Australia. Best, Colin

Call for Participation: The First Workshop on Language Models for Low-Resource Languages (LoResLM 2025@COLING)
by Ranasinghe, Tharindu 13 Jan '25

Neural language models have revolutionised natural language processing (NLP) and have provided state-of-the-art results for many tasks. However, their effectiveness is largely dependent on the pre-training resources. Therefore, language models (LMs) often struggle with low-resource languages in both training and evaluation. Recently, there has been a growing trend in developing and adopting LMs for low-resource languages. LoResLM aims to provide a forum for researchers to share and discuss their ongoing work on LMs for low-resource languages. LoResLM 2025 will be a physical workshop co-located with COLING 2025, Abu Dhabi on 20th January 2025. We are pleased to share the programme of LoResLM 2025 with you. Please visit https://loreslm.github.io/program for the full programme. To register for the workshop, please visit https://coling2025.org/registration/ We are looking forward to welcoming you at LoResLM 2025 in Abu Dhabi. The workshop is supported in part by CLARIN-UK, funded by the Arts and Humanities Research Council as part of the Infrastructure for Digital Arts and Humanities programme. >> Keynote Speaker Jose Camacho-Collados, Cardiff University. 
Title - "Multilinguality and Cultural Awareness in Language Models" >> Organising Committee Hansi Hettiarachchi, Lancaster University, UK Tharindu Ranasinghe, Lancaster University, UK Paul Rayson, Lancaster University, UK Ruslan Mitkov, Lancaster University, UK Mohamed Gaber, Birmingham City University, UK Damith Premasiri, Lancaster University, UK Fiona Anting Tan, National University of Singapore, Singapore Lasitha Uyangodage, University of Münster, Germany >> Programme Committee Gábor Bella - IMT Atlantique, France Samuel Cahyawijaya - The Hong Kong University of Science and Technology, Hong Kong Burcu Can - University of Stirling, UK Çağrı Çöltekin - University of Tübingen, Germany Raj Dabre - National Institute of Information and Communications Technology, Japan Vera Danilova - Uppsala University, Sweden Debashish Das - Birmingham City University, UK Ona de Gibert - University of Helsinki, Finland Alphaeus Dmonte - George Mason University, USA Bonaventure F. P. Dossou - McGill University, Canada Daan van Esch - Google Ignatius Ezeani - Lancaster University, UK Anna Furtado - University of Galway, Ireland Amal Htait - Aston University, UK Ali Hürriyetoğlu - Wageningen University & Research, Netherlands Danka Jokic - University of Belgrade, Serbia Diptesh Kanojia - University of Surrey, UK Daisy Lal - Lancaster University, UK Colin Leong - University of Dayton, USA Veronika Lipp - Hungarian Research Centre for Linguistics, Hungary Muhidin Mohamed - Aston University, UK Farhad Nooralahzadeh - University of Zurich, Switzerland Rrubaa Panchendrarajan - Queen Mary University of London, UK Nadeesha Pathirana - Aston University, UK Alistair Plum - University of Luxembourg, Luxembourg Nishat Raihan - George Mason University, USA Omid Rohanian - University of Oxford, UK Sandaru Seneviratne - Australian National University, Australia Ravi Shekhar - University of Essex, UK Archchana Sindhujan - University of Surrey, UK Claytone Sikasote - University of Cape Town, South 
Africa Marjana Prifti Skenduli - University of New York Tirana, Albania Uthayasanker Thayasivam - University of Moratuwa, Sri Lanka Taro Watanabe - Nara Institute of Science and Technology, Japan Edlira Vakaj - Birmingham City University, UK John Vidler - Lancaster University, UK Phil Weber - Aston University, UK Bryan Wilie - Hong Kong University of Science & Technology, Hong Kong Artūrs Znotiņš - University of Latvia, Latvia URL - https://loreslm.github.io/ Twitter - https://x.com/LoResLM2025 LinkedIn - https://www.linkedin.com/company/loreslm/

PhD offers at Telecom Paris, Institut Polytechnique de Paris
by Nils Holzenberger 13 Jan '25

Hello,

We are hiring 2 PhD students to work on combining language models with structured data, starting from September 2025, at Telecom Paris, Institut Polytechnique de Paris.

Large Language Models are amazing, and with our research project, we aim to make them even more amazing! Our project will connect large language models to structured knowledge such as knowledge bases or databases. With this,
1. language models will stop hallucinating
2. language models can be audited and updated reliably
3. language models will become smaller and thus more eco-friendly and deployable

We work in the DIG team at Telecom Paris, one of the finest engineering schools in France, and part of Institut Polytechnique de Paris, ranked 38th in the world by the QS ranking. The institute is 45 min away from Paris by public transport, and located in the green of the Plateau de Saclay.

Excited about joining us? Tick these boxes:
1. Have a good background in natural language processing, machine learning, and knowledge representation
2. Have a master's degree (or equivalent)
3. Be of European nationality (imposed by our sponsor, the French Ministry of Armed Forces)

Check out our Web site to apply: https://suchanek.name/work/research/kb-lm/index.html

Fabian Suchanek & Nils Holzenberger

EURALEX Talks
by Iztok Kosem 13 Jan '25

Dear all,

The end of 2024 has been very active at EURALEX. For example, you can now find the EURALEX 2024 proceedings on the website (they have already been indexed by SCOPUS), and the video recordings from the 2024 Congress (presentations and pre-conference workshops) have been made available at the Videolectures website (https://videolectures.net/events/euralex2024_cavtat).

We are now pleased to announce the launch of our new webinar series. EURALEX Talks is a series of online webinars featuring invited experts in the field of lexicography. These sessions are free and open to everyone. They explore a wide variety of topics related to language and lexicography. Each talk lasts approximately 40 minutes, followed by questions and discussion.

Join us on Tuesday 28 January 2025 at 16.00 (CET) for our first talk, which will be given by Pamela Faber. Zoom link: https://uni-lj-si.zoom.us/j/8569694820.

The Language of Love Fraud: Frames of Deception

The language of love fraud is a unique example of online linguistic deception. Using a fabricated identity, the fraudster creates the illusion of a romantic relationship between himself and the victim, solely through language. This deception is often successful because of the fraudster’s lexical choices (soulmate, cherish, adore, sacred vow, etc.), which override his flawed syntax and activate a frame of romantic love in the victim's mind.

Biodata

Pamela Faber is Professor Emeritus in Translation and Interpreting at the University of Granada (Spain). She is the founder of the LexiCon research group, with whom she has carried out various nationally-funded research projects on Frame-Based Terminology, the approach to terminology that she created and developed. One of the results of these projects is EcoLexicon (ecolexicon.ugr.es), a terminological knowledge base on environmental science. She has more than 150 articles, book chapters, and books, which have inspired researchers throughout the world to explore specialized knowledge from a frame-based perspective.

Looking forward to seeing you online. Please forward the announcement to other mailing lists and colleagues who might be interested.

Best wishes
Iztok Kosem
EURALEX President

Second Call for Participation- IWSLT 2025
by Atul K. Ojha 13 Jan '25

Apologies for cross-posting.
----------------------------------------
*The International Conference on Spoken Language Translation*
*ACL – 22nd IWSLT 2025 – Second Call for Participation*
*31 July-1 August 2025 - Vienna, Austria*
http://iwslt.org

The International Conference on Spoken Language Translation (IWSLT) <https://iwslt.org/> is the premier annual conference for all aspects of Spoken Language Translation. Every year, the conference organises and sponsors open evaluation campaigns around key challenges in simultaneous and consecutive translation, under real-time/low latency or offline conditions and under low-resource or multilingual constraints. System descriptions and results from participants’ systems and scientific papers related to key algorithmic advances and best practices are presented. IWSLT is the venue of SIGSLT <https://iwslt.org/sigslt/>, the Special Interest Group on Spoken Language Translation of ACL <https://www.aclweb.org/portal/>, ISCA <https://www.isca-speech.org/> and ELRA <https://www.elra.info/>. With a track record of 21 years, IWSLT benchmarks and proceedings serve as a reference for all researchers and practitioners working on speech translation and related fields.

The 22nd edition of IWSLT will be run as a hybrid ELRA <https://www.elra.info/>/ACL <https://www.aclweb.org/portal/> event, co-located with ACL 2025 <https://2025.aclweb.org/> from 31 July to 1 August 2025.
*Important Dates*
*January 1, 2025*: Release of shared task training and dev data
*March 15, 2025*: Scientific paper submission deadline
*April 1-15, 2025*: Evaluation period
*April 21, 2025*: System description paper submission deadline
*May 15, 2025*: Notification of acceptance
*June 1, 2025*: Camera-ready deadline (all papers)
*July 31-August 1, 2025*: IWSLT conference

Evaluation

IWSLT 2025 features shared tasks <https://iwslt.org/2025/#shared-tasks> that address the following focus areas:
- High-resource ST: Offline track, Simultaneous track, Subtitling track
- Low-resource ST: Low-resource and Indic (multilingual) tracks
- Instruction-following Speech Processing track: Technical domain ST, ASR, Summarization, and QA

Training and development data for each shared task will be prepared and released by the respective organisers (for further information on this initiative, please refer to the IWSLT website <https://iwslt.org/2025/>). Participants will receive instructions about how to submit their runs. In addition, participants have the opportunity to present their work through a system paper that will be published in the ACL Proceedings.

Conference

IWSLT also invites submissions of scientific papers to be published in the ACL Proceedings and presented either in oral or poster format. The conference selects high-quality, original contributions on theoretical and practical issues of spoken language translation research, technologies and applications. Submissions will be accepted directly through the IWSLT submission site (to be announced on the website <https://iwslt.org/2025/>). We will also accept commitments of submissions with reviews from the ACL Rolling Review. Additionally, to foster cross-pollination of ideas, the conference also invites the presentation of papers on speech translation recently published elsewhere.
Please note that this is for non-archival presentation of papers relevant to speech translation already published in other venues (e.g., Findings for the *ACL, speech, NLP or MT conferences). Submissions for this category will be accepted through a dedicated form (to be announced on the website <https://iwslt.org/2025/>). Papers will be checked for relevance to IWSLT, and assigned either oral or poster presentation slots if selected. Contact Please email iwslt-evaluation-campaign(a)googlegroups.com if you have any questions related to the shared tasks. Thanks, Marine, Marcello, Alex, Jan, Sebastian, Elizabeth, Atul (IWSLT organisers)

First CFP: LoResMT 2025 at NAACL 2025
by Atul K. Ojha 13 Jan '25

Apologies for cross-posting.
---------------------------------------------------------------------------
*The Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025)*
https://www.loresmt.org/
*@ NAACL 2025 (May 3–4, 2025)*
*Albuquerque, New Mexico, U.S.A.*

*SUBMISSION*
https://openreview.net/group?id=aclweb.org/NAACL/2025/Workshop/LoResMT

*TIMELINE*
*Paper submission due:* January 30, 2025 (Anywhere on Earth)
*Pre-reviewed (ARR) submission deadline:* February 20, 2025
*Notification of acceptance:* March 1, 2025
*Camera-ready papers due:* March 10, 2025 (Anywhere on Earth)
*Pre-recorded video due (hard deadline):* April 8, 2025
*Workshop dates at NAACL 2025:* May 3–4, 2025

*SCOPE*
Based on the success of past low-resource machine translation (MT) workshops at AMTA 2018, MT Summit 2019, AACL-IJCNLP 2020, AMTA 2021, COLING 2022, EACL 2023, ACL 2024, we introduce the LoResMT 2025 workshop at NAACL 2025. The workshop provides a discussion panel for researchers working on MT systems/methods for low-resource and under-represented languages in general. We would like to help review/overview the state of MT for low-resource languages and define the most important directions. We also solicit papers dedicated to supplementary NLP tools that are used in any language and especially in low-resource languages. Overview papers of these NLP tools are very welcome. It will be beneficial if the evaluations of these tools in research papers include their impact on the quality of MT output.

*TOPICS*
We are highly interested in (1) original research papers, (2) review/opinion papers, and (3) online systems on the topics below; however, we welcome all novel ideas that cover research on low-resource languages.
- Neural machine translation (NMT) for low-resource languages - Use of LLMs (large language models) for low-resource MT systems - COVID-related corpora, their translations and corresponding NLP/MT systems - Work that presents online systems for practical use by native speakers - Word tokenizers/de-tokenizers for specific languages - Word/morpheme segmenters for specific languages - Alignment/Re-ordering tools for specific language pairs - Use of morphology analyzers and/or morpheme segmenters in MT - Multilingual/cross-lingual NLP tools for MT - Corpora creation and curation technologies for low-resource languages - Review of available parallel corpora for low-resource languages - Research and review papers on MT methods for low-resource languages - MT systems/methods (e.g. rule-based, SMT, NMT) for low-resource languages - Pivot MT for low-resource languages - Zero-shot MT for low-resource languages - Fast building of MT systems for low-resource languages - Re-usability of existing MT systems for low-resource languages - Machine translation for language preservation *SUBMISSION INFORMATION* We are soliciting two types of submissions: (1) research, review, and position papers and (2) system demonstration papers. For research, review and position papers, the length of each paper should be at least four (4) and not exceed eight (8) pages, plus unlimited pages for references. For system demonstration papers, the limit is four (4) pages. Submissions should be formatted according to the official ACL style templates (Overleaf). Please refer to the NAACL submission guideline for further information <https://2025.naacl.org/calls/papers/#paper-submission-details>. Accepted papers will be published at ACL Anthology in the NAACL 2025 and will be presented at the conference. Submissions must be anonymized and should be done using the provided submission system. 
Scientific papers that have been or will be submitted to other venues must be declared as such and must be withdrawn from the other venues if accepted and published at LoResMT. The review will be double-blind. Authors of an accepted paper should present their paper in person at NAACL 2025. Papers should be submitted in PDF to the LoResMT Open Review <https://openreview.net/group?id=aclweb.org/NAACL/2025/Workshop/LoResMT>. We would like to encourage authors to cite papers written in ANY language that are related to the topics, as long as both original bibliographic items and their corresponding English translations are provided. Registration is handled by the main conference (https://2025.naacl.org/). *ORGANIZING COMMITTEE (LISTED ALPHABETICALLY)* Atul Kr. Ojha, University of Galway Chao-Hong Liu, Potamu Research Ltd Ekaterina Vylomova, University of Melbourne, Australia Jade Abbott, Retro Rabbit Jonathan Washington, Swarthmore College Nathaniel Oco, National University (Philippines) Tommi A Pirinen, UiT The Arctic University of Norway, Tromsø Valentin Malykh, Huawei Noah’s Ark lab and Kazan Federal University Varvara Logacheva, Skolkovo Institute of Science and Technology Xiaobing Zhao, Minzu University of China *PROGRAM COMMITTEE (LISTED ALPHABETICALLY)* Abigail Walsh, ADAPT Centre, Dublin City University, Ireland Alberto Poncelas, Rakuten, Singapore Ali Hatami, University of Galway Alina Karakanta, Fondazione Bruno Kessler (FBK), University of Trento Anna Currey, AWS AI Labs Aswarth Abhilash Dara, Walmart Global Technology Atul Kr. 
Ojha, University of Galway & Panlingua Language Processing LLP Bogdan Babych, Heidelberg University Chao-hong Liu, Potamu Research Ltd Constantine Lignos, Brandeis University, USA Daan van Esch, Google Dana Moukheiber, Massachusetts Institute of Technology Ekaterina Vylomova, University of Melbourne, Australia Eleni Metheniti, CLLE-CNRS and IRIT-CNRS Flammie Pirinen, UiT Norgga árktalaš universitehta Gaurav Negi, University of Galway Jinliang Lu, Institute of Automation, Chinese Academy of Sciences John Philip McCrae, University of Galway Jonathan Washington, Swarthmore College Koel Dutta Chowdhury, Saarland University Majid Latifi, UPC University Maria Art Antonette Clariño, University of the Philippines Los Baños Milind Agarwal, George Mason University Mathias Müller, University of Zurich Nathaniel Oco, De La Salle University Pavel Rychlý, Masaryk University and Lexical Computing Pengwei Li, Meta Rashid Ahmad, International Institute of Information Technology, Hyderabad Rico Sennrich, University of Zurich Santanu Pal, Wipro Sangjee Dondrub, Qinghai Normal University Sardana Ivanova, University of Helsinki Sourabrata Mukherjee, Charles University Thepchai Supnithi, National Electronics and Computer Technology Center Timothee Mickus, University of Helsinki Valentin Malykh, Huawei Noah’s Ark lab and Kazan Federal University Wen Lai, LMU Munich Xuebo Liu, Harbin Institute of Technology, Shenzhen Yalemisew Abgaz, Dublin City University Yasmin Moslem, Bering Lab Zhanibek Kozhirbayev, National Laboratory Astana, Nazarbayev University *CONTACT* Please email loresmt(a)googlegroups.com if you have any questions/comments/suggestions.

IWCS 2025: Second Call for Workshop Proposals
by Kilian Evang 10 Jan '25

Second Call for Workshop Proposals Deadline: Jan 31 16th International Conference on Computational Semantics (IWCS) Heinrich Heine University Düsseldorf, Germany 22-24 September 2025 https://iwcs2025.github.io/ IWCS is a biennial conference on computational semantics. This year's edition is organized by Heinrich Heine University Düsseldorf. The conference is endorsed by SIGSEM, the ACL Special Interest Group on Computational Semantics. The aim of IWCS is to bring together researchers interested in any aspects of the computation, annotation, extraction, representation, and learning of meaning in natural language, whether this is from a lexical or structural semantic perspective. IWCS embraces both symbolic and machine learning approaches to computational semantics, and everything in between. The conference and workshops will take place 22-24 September 2025. === WORKSHOP PROPOSALS === We invite proposals for workshops to be held in conjunction with IWCS 2025. Accepted workshops will have the option to publish their proceedings in the ACL Anthology. We solicit proposals in all areas of computational semantics, in other words all computational aspects of meaning of natural language within written, spoken, signed, or multi-modal communication. 
Workshops are invited on these closely related areas, including the following: * design of meaning representations * syntax-semantics interface * representing and resolving semantic ambiguity * shallow and deep semantic processing and reasoning * hybrid symbolic and statistical approaches to semantics * distributional semantics * alternative approaches to compositional semantics * inference methods for computational semantics * recognizing textual entailment * learning by reading * methodologies and practices for semantic annotation * machine learning of semantic structures * probabilistic computational semantics * neural semantic parsing * computing meaning with large language models * computational aspects of lexical semantics * semantics and ontologies * semantic web and natural language processing * semantic aspects of language generation * generating from meaning representations * semantic relations in discourse and dialogue * semantics and pragmatics of dialogue acts * multimodal and grounded approaches to computing meaning * semantics-pragmatics interface * applications of computational semantics === FINANCES === Workshops must cover their own costs for invited speakers as well as organizers' traveling costs. === SUBMISSION INFORMATION === Proposals for workshops should contain: * A title and brief (max two pages) description of the workshop topic and content; * The names, affiliation and email addresses of the organisers; * An estimate of the expected audience size; * If the workshop has been held before, a note specifying where previous workshops were held, how many submissions the workshop received, how many papers were accepted and how many attendees the workshop attracted; * Whether you plan a half-day or full-day workshop; * Whether or not the workshop proceedings should be published in the ACL Anthology. 
Proposals should be submitted on OpenReview: https://openreview.net/group?id=aclweb.org/SIGSEM/IWCS/2025/Workshop_Propos… The person submitting the proposal will need an OpenReview account. Please note OpenReview's moderation policy, where newly created accounts with an institutional email address are approved automatically, but other email addresses can take up to two weeks to approve. === IMPORTANT DATES === 31 January 2025 Workshop proposal submissions due 07 February 2025 Workshop proposal notification of acceptance 24 September 2025 Workshop date === CONTACT === For questions, contact: iwcs2025-program-chairs(a)uni-duesseldorf.de Kilian Evang, Laura Kallmeyer, Sylvain Pogodalla (the IWCS 2025 program chairs) -- Dr. Kilian Evang · Institut für Linguistik · Heinrich-Heine-Universität Düsseldorf Universitätsstr. 1 · 40225 Düsseldorf, Germany · https://kilian.evang.name

CFP: 'Humans, Machines, Language', University of Granada, Spain, 24-25 June 2025
by Amal Haddad 09 Jan '25

Humans, Machines, Language Annual conference University of Granada, Spain 24-25 June 2025 https://sites.google.com/view/humans-machines-language/events/2025-conferen… We welcome everyone interested in the impact of new and emerging language technologies that integrate with human senses. Whether you are a tech developer who wants to learn more about linguistics, or a linguist who wants to know more about tech, we want to hear from you! HuMaLa leads on from the COST Action 'Language In The Human-Machine Era' (https://lithme.eu [1]); you can find out more about our core themes of interest from the LITHME forecast report (https://doi.org/10.17011/jyx/reports/20210518/1 [2]) and animations (https://lithme.eu/animations [3]). HuMaLa's inaugural conference will be held at the University of Granada, Spain, on 24-25 June 2025. The conference theme is: 'Humanistic insights for human-machine language technologies: privacy, security, and wellbeing' This echoes the priorities of the EU's recently introduced AI Act: "human oversight, safety, privacy, transparency, non-discrimination and social and environmental wellbeing" (https://www.europarl.europa.eu/news/en/press-room/20230609IPR96212/ [4]). We hope to explore these timely topics from a range of humanistic perspectives, with a focus on human-machine language technologies. We welcome researchers and developers from computer science, linguistics, sociology, education, and more. To understand the more general scope of the conference, again, see the LITHME forecast report [2] and animations [3]. In addition to technical work (e.g., model description or dataset), we also welcome theoretical and empirical studies on the ethical, legal, cultural and social implications of language technology adoption across these domains. Presentation format: Talk (20 mins) or Poster, non-archival Presentations can address any of the topics that fall within the interests of HuMaLa. 
Selection for places will be made by the conference scientific committee. We encourage early career applicants to read a guide on abstract writing, for example: https://info.lse.ac.uk/current-students/student-futures/how-to-write-an-abs… [5]. Senior colleagues are used to all this, and are therefore at a somewhat unfair advantage. We hope the above guide (and others like it) will help early career applicants to craft their abstract more precisely. Abstract submission deadline: Friday 31 January 2025, 12:00 (noon) GMT Website: https://sites.google.com/view/humans-machines-language/events/2025-conferen… We are looking forward to seeing you in Granada. Links: ------ [1] https://lithme.eu/ [2] https://doi.org/10.17011/jyx/reports/20210518/1 [3] https://lithme.eu/animations [4] https://www.europarl.europa.eu/news/en/press-room/20230609IPR96212/ [5] https://info.lse.ac.uk/current-students/student-futures/how-to-write-an-abs…

GermEval 2025 Shared Task on Candy Speech Detection: Call for Participation
by michael.wiegand＠univie.ac.at 09 Jan '25

Apologies for cross-posting.

We are pleased to announce the *GermEval Shared Task on Candy Speech Detection („Flausch-Erkennung“)*

This is the first call to participate in the shared task on candy speech detection („Flausch-Erkennung“). We invite everyone from academia and industry to participate in the shared task. The workshop discussing the results of this shared task is planned to be held in conjunction with the Conference on Natural Language Processing (KONVENS) in September 2025.

*Introduction*
Numerous methods have been developed for detecting and censoring negative speech (e.g., hate speech or offensive or harmful language) on social media platforms. However, there is much less focus on identifying and promoting positive supportive discourse in online communities. Our shared task aims to address this gap and encourage researchers to focus on such positive expressions.

The task is to identify expressions of candy speech (Flausch) in online posts (YouTube comments). We define candy speech as an expression of positive attitudes in social media toward individuals or their output (videos, comments, etc.). The purpose of candy speech is to encourage, cheer up, support and empower others. It can be viewed as the counterpart to hate speech, as it also aims to influence the self-image of the target person or group, but in a positive way.

*Data*
We will provide the participants with annotated training (and development) and unlabeled test datasets containing complete written, German-language comment threads under YouTube videos posted by different content creators. The content creators and communities vary in topic, style, age group, etc. The test data and training data do not overlap with respect to the original content creator of the video; the communities commenting on the videos can therefore be expected to differ.

*Task Details*
Candy speech detection is the task of identifying the presence of candy speech (at the span level) in a given YouTube comment thread and classifying each expression into one of the predefined categories. This shared task focuses on German-speaking YouTube communities. Participants will be provided with a dataset of YouTube comments manually annotated for different types of candy speech expressions.

We offer the following two subtasks. Participants in this year's shared task may choose to participate in either subtask:

Subtask 1: Coarse-Grained Classification
The goal of this subtask is to identify whether the given comment contains candy speech ("Flausch") or not. The dataset is manually annotated for the presence of candy speech.

Subtask 2: Fine-Grained Classification
The goal of this subtask is to identify the span of each candy speech expression in a given text and classify it into one of the predefined categories. The dataset is manually annotated for 10 different types of candy speech expressions, such as “positive feedback”, “compliment”, “group membership” etc.

More details on the subtasks (including examples) can be found at the website of the shared task (see link below).

*Important dates*
Trial data available: February 15, 2025
Training data available: March 3, 2025
Test data available: May 17, 2025
Evaluation start: June 16, 2025
Evaluation end: June 27, 2025
Paper submission due: July 11, 2025
Camera ready due: August 15, 2025
GermEval workshop: September 8 or 12, 2025 (co-located with KONVENS)

*Website*
https://yuliacl.github.io/GermEval2025-Flausch-Erkennung/

*GermEval*
GermEval is a series of shared task evaluation campaigns that focus on Natural Language Processing for the German language. GermEval has been conducted regularly since 2014 in co-location with KONVENS/GSCL conferences: https://germeval.github.io/tasks/

*Contact email*
Please send any enquiry to the following email address: germeval-2025-candy-speech(a)ruhr-uni-bochum.de

Best regards,
Yulia Clausen, Ruhr-Universität Bochum, Germany
Tatjana Scheffler, Ruhr-Universität Bochum, Germany
Michael Wiegand, Universität Wien, Austria

3rd CFP: Second Workshop on Patient-Oriented Language Processing (CL4Health) @ NAACL 2025
by Paul Thompson 09 Jan '25

Second Workshop on Patient-Oriented Language Processing (CL4Health) @ NAACL 2025
https://bionlp.nlm.nih.gov/cl4health2025/
Albuquerque, New Mexico, USA

SCOPE

CL4Health fills the gap among the different biomedical language processing workshops by providing a general venue for a broad spectrum of patient-oriented language processing research. The second workshop on patient-oriented language processing follows the successful inaugural CL4Health workshop (co-located with LREC-COLING 2024), which clearly demonstrated the need for a computational linguistics venue that focuses on language related to health of the public.

CL4Health is concerned with the resources, computational approaches, and behavioral and socio-economic aspects of the public interactions with digital resources in search of health-related information that satisfies their information needs and guides their actions. The workshop invites papers concerning all areas of language processing focused on patients' health and health-related issues concerning the public. The issues include, but are not limited to, accessibility and trustworthiness of health information provided to the public; explainable and evidence-supported answers to consumer-health questions; accurate summarization of patients' health records at their health-literacy level; understanding patients' non-informational needs through their language; and accurate and accessible interpretations of biomedical research.

The topics of interest for the workshop include but are not limited to the following:
* Health-related information needs and online behaviors of the public;
* Quality assurance and ethics considerations in language technologies and approaches applied to text and other modalities for public consumption;
* Summarization of data from electronic health records for patients;
* Detection of misinformation in consumer health-related resources and mitigation of potential harms;
* Consumer health question answering (Community Question Answering, CQA);
* Biomedical text simplification/adaptation;
* Dialogue systems to support patients' interactions with clinicians, healthcare systems, and online resources;
* Linguistic resources, data and tools for language technologies focusing on consumer health;
* Infrastructures and pre-trained language models for consumer health

SHARED TASK

Perspective-aware Healthcare Answer Summarization (PerAnsSumm) will be co-located with the workshop.

In community / consumer health question answering, several aspects, such as question understanding and answer generation, have been studied for over a decade. A new and important question posed by this task is the different perspectives provided in the answers to questions posted to online forums. The responses to the questions offer different answer perspectives, e.g., personal experiences, factual information, and suggestions. Traditionally, the CQA answer summarization task has focused on a single best-voted answer as a reference summary. A single answer does not capture all the perspectives. Moreover, a structured presentation of the information in the form of perspective-specific summaries may be more useful for the end-users. To address these gaps, this challenge introduces a novel perspective-specific answer summarization task within a CQA setup.

The task will use the Perspective-aware healthcare Answer SuMmarizAtion (PUMA) dataset, a corpus of medical question-answer pairs created by the task organizers. The PUMA dataset consists of 3,167 CQA threads with approximately 10K answers filtered from the Yahoo! L6 corpus. Each answer in PUMA is annotated with five perspective spans: ‘cause’, ‘suggestion’, ‘experience’, ‘question’, and ‘information’.

Further details about the shared task are available at: https://peranssumm.github.io/

IMPORTANT DATES (Tentative)

January 30, 2025 - Workshop paper due date
March 1, 2025 - Notification of acceptance
March 10, 2025 - Camera-ready papers due
April 8, 2025 - Pre-recorded video due (hard deadline)
May 3 OR 4, 2025 - Workshop

SUBMISSIONS

Two types of submissions are invited:
- Full papers: should not exceed eight (8) pages of text, plus unlimited references. These are intended to be reports of original research.
- Short papers: may consist of up to four (4) pages of content, plus unlimited references. Appropriate short paper topics include preliminary results, application notes, descriptions of work in progress, etc.

Electronic Submission: Submissions must be electronic and in PDF format, using the Softconf START conference management system. Submissions need to be anonymous.

Submission site: https://softconf.com/naacl2025/cl4health2025

Dual submission policy: papers may NOT be submitted to the workshop if they are or will be concurrently submitted to another meeting or publication.

MEETING

The workshop will be hybrid. Virtual attendees must be registered for the workshop to access the online environment. Accepted papers will be presented as posters or oral presentations based on the reviewers’ recommendations.

ORGANIZERS

- Dina Demner-Fushman, US National Library of Medicine
- Sophia Ananiadou, National Centre for Text Mining and University of Manchester, UK
- Paul Thompson, National Centre for Text Mining and University of Manchester, UK
- Deepak Gupta, US National Library of Medicine

--
Paul Thompson
Research Fellow
Department of Computer Science
National Centre for Text Mining
Manchester Institute of Biotechnology
University of Manchester
131 Princess Street
Manchester M1 7DN
UK
http://personalpages.manchester.ac.uk/staff/Paul.Thompson/

PhD Position in NLP at the University of Marburg
by daniel.braun＠uni-marburg.de 08 Jan '25

Dear all,

The newly established research group on Natural Language Processing at the University of Marburg is seeking applications for a position as Doctoral Researcher in one of the research areas of the group, which include: Methods and Applications of Natural Language Processing, Perspectivism and Disagreement in NLP, AI for Social Good, Legal Tech and NLP Evaluation.

The position is offered for a period of 3 years. The starting date is as soon as possible. The position is full-time, with salary and benefits commensurate with a public service position in the state of Hesse, Germany (TV-H E 13). The application deadline is the 19th of January.

For more information and to apply, please visit:
https://stellenangebote.uni-marburg.de/jobposting/b26cbcb09d3e6c83dbdbab7def555c7ec1843b040

Regards
Daniel

PostDoc in NLP: Multilingual Coreference Resolution
by michael.strube＠h-its.org 08 Jan '25

Dear all,

HITS is looking for a two-year Postdoctoral Researcher in Natural Language Processing (m/f/x) to perform research in multilingual coreference resolution.

Application deadline: January 15th, 2025.
Starting date (negotiable): March 1st, 2025.

For details, please see https://www.h-its.org/hits-job/postdoctoral-researcher-in-natural-language-…

If you have further questions, please don't hesitate to contact Michael Strube at michael.strube(a)h-its.org.

With best regards,
Michael Strube

--
Michael Strube
NLP Group
HITS gGmbH
Schloss-Wolfsbrunnenweg 35
69118 Heidelberg, Germany
http://www.h-its.org/nlp

[Job] Research Assistant: Knowledge-Graphs for Advanced Search Engines
by khalid.alkhatib＠rug.nl 08 Jan '25

We are seeking applications for a fully-funded one-year Research Assistant position in Computational Linguistics, focusing on developing Argumentation Knowledge Graphs for advanced search engines. The project aims to create structured, multi-perspective knowledge graphs to enhance search engines with reliable, balanced, and credible content, addressing challenges like information overload and misinformation. Conducted in collaboration with OpenWebSearch.EU, the project provides access to high-quality open data and enables integration into search interfaces, delivering trustworthy, diverse perspectives to support well-informed decision-making. https://www.rug.nl/about-ug/work-with-us/job-opportunities/?details=00347-0…

Deadline extended - TheWebConf 2025 workshop on Protecting Women Online
by Debora Nozza 08 Jan '25

[Apologies for cross-posting]

********************************************************************
CALL FOR PAPERS

ACM TSWW 2025
Towards a Safer Web for Women - First International Workshop on Protecting Women Online
co-located with The Web Conference 2025
Sydney, Australia
28 April - 2 May 2025
https://tsww25.github.io/
********************************************************************
EXTENDED DEADLINES (all deadlines are AoE)
********************************************************************
21st January 2025 (extended from 22nd December 2024): Workshop paper submission deadline
27th January 2025: Notification of acceptance
********************************************************************

SCOPE AND OVERVIEW
__________________

The workshop is dedicated to addressing the pressing issue of online violence against women by fostering dialogue and innovation. The workshop will explore global challenges and solutions for gender-based violence and the impact of online harms on women, among others. We aim to encourage the development of technological and interdisciplinary frameworks and innovations to ensure women's online safety. The workshop aims to review progress in approaches combating online violence against women, identify persistent barriers, and propose solutions to emerging challenges.

Topics of interest include, but are not limited to:
* Detection and prevention of gender-based online violence (e.g., harassment, stalking, cyberbullying)
* Sentiment and emotion analysis in abusive or harmful online interactions towards women
* Gender bias identification and mitigation in AI
* Human-centered approaches for online safety applications
* Approaches to preventing, understanding, identifying and mitigating online harms faced by women with multiple marginalised identities (e.g., misogynoir, LGBTQ+ women, or women from religious or cultural minorities)
* Analysis of tracking devices, surveillance tools, and hidden cameras misused against women
* Detection and mitigation of non-consensual deepfake generation and dissemination
* Interdisciplinary approaches to identifying and addressing online harm
* Legal and ethical frameworks for protecting women online
* Psychological, social, and legal impacts of online technology when used for gender-based abuse

PAPER FORMAT AND SUBMISSION INSTRUCTIONS
________________________________________

We welcome both new and recent research, including non-archival submissions to showcase work published elsewhere, if it is especially relevant to the workshop's theme. Accepted formats include:
* Long papers: Maximum 8 pages (excluding references)
* Short papers: Maximum 4 pages (excluding references)
* Position, idea, and emerging problem papers: Maximum 4 pages (excluding references)
* Non-archival submissions: Up to 2 pages (excluding references)

All papers should be submitted via Easychair: https://easychair.org/conferences/?conf=tsww25

For full details, visit our Call for Papers page.

Further, at least one author of each accepted workshop paper has to register. Workshop attendance is only granted for registered participants. Accepted papers (except for non-archival submissions) will be included in the workshop proceedings, which will be published as companion proceedings of The Web Conference, and indexed according to the main conference policy.

ORGANISING COMMITTEE
____________________

Workshop chairs:
* Ángel Pavón Pérez, The Open University
* Miriam Fernandez, The Open University
* Tracie Farrell, The Open University
* Debora Nozza, Bocconi University
* Christine de Kock, University of Melbourne

Second call: Reading Concordances Training Day & Symposium @ FAU Erlangen-Nürnberg 19–21 March 2025
by Dykes, Nathan 08 Jan '25

The team of the Reading Concordances in the 21st Century project (RC21) is inviting you to two events on concordance reading that will take place at FAU in Erlangen, Germany. Both events are **free**, but places are limited, so please register early.

Thu 20 & Fri 21 March 2025 – RC21 Symposium with poster session

Join us for a two-day symposium with talks by guest speakers and the project team. The talks will cover methodology, theory, and applications of concordance reading. The symposium also includes a poster session, where participants can present their own research. Submission for the poster session is open until 31 January 2025.

The following speakers will be presenting at the symposium:

Laurence Anthony <https://www.dhss.phil.fau.eu/person/laurence-anthony-ph-d-2/>, Waseda University, Tokyo, Japan
Nathan Dykes <https://www.dhss.phil.fau.eu/person/nathan-dykes/>, Friedrich-Alexander-Universität, Erlangen-Nürnberg, Germany
Stephanie Evert <https://www.linguistik.phil.fau.de/person/prof-dr-stephanie-evert/>, Friedrich-Alexander-Universität, Erlangen-Nürnberg, Germany
Ulrich Heid <https://www.uni-hildesheim.de/fb3/institute/iwist/mitglieder/heid/>, Hildesheim University, Germany
Susan Hunston <https://www.birmingham.ac.uk/staff/profiles/elal/hunston-susan>, University of Birmingham, United Kingdom
Marc Kupietz <https://www.ids-mannheim.de/digspra/personal/kupietz/>, Leibniz Institute for the German Language (IDS), Mannheim, Germany
Michaela Mahlberg <https://michaelamahlberg.com/>, Friedrich-Alexander-Universität, Erlangen-Nürnberg, Germany
Alexander Piperski <https://www.linguistik.phil.fau.de/person/alexander-piperski/>, Friedrich-Alexander-Universität, Erlangen-Nürnberg, Germany
Patricia Ronan <https://islk.kuwi.tu-dortmund.de/ronan/>, Technical University Dortmund, Germany
Charlotte Taylor <https://profiles.sussex.ac.uk/p329327-charlotte-taylor>, University of Sussex, England
Yukio Tono <https://www.tufs.ac.jp/research/researcher/people/english/tono_yukio.html>, Tokyo University of Foreign Studies, Japan
Valentin Werner <https://www.uni-bamberg.de/eng-ling/personen/werner/>, University of Bamberg, Germany
Viola Wiegand <https://www.stir.ac.uk/people/1844219>, University of Stirling, UK

Wed 19 March 2025 – RC21 Training Day

The Concordance Reading Training Day, which precedes the symposium, will include hands-on sessions on different aspects of concordance reading. Participants need to bring their own devices for the event.

Deadline for registration: 19 February 2025
Deadline for submission of poster abstracts: 31 January 2025

The events are part of the Reading Concordances in the 21st Century project jointly funded by the AHRC and the DFG.

For more information and registration details: https://www.dhss.phil.fau.eu/reading-concordances-in-the-21st-century-rc21/…

The RC21 team is looking forward to welcoming you to FAU!

Stephanie Evert, Michaela Mahlberg, Nathan Dykes, Sasha Piperski
