January 2025 - Corpora

Digital lexicography and lexical computing workshop, Bari, Italy
by Ondřej Matuška 19 Jan '26

*<Lexicom/>*
a workshop in digital lexicography and lexical computing

*Registration open*
*Bari, Italy*, 15 – 19 September 2025

Your 5 days to get up to date with the latest developments in *corpus-driven lexicography* and to practice your *corpus building and corpus query skills* with some of the top experts in the field.

For the programme, lecturers, invited speakers, fees and registration, visit *lexicom.courses <https://lexicom.courses/upcoming-lexicom/>*

I hope to meet you in Bari in September!

Ondřej

*Ondřej Matuška*
sketchengine.eu <http://www.sketchengine.eu/> | Facebook <https://www.facebook.com/SketchEngine/> | LinkedIn <https://www.linkedin.com/in/ondrejmatuska> | Twitter <https://twitter.com/SketchEngine>

Advance notice: ‘Statistics for linguistics with R’ bootcamp (08 – 12/07/2024)
by Magali Paquot 01 Dec '25

The Linguistics Research Unit of the Institute of Language and Communication (Université catholique de Louvain, Belgium) will be hosting Stefan Gries’s next bootcamp on statistics for linguistics with R from 08 to 12 July 2024.

The ‘Statistics for linguistics with R’ bootcamp is a hands-on introduction to statistical methods for both graduate students and seasoned researchers and is loosely based on the third edition (2021) of Gries’s textbook Statistics for linguistics with R. The course is intended for linguists who already have a basic knowledge of statistics and some experience using R and who wish to improve their proficiency in statistical modeling of linguistic data.

Using the open source software and programming language R, we will deal with:
• fundamental aspects of fixed-effects regression modeling for both numeric and binary response variables; these include exploration of data and their preparation for modeling, model formulation and selection, and numerical and visual interpretation and evaluation of models;
• more advanced aspects of fixed-effects regression modeling such as contrasts for ordinal predictors, orthogonal contrasts, curvature of numeric predictors, and maybe general linear hypothesis tests;
• the theoretical foundations of mixed-effects regression modeling;
• applications of mixed-effects modeling for both numeric and binary response variables;
• tree-based methods and random forests: 'fitting' and interpreting them with importance scores, partial dependence scores, and detecting (not just capturing) interactions.

The website of the bootcamp will be online in early 2024 and online registration will start on 1 March 2024, 11 am CEST. The number of participants is limited. If you would like to participate, mark the date in your diary!

Contact email: magali.paquot(a)uclouvain.be

Magali Paquot
Convenor

3-year PhD position in Computational Models of Semantic Memory and its Acquisition (Inria and University of Lille, France)
by Pascal Denis 13 May '25

Hello,

Could you please distribute the following job offer? Thanks.

Best,
Pascal

-------------------------------------------------------------------------------------

3-year PhD position in Computational Models of Semantic Memory and its Acquisition (Inria and University of Lille, France)

We invite applications for a 3-year PhD position at the University of Lille in the context of the recently funded research project "COMANCHE" (Computational Models of Lexical Meaning and Change). The position is funded by Inria, the French national research institute in Computer Science and Applied Mathematics.

COMANCHE proposes to transfer and adapt neural word embedding algorithms to model the acquisition and evolution of word meaning, by comparing them with linguistic theories on language acquisition and language evolution. At the intersection between Natural Language Processing, psycholinguistics and historical linguistics, this project intends to validate or revise some of these theories, while also developing computational models that are less data-hungry and computationally intensive as they exploit new inductive biases inspired by these disciplines.

The first strand of the project, on which the successful candidate will work, focuses on the development of computational models of semantic memory and its acquisition. Two main research directions will be pursued. On the one hand, we will compare the structural properties associated with different semantic spaces derived from word embedding algorithms to those found in human semantic memory as reflected in behavioral data (such as typicality norms) as well as brain imaging data. The latter data will then be used as additional supervision to inject more hierarchical structure into the learned semantic spaces. On the other hand, we intend to experiment with training regimes for word embedding algorithms that are closer to those of humans when they acquire language, controlling the quantity as well as the linguistic complexity of the inputs fed to the learning algorithms through the use of longitudinal and child-directed speech corpora (e.g., CHILDES, Colaje). In both cases, both English and French data will be considered.

The successful candidate holds a Master's degree in computational linguistics, computer science or cognitive science and has prior experience in word embedding models. Furthermore, the candidate has strong programming skills and expertise in machine learning approaches, and is eager to work across languages.

The position is affiliated with the MAGNET team at Inria, Lille [1] as well as with the SCALAB group at University of Lille [2] in an effort to strengthen collaborations between these two groups, and ultimately foster cross-fertilization between Natural Language Processing and Psycholinguistics.

Applications will be considered until the position is filled. However, you are encouraged to apply early as we shall start processing the applications as and when they are received. Applications, written in English or French, should include a brief cover letter with research interests and vision, a CV (including your contact address, work experience, publications), and contact information for at least 2 referees. Applications (and questions) should be sent to Angèle Brunellière (angele.brunelliere(a)univ-lille.fr) and Pascal Denis (pascal.denis(a)inria.fr).

The starting date of the position is 1 October 2022 or soon thereafter, for a total of 3 full years.

Best regards,
Angèle Brunellière and Pascal Denis

[1] https://team.inria.fr/magnet/
[2] https://scalab.univ-lille.fr/

--
Pascal

----
For independent, transparent and rigorous evaluation! I support the Inria Evaluation Committee (Commission d'Évaluation de l'Inria).
----

+++++++++++++++++++++++++++++++++++++++++++++++
Pascal Denis
Equipe MAGNET, INRIA Lille Nord Europe
Bâtiment B, Avenue Heloïse
Parc scientifique de la Haute Borne
59650 Villeneuve d'Ascq
Tel: ++33 3 59 35 87 24
Url: http://researchers.lille.inria.fr/~pdenis/
+++++++++++++++++++++++++++++++++++++++++++++++

First Call for Papers: Eighth Universal Dependencies Workshop
by Gosse Bouma 11 Apr '25

The eighth workshop on Universal Dependencies
Part of SyntaxFest 2025, Ljubljana, August 26-29

Call for Papers

Universal Dependencies (UD) is a framework for cross-linguistically consistent treebank annotation that has so far been applied to over 150 languages (https://universaldependencies.org). The framework aims to capture similarities as well as idiosyncrasies among typologically different languages (e.g., morphologically rich languages, pro-drop languages, and languages featuring clitic doubling). The goal in developing UD was not only to support comparative evaluation and cross-lingual learning but also to facilitate multilingual natural language processing and enable comparative linguistic studies.

The Universal Dependencies Workshop series was started to create a forum for discussion of the theory and practice of UD, its use in research and development, and its future goals and challenges. Some of the previous workshops have been co-located with Coling, EMNLP, and SyntaxFest.

We invite papers on all topics relevant to UD, including but not limited to:
* Theoretical foundations and universal guidelines
* Linguistic analysis of specific languages and/or constructions
* Language typology and linguistic universals
* Treebank annotation, conversion and validation
* Word segmentation, morphological tagging and syntactic parsing
* The use of the UD data for evaluating or understanding language models
* Linguistic studies based on the UD data

Priority will be given to papers that adopt a cross-lingual perspective.

SyntaxFest 2025
https://syntaxfest.github.io/syntaxfest25/index.html

SyntaxFest is a biennial event that brings together a series of events focusing on topics such as empirical syntax, linguistic annotation, statistical language analysis, and natural language processing. Apart from the 8th UDW, it hosts TLT, DepLing, IWPT, and Quasy.
Each workshop publishes its own proceedings, but all events follow a shared submission process, timeline, and programme. The UniDive 1st Shared Task on Morphosyntactic Parsing takes place on August 26.

Important Dates
Paper submission deadline: April 15, 2025
Notification of acceptance: June 2, 2025
Camera-ready version due: June 16, 2025
Conference dates: August 26-29, 2025

Submission Information
Submission site and paper requirements will be provided in the next CfP.

Workshop Chairs
Gosse Bouma (University of Groningen)
Cagri Coltekin (University of Tübingen)

--
Gosse Bouma, Communication and Information Science, Groningen University, P.o. box 716, 9700 AS Groningen
G.Bouma(a)rug.nl
tel. +31-50-3635937

2nd CFP - AAAS-2025
by Kurimo Mikko 07 Apr '25

[Apologies for cross-posting]

== Second Call for Papers and Extended Abstracts ==

1st Workshop on Automatic Assessment of Atypical Speech (AAAS-2025)

We would like to invite you to submit papers to the AAAS workshop, co-located with NoDaLiDa/Baltic-HLT <https://www.nodalida-bhlt2025.eu> at Hestia Hotel Europa in Tallinn, Estonia on March 5th, 2025.

Workshop website: https://teflon.aalto.fi/aaas-2025/

== Important Dates ==

Submission deadline: 16 December 2024 (both papers and abstracts)
Notification of acceptance: 24 January 2025
Camera-ready deadline: 3 February 2025
Workshop: 5 March 2025 (full day)

All deadlines are 11:55PM UTC-12:00 ("anywhere on Earth").

== Overview ==

Automatic Assessment of Atypical Speech (AAAS) explores the assessment of pronunciation and speaking skills of children, language learners and people with speech sound disorders, and methods to provide automatic rating and feedback using automatic speech recognition (ASR) and large language models (LLMs).

Automatic speaking assessment (ASA) is a rapidly growing field that answers the need for AI tools for self-practising second and foreign language skills. This is not limited to pronunciation assessment: the AI tools can also provide more complex feedback about fluency, vocabulary and grammar of the recorded speech. ASA is also very relevant for detection and quantification of speech disorders and for developing speech exercises that can be performed independent of time and place. The important applications of non-standard speech also include interfaces for children and elderly speakers as an alternative to using text input and output.

The topic is timely, because the latest large speech models now allow us to develop ASR and classification methods for low-resourced data, such as atypical speech, where annotated training datasets are rarely available and are expensive and difficult to produce and share. The goal of this workshop is to present the latest results in ASA and discuss future work and collaboration between researchers in the Nordic and Baltic countries.

== Topics of Interest ==

In particular, we would like to invite students, researchers, and other experts and stakeholders to contribute papers and/or join the discussion on the following (and related) topics:

Automatic speaking assessment (ASA) for L2 (second or foreign language) pronunciation
ASA for spoken L2 proficiency
ASA for speech sound disorders (SSD)
Automatic speech recognition (ASR) for L2 learners
ASR for children and young L2 learners
ASA and ASR for Nordic and other low-resource languages and tasks
Spoken L2 learning and speech therapy using games
Automatic generation of verbal feedback for spoken L2 learners using LLMs

== Submission Details ==

We accept both short and long papers, as well as demo papers. Papers should describe original unpublished work or work-in-progress.

Paper length: Short and demo papers up to 4 pages. Long papers up to 8 pages. References are not included in the page count, and the camera-ready versions of accepted papers will be allowed one additional page to address reviewer comments.

Papers will be peer-reviewed by at least two members of the program committee in a double-blind fashion. All accepted papers will be collected into a proceedings volume to be published in the ACL Anthology.

All submissions must follow the NoDaLiDa template, available in both LaTeX and MS Word.
The links to the templates can be found here:
https://drive.google.com/file/d/1osWGzuRnYRQGRS70Lx_pdQKrIT-NefKS/view
https://www.overleaf.com/latex/templates/instructions-for-nodalida-baltic-h…

The submission will be through EasyChair: https://easychair.org/conferences/?conf=aaas2025

We also invite non-anonymous extended abstracts of at most 2 pages (plus any number of pages for references) describing work in progress, negative results and opinion pieces. The abstracts, which should follow the same formatting templates as the peer-reviewed papers, will be considered for presentation by the workshop organisers and the accepted ones will be posted on the workshop website. The abstracts can be based on results related to our theme and already published elsewhere. The abstracts will not be published in the proceedings, but only in the workshop program.

Please also consider volunteering to review 2-3 papers.

== Invited Speakers ==

We have the pleasure to announce two invited speakers:

1. Nina R. Benway: What is so hard about AI Speech Therapy? Evidence from Efficacy Trials.

Nina R Benway, PhD CCC-SLP, is a Postdoctoral Fellow in Electrical and Computer Engineering with Dr. Carol Espy-Wilson. Nina completed her doctoral training in speech-language pathology (concentration: neuroscience) with Dr. Jonathan Preston at Syracuse University, focusing on clinical trials in children with chronic rhotic speech sound disorders. The three studies of her dissertation resulted in the curation of an open-access 175,000-utterance speech corpus, the engineering of audio classification algorithms predicting speech-language pathologist perception of rhotic speech errors, and the clinical trial validation of an artificial intelligence tool that fully automates a speech sound treatment session. Nina’s doctoral training builds upon her undergraduate training in linguistics (acoustic phonetics) at Cornell University, graduate clinical training at The College of Saint Rose, and six years of clinical practice. Through these experiences Nina has refined a multidisciplinary skill set in speech science, speech signal processing, natural language processing, corpus phonetics, machine learning/artificial intelligence (AI), user interface development, cognitive frameworks of learning, and neurocomputational frameworks of speech production.

2. Ari Huhta: Automatic assessment of second/foreign language speaking: Review of developments for examination and teaching/learning purposes.

Ari Huhta is a Professor of Language Assessment at the Centre for Applied Language Studies, University of Jyväskylä, Finland. His research interests include diagnostic foreign/second language (L2) assessment, computerised assessment, self-assessment, as well as the development of reading, writing and vocabulary knowledge in L2. He was involved in developing the large-scale multilingual DIALANG online assessment and feedback system in the early 2000s and since then he has specialised in assessments that support language learning. Although his research has focused on learning and assessing reading and writing, he has been involved in designing several rating scales for speaking and in evaluating rating quality and studying rater behavior. Recently, he has participated in research projects that are developing ASR and automated assessment of L2 speaking, as well as using LLMs to evaluate Finnish L2 learners’ proficiency level.
== Organizers ==

Mikko Kurimo (chair), Aalto University, mikko.kurimo(a)aalto.fi
Giampiero Salvi, NTNU
Sofia Strömbergsson, Karolinska Institutet
Sari Ylinen, Tampere University
Minna Lehtonen, University of Turku
Tamas Grosz, Aalto University
Ekaterina Voskoboinik, Aalto University
Yaroslav Getman, Aalto University
Nhan Phan, Aalto University

This workshop is supported by “Technology-enhanced foreign and second-language learning of Nordic languages (TEFLON)” https://teflon.aalto.fi/ NordForsk project nr. 103893.

== Contact Information ==

For questions and comments, please email mikko.kurimo(a)aalto.fi

2nd CFP: The 19th Linguistic Annotation Workshop (LAW-XIX)
by Ines Rehbein 01 Apr '25

Call for Papers: The 19th Linguistic Annotation Workshop (LAW-XIX)

We invite submissions for LAW-XIX, co-located with ACL 2025 in Vienna, Austria, in July/Aug 2025.

The LAW-XIX will provide a forum for presentation and discussion of innovative research on all aspects of linguistic annotation, including creation/evaluation of annotation schemes, methods for automatic and manual annotation, use and evaluation of annotation software and frameworks, representation of linguistic data and annotations, semi-supervised “human in the loop” methods of annotation, crowd-sourcing approaches, and more.

Special Theme

The special theme of LAW-XIX is "*Subjectivity and variation in linguistic annotations*". In addition to LAW's general topics, we specifically invite submissions on:
* Subjectivity and human label variation in linguistic annotations
* Learning from annotation disagreements
* Detecting annotation noise in human label variation
* Accounting for subjectivity in label aggregation
* Ways to aggregate multiple annotators' labels beyond majority vote
* Any other topics related to the special theme.

Regarding subjectivity, we are particularly interested in work addressing the *annotation of multidimensional constructs from the political and social sciences* and encourage submissions on the following topics:
* Theory-driven operationalization of complex political or socio-psychological constructs, such as populism, moral values, or stereotypes
* Creation of linguistically annotated datasets that capture such constructs
* Relation between theories and textual annotations
* Challenges for the measurement of multidimensional constructs from text
* Challenges for validating (a) theories, (b) annotations
* Implications and risks for manual annotation and automatic prediction of socio-psychological constructs from text.

Important Dates

All submission deadlines are 11:59 p.m. UTC-12:00 “anywhere on Earth.”

Workshop papers due (ARR Commitment): Mar 25, 2025
Workshop papers due (Direct Submission): April 04, 2025
Notification of acceptance: May 16, 2025
Camera-ready papers due: May 30, 2025
Workshop date: July/Aug, 2025

Submissions

Please submit your paper here: https://softconf.com/acl2025/law2025

For more information on the workshop and submission formats, please refer to the workshop homepage: https://sigann.github.io/LAW-XIX-2025

If you have any questions, please feel free to contact the program co-chairs at law2025workshop(a)gmail.com.

Workshop Organizers

Siyao (Logan) Peng (Program Co-Chair)
Ines Rehbein (Program Co-Chair)
Amir Zeldes (ACL SIGANN President)

--
Ines Rehbein
Data and Web Science Group
University of Mannheim, Germany

Text2Story'25@ECIR: Deadline Extension & Last Call for Papers
by Hugo Sousa 31 Jan '25

*** Apologies for cross-posting ***

++ DEADLINE EXTENSION & LAST CALL FOR PAPERS ++

****************************************************************************
Eighth International Workshop on Narrative Extraction from Texts (Text2Story'25)
Held in conjunction with the 47th European Conference on Information Retrieval (ECIR'25)
April 10th, 2025 – Lucca, Italy
Website: https://text2story25.inesctec.pt
****************************************************************************

++ Important Dates ++

- Submission Deadline: February 7th, 2025 (extended from January 24th, 2025)
- Acceptance Notification: March 3rd, 2025
- Camera-ready copies: March 17th, 2025
- Workshop: April 10th, 2025

++ Overview ++

For seven years, the Text2Story Workshop series has fostered a vibrant community dedicated to understanding narrative structure in text, resulting in significant contributions to the field and developing a shared understanding of the challenges in this domain. While traditional methods have yielded valuable insights, the advent of Transformers and LLMs has ignited a new wave of interest in narrative understanding. In the eighth edition of the Text2Story workshop, we propose to go deeper into the role of LLMs in narrative understanding, exploring the issues involved in using LLMs to unravel narrative structures, while also examining the characteristics of narratives generated by LLMs. By fostering dialogue on these emerging areas, we aim to identify the wide-ranging issues related to the narrative extraction task and continue the workshop's tradition of driving innovation in narrative understanding research.

++ List of Topics ++

Research works submitted to the workshop should advance the scientific understanding of all aspects of narrative extraction from texts.
This includes, but is not limited to, topics such as narrative information extraction, formal representation of narratives, narrative analysis and generation, development of datasets and evaluation protocols, as well as ethics and bias in narratives, and narrative applications. We encourage high-quality, original submissions covering the following topics, as well as contributions focused on low- and medium-resource languages.

Narrative Information Extraction
- Identification of Participants, Events and Temporal Expressions
- Temporal Reasoning and Ordering of Events
- Causality Detection
- Big Data Applied to Narrative Extraction
- LLMs for Narrative Extraction

Narrative Representation
- Annotation Protocols
- Narrative Representation Models
- Lexical, Syntactic, and Semantic Ambiguity in Narrative Representation
- LLM-learned Representation

Narrative Analysis and Generation
- Discourse and Argument Structure Analysis
- Narrative analysis of LLM generated text
- Multilingual and Cross-lingual Narrative Analysis
- Story Evolution and Shift Detection
- Automatic Timeline Generation
- Generative Language Models for Narrative Generation

Datasets and Evaluation Protocol
- Evaluating LLM-Generated Narratives
- Evaluation of Multimodal Narrative Models
- Annotated datasets
- Narrative Resources
- Using LLMs for Data Creation and Augmentation

Ethics and Bias in Narratives
- Identifying and Mitigating Bias in Generated Narratives
- Ethical and Fair Narrative Generation
- Misinformation and Fact Checking
- Bias in LLM-generated narratives

Narrative Applications
- Narrative-focused Search in Text Collections
- Narrative Summarization
- Narrative Q&A
- Multimodal Narrative Summarization
- Multimodal Narrative-focused Search
- Sentiment and Opinion Detection in Narratives
- Social Media Narratives
- Narrative Text Simplification
- Narrative-based Text Anonymization
- Personalization and Recommendation of Narratives
- Storyline Visualization (including multimodal) and Narrative Structures

++ Objectives ++

Overall, the workshop has the following main objectives:
(1) raise awareness within the Information Retrieval (IR) community regarding the challenges posed by narrative extraction and comprehension;
(2) bridge the gap and foster connections between academic research, practitioners, and industrial applications;
(3) discuss new methods, recent advances, and emerging challenges;
(4) share experiences from research projects, case studies, and scientific outcomes structured around fundamental research questions related to narrative understanding;
(5) identify dimensions that might be influenced by the automation of the narrative process;
(6) highlight tested hypotheses that did not result in the expected outcomes.

++ Submission Guidelines ++

We expect contributions from researchers on all aspects of narrative extraction, representation, analysis, and generation. This includes the extraction and formal representation of events, their temporal and causal relationships, and methods for temporal reasoning and ordering. Submissions focusing on narrative comprehension, such as the analysis of generated narratives, are also highly encouraged. Additionally, we welcome innovative approaches to presenting narrative information, including automatic timeline generation, multi-modal narrative summarization, and narrative visualization. Research addressing misinformation and the verification of extracted facts, evaluation methodologies, and the development of annotated datasets, annotation schemas, and evaluation metrics is particularly valued. Finally, we are especially interested in submissions that focus on low- and medium-resource languages, as well as multilingual and cross-lingual narrative analysis.
Building on these themes, several pressing questions emerge within the field, offering valuable guidance for authors in shaping their submissions.
- How can we better integrate multimodal content - combining text, images, videos, and audio - into cohesive narratives?
- What strategies can reliably extract or generate accurate narratives from large, multi-genre, and multi-lingual datasets?
- How can systems dynamically adapt to real-time shifts in narratives as the volume of generated content grows?
- What methodologies can effectively annotate data and evaluate novel approaches, for complex tasks such as visualization but also for characterization of multi-lingual narratives?
- How can we guarantee the explainability, interpretability, and coherence of narratives across diverse domains and languages?
- To what extent can novel approaches be generalized to new tasks, genres, and languages with minimal effort?
- What ethical safeguards are essential to ensure that narrative extraction systems are not misused for propaganda or manipulation?
- How can challenges posed by ambiguous or contradictory information within narratives be addressed through innovative methods?
- What role do cultural and contextual nuances play in narrative extraction, and how can these be effectively incorporated into automated systems to ensure greater inclusivity?
- How can collaboration between human annotators and automated systems be optimized to achieve more accurate, nuanced narrative understanding?
- How can systems generate concise, evidence-backed explanations to justify the dominant narrative while remaining grounded in the source text?

-> Full papers (up to 8 pages + references): Original and high-quality unpublished contributions to the theory and practical aspects of the narrative extraction task. Full papers should introduce existing approaches and describe the methodology and the experiments conducted in detail. Negative result papers highlighting tested hypotheses that did not get the expected outcome are also welcome.

-> Short papers (up to 5 pages + references): Unpublished short papers describing work in progress; position papers introducing a new point of view, a research vision or a reasoned opinion on the workshop topics; and dissemination papers describing project ideas, ongoing research lines, case studies or summarized versions of papers previously published in high-quality conferences/journals that are worth sharing with the Text2Story community, but where novelty is not a fundamental issue.

-> Demos | Resource Papers (up to 5 pages + references): Unpublished papers presenting research/industrial demos; papers describing important resources (datasets or software packages) to the Text2Story community.

Papers submitted to Text2Story 2025 should be original work and different from papers that have been previously published, accepted for publication, or that are under review at other venues. Exceptions to this rule are "dissemination papers". Pre-prints submitted to ArXiv are eligible.

All papers will be refereed through a double-blind peer-review process by at least two members of the programme committee. The accepted papers will appear in the proceedings published at CEUR workshop proceedings (indexed in Scopus and DBLP) as long as they don't conflict with previous publication rights.

++ Invited Speakers ++

Sara Tonelli, Fondazione Bruno Kessler, Italy

Title: Revisiting frames for event extraction in the Digital Humanities

Abstract: Frame Semantics as a cognitive linguistic theory was first formalised by Charles Fillmore around 50 years ago. Since then, it has been adapted to different application scenarios as a framework to support event-based information extraction. But what is the role of frames in the era of generative AI?
In this talk I will present some recent research works in which frame semantics has been tailored to support digital humanities research. In particular, we explored the use of frames to extract sensory information from historical archives and capture shifts in perception over time. Frame-based event extraction has also been investigated as a way to navigate news collections, build narratives from event chains and present the same event from different points of view.

Bio: Sara Tonelli is the head of the Digital Humanities research group at Fondazione Bruno Kessler, Trento (Italy) and holds a PhD in Language Sciences from Università Ca' Foscari, Venice. Between 2021 and 2024 she served as Liaison Representative of the ACL Special Interest Group on Language Technologies for the Socio-Economic Sciences and Humanities (SIGHUM) and she is currently part of the board of the Italian Association for Computational Linguistics (AILC). In recent years, she has served as area chair and senior area chair for major *ACL conferences in tracks related to cultural analytics, social media analysis, digital humanities and offensive language detection. She has also participated in different EU-funded projects around disinformation, computational social science and cultural heritage and was scientific coordinator of the KID ACTIONS European project (2021-2022), aimed at addressing cyberbullying among children and adolescents through interactive education and gamification. Her research interests focus on understanding how people communicate on social media and what dynamics are involved in online attacks, as well as what kind of biases can affect this analysis. She is also interested in using NLP to extract information from digital archives to address historical and cultural heritage research questions.

++ Organizing committee ++

Ricardo Campos (INESC TEC; University of Beira Interior, Covilhã, Portugal) Alípio M.
Jorge (INESC TEC; University of Porto, Portugal) Adam Jatowt (University of Innsbruck, Austria) Sumit Bhatia (Media and Data Science Research Lab, Adobe) Marina Litvak (Shamoon Academic College of Engineering, Israel) ++ Proceedings Chair ++ João Paulo Cordeiro (NOVA Lincs & University of Beira Interior, Covilhã, Portugal) Conceição Rocha (INESC TEC, Portugal) ++ Web and Dissemination Chair ++ Hugo Sousa (INESC TEC & University of Porto, Portugal) Behrooz Mansouri (University of Maine, USA) ++ Program Committee ++ Abhai Singh (Amazon) Ali Salehi (University at Buffalo) Arian Pasquali (Faktion AI) Andreas Spitz (University of Konstanz) Antoine Doucet (Université de La Rochelle) António Horta Branco (University of Lisbon) Bart Gajderowicz (University of Toronto) Behrooz Mansouri (Rochester Institute of Technology) Brenda Santana (Federal University of Rio Grande do Sul) Brucce dos Santos (Computational Intelligence Laboratory (LABIC) - ICMC/USP) Bruno Martins (IST & INESC-ID, University of Lisbon) David Semedo (Universidade NOVA de Lisboa) Dennis Aumiller (Cohere) Dhruv Gupta (Norwegian University of Science and Technology) Evelin Amorim (INESC TEC) Sérgio Matos (University of Aveiro) Florian Boudin (Nantes University) Henrique Lopes Cardoso (LIACC & University of Porto) Irina Rabaev (Shamoon College of Engineering) Ismail Altingovde (Middle East Technical University) Junbo Huang (University of Hamburg) Jakub Piskorski (Polish Academy of Sciences) João Paulo Cordeiro (Nova lincs & University of Beira Interior) Jin Zhao (Brandeis University) Luca Cagliero (Politecnico di Torino) Ludovic Moncla (INSA Lyon) Luis Filipe Cunha (INESC TEC & University of Minho) Marc Finlayson (Florida International University) Marc Spaniol (Université de Caen Normandie) Moreno La Quatra (Kore University of Enna) Nianwen Xue (Brandeis University) Nuno Guimarães (INESC TEC & University of Porto) Paulo Quaresma (Universidade de Évora) Paul Rayson (Lancaster University) Purificação Silvano 
(CLUP & University of Porto) Ross Purves (University of Zurich) Sérgio Nunes (INESC TEC & University of Porto) Sriharsh Bhyravajjula (University of Washington) Udo Kruschwitz (University of Regensburg) Valentina Bartalesi (ISTI-CNR, Italy) Yangyang Chen (Brandeis University) ++ Contacts ++ Website: https://text2story25.inesctec.pt For general inquiries regarding the workshop, reach the organizers at: text2story2025(a)easychair.org


Webinar by Sebastian Ruder (Meta)
by HiTZ zentroa 31 Jan '25


**** We apologize for the multiple copies of this email. In case you are already registered to the next webinar, you do not need to register again. ****
------------------------------------------------------------------------

Dear colleague,

We are happy to announce the next webinar in the Language Technology webinar series organized by the HiTZ Chair of AI&LT (https://hitz.eus). You can check the videos of previous webinars and the schedule for upcoming webinars here: http://www.hitz.eus/webinars

Next webinar:

*Speaker:* Sebastian Ruder (Meta)
*Title:* Multilingual LLM Evaluation in Practical Settings
*Date:* Thursday, February 6, 2025 - 15:00 CET

*Summary:* Large language models (LLMs) are increasingly used in a variety of applications across the globe but do not provide equal utility across languages. In this talk, I will discuss multilingual evaluation of LLMs in two practical settings: conversational instruction-following and usage of quantized models. For the first part, I will focus on a specific aspect of multilingual conversational ability where errors result in a jarring user experience: generating text in the user’s desired language. I will describe a new benchmark and evaluation of a range of LLMs. We find that even the strongest models exhibit language confusion, i.e., they fail to consistently respond in the correct language. I will discuss what affects language confusion, how to mitigate it, and potential extensions. In the second part, I will discuss the first evaluation study of quantized multilingual LLMs across languages. We find that automatic metrics severely underestimate the negative impact of quantization and that human evaluation, which has been neglected by prior studies, is key to revealing harmful effects. Overall, I highlight limitations of multilingual LLMs and challenges of real-world multilingual evaluation.

*Bio:* Sebastian Ruder is a research scientist at Meta based in Berlin, Germany where he works on improving evaluation and benchmarking of large language models (LLMs). He previously led the Multilinguality team at Cohere with the objective to improve the multilingual capabilities of Cohere's LLMs. Before that he was a research scientist at Google DeepMind. He completed his PhD in Natural Language Processing (NLP) at the Insight Research Centre for Data Analytics, while working as a research scientist at Dublin-based text analytics startup AYLIEN. Previously, he studied Computational Linguistics at the University of Heidelberg, Germany and at Trinity College, Dublin.

*Upcoming webinars:*
· Christian Herff (Thursday, March 6, 2025)
· Emanuele Bugliarello (Thursday, April 3, 2025)
· André F. T. Martins (Thursday, May 8, 2025)

If you are interested in participating, please complete this registration form: http://www.hitz.eus/webinar_izenematea

If you cannot attend this seminar, but you want to be informed of the following HiTZ webinars, please complete this registration form instead: http://www.hitz.eus/webinar_info

Best wishes,
HiTZ Zentroa

P.S.: HiTZ will not grant any type of certificate for attendance at these webinars.


Deadline Extension: Workshop on Insights from Negative Results in NLP – Submit by February 10!
by raphael＠uaca.com 30 Jan '25


Hi,

We are pleased to announce that the submission deadline for the Insights 2025 Workshop (May 3-4, 2025, co-located with NAACL 2025) has been extended to February 10, 2025 (AoE).

If you are working on unexpected or negative results in NLP research, we encourage you to submit your work. This workshop provides a venue for highlighting methodological challenges, limitations of current approaches, and insights that can guide the community toward more rigorous research practices.

* New Submission Deadline: February 10, 2025 (AoE)
* Submission Portal: https://softconf.com/naacl2025/Insights2025

We invite:
- Short papers (up to 4 pages + references & appendices)
- 1-2 page non-archival abstracts for work published elsewhere

For full details, please refer to https://insights-workshop.github.io/2025/cfp/

We appreciate your contributions and look forward to your submissions!

Best regards,
Insights 2025 Organizing Committee


3rd and Final Call for Abstracts -- NARNiHS Research Incubator
by Lauersdorf, Mark R. 30 Jan '25


3rd and FINAL Call for Abstracts!

Never been to a NARNiHS Research Incubator?!? Take advantage of the newly extended abstract submission deadline to join us for this year's opportunity to brainstorm your cutting-edge work with us!

***********************************
2025 NARNiHS Research Incubator
North American Research Network in Historical Sociolinguistics
7th edition
***********************************

==> 01-03 May 2025 -- entirely online!
==> FINAL Submission Deadline
==> 03 February 2025, 11:59 PM (U.S. Eastern Time)

The 2025 NARNiHS Research Incubator is an entirely online event (with **free** registration). This event offers an opportunity for scholars in historical sociolinguistics from all over the world to participate in cutting-edge research without the limitations imposed by international travel. We encourage our fellow historical sociolinguists and scholars from related fields in our global scholarly community to join us online for our Research Incubator this spring.

FINAL abstract submission deadline: 03 February 2025, 11:59 PM (U.S. Eastern Time)
Abstract submission online: https://easyabs.linguistlist.org/conference/25_NARNiHS_Incubator/

The North American Research Network in Historical Sociolinguistics (NARNiHS) is accepting abstracts for its 2025 NARNiHS Research Incubator. The 7th edition of this inclusive NARNiHS event seeks to provide a collaborative environment where presenters bring work that is in-progress, exploratory, proof-of-concept, or prototyping. The incubator's audience actively participates in workshopping these new ideas, brainstorming along with the presenter to forge scholarly paths and develop research solutions. We see the NARNiHS Research Incubator as a place for testing and pushing boundaries; developing new theories, methods, models, and tools in historical sociolinguistics; seeking feedback from peers; and engaging in productive assessment of fledgling ideas and nascent projects.

Successful abstracts for this research incubator environment will demonstrate thorough grounding in historical sociolinguistics, scientific rigor in the formulation of research questions, and promise for rich discussion of ideas.

NARNiHS welcomes papers in all areas of historical sociolinguistics, which is understood as the application/development of sociolinguistic theories, methods, and models for the study of historical language variation and change over time, or more broadly, the study of the interaction of language and society in historical periods and from historical perspectives. Thus, a wide range of linguistic areas, subdisciplines, and methodologies easily find their place within the field, and we encourage submission of abstracts that reflect this broad scope.

We are soliciting abstracts for **25-minute presentations**. Presenters will have the entire 25 minutes for their presentations, with discussion happening in the "incubation session" at the end of each panel. Abstracts should be **no more than one page** (not including examples and references, see below). Abstracts will be accepted until 03 February 2025 -- late abstracts will not be considered.

Successful abstracts will be explicit about which theoretical frameworks, methodological protocols, and analytical strategies are being applied or critiqued. Data sources and examples should be sufficiently (if briefly) presented, so as to allow reviewers a full understanding of the scope and claims of the research. Please note that **the connection of your research to the field of historical sociolinguistics should be explicitly outlined** in your abstract. Failure to adhere to these criteria will likely result in rejection of the abstract.

To encourage maximum exchange of ideas in the incubation environment, an hour-long discussion with the audience -- led by specialists -- will follow each thematic panel and will encompass specific feedback on three papers as well as emergent considerations of overarching questions of theory, methods, and models. To facilitate such incubation, authors will be required to submit a draft of their presentation materials for distribution to the panel discussants and the other presenters a few days prior to the start of the conference.

Abstract Content Requirements:

1) Abstracts should be explicit about which theoretical frameworks, methodological protocols, and analytical strategies are being applied or critiqued.
2) Data sources and examples should be sufficiently (if briefly) presented, so as to allow reviewers a full understanding of the scope and claims of the research.
3) The connection of your research to the field of historical sociolinguistics should be explicitly outlined.

Abstract Format Guidelines:

1) Abstracts must be submitted in PDF format.
2) Abstracts must fit on one standard 8.5x11 inch page, with margins no smaller than 1 inch and a font style and size no smaller than Times New Roman 12 point. All additional content (visualizations, trees, tables, figures, captions, examples, and references) must fit on a single (1) additional page. No exceptions to these requirements are allowed; abstracts exceeding these limits will be rejected without review.
3) Anonymize your abstract. We realize that sometimes complete anonymity is not attainable, but there is a difference between the nature of the research creating an inability to anonymize and careless non-anonymizing (in citations, references, file names, etc.). Be sure to anonymize your PDF file (you may do so in Adobe Acrobat Reader by clicking on "File", then "Properties", removing your name if it appears in the "Author" line of the "Description" tab, and re-saving the file before submission). Do not use your name when saving your PDF (e.g. Smith_Abstract.pdf); file names will not be automatically anonymized by the EasyAbs system. Rather, use non-identifying information in your file name (e.g. HistSoc4Lyfe.pdf). Your name should only appear in the online form accompanying your abstract submission. Papers that are not sufficiently anonymized wherever possible will be rejected without review.

General Conference Requirements:

1) Abstracts must be submitted electronically, using the following link: https://easyabs.linguistlist.org/conference/25_NARNiHS_Incubator/
2) Papers must be delivered as projected in the abstract or represent bona fide developments of the same research.
3) Authors are expected to virtually attend the conference and present their own papers.
4) Presentations will be delivered via Zoom. Technical details and instructions regarding the platform will be sent to authors in due time.

Please contact us at NARNiHistSoc(a)gmail.com with any questions.

