May 2025 - Corpora - ELRA lists

VinNLP - NLP Bridges Talk Series
by El-Haj, Mo 26 May '25

Dear colleagues,

I’m happy to introduce NLP Bridges, a talk series launched by the VinNLP Group at VinUniversity, Hanoi, to connect global NLP experts with researchers, students, and communities working on low-resource languages. All talks are free, online, and recorded, with the aim of supporting inclusive and accessible NLP research.

To learn more or register for upcoming talks, please visit: https://vinnlp.com/nlp_bridges.html

Best regards,
Mo

--
Dr Mo El-Haj
Director of the VinNLP Research Group
https://vinnlp.com/
Associate Professor (Reader) in NLP
CECS, VinUniversity, Hanoi, Vietnam
https://vinuni.edu.vn/people/mo-el-haj/
NLP Visiting Research Scholar
SCC, Lancaster University, UK
https://www.lancaster.ac.uk/dsi/about-us/members/mahmoud-el-haj
Natural Language Engineering (NLE) Journal Editorial Board
https://www.cambridge.org/core/journals/natural-language-engineering

Autumn School 2025 - Information Retrieval and Information Foraging in Dagstuhl - Registration extended
by Thomas Mandl 25 May '25

Registration is extended until the end of May:

ASIRF 2025 - Autumn School 2025 - Information Retrieval and Information Foraging in Dagstuhl
https://fg-retrieval.gi.de/veranstaltung/asirf2025

Great learning opportunity for grad/PhD students and everyone interested in IR, NLP or related fields!

ASIRF provides a holistic perspective and shows how Information Retrieval and Information Foraging interact. In addition to offering conceptual and methodological knowledge from both fields, ASIRF is highly interdisciplinary. The autumn school will provide different tutorials, from foundations of Information Retrieval and Information Foraging to recent research topics such as using large language models for information retrieval tasks or applying IR techniques in specific application domains.

The autumn school will take place at the best computer science venue ever, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (LZI), between 24 and 29 August 2025.

The program includes the following lectures and tutorials:
- Foundations of Interactive IR (Norbert Fuhr, U Duisburg-Essen)
- Deep Learning for IR (Avishek Anand, TU Delft)
- Conversational Search (Ralf Schenkel, U Trier)
- Fairness and Bias (Andrea Horbach, U Kiel)
- Multimodal retrieval and data analysis (Henning Müller, HES-SO)
- User Experiments and Interactive IR (Thomas Mandl, U Hildesheim)
- Complex Casual Leisure Information Needs (Vivien Petras, HU Berlin)
- User Simulations for IR (Philipp Schaer, TH Köln)

For more information on the schedule, program, or registration: https://fg-retrieval.gi.de/veranstaltung/asirf2025

See you in Dagstuhl!

ASIRF 2025 organizers
Philipp Schaer, Ralf Schenkel, and Thomas Mandl

HASOC-meme - Shared task @ FIRE 2025 - Hateful Memes in Bengali, Hindi, Gujarati and Bodo - Registration open
by Thomas Mandl 25 May '25

[Apologies for multiple postings]

[CFP] HASOC-meme: Hate Speech and Offensive Content Identification in Memes in Bengali, Hindi, Gujarati and Bodo at FIRE 2025
https://hasocfire.github.io/hasoc/2025/call_for_participation.html

We are excited to announce the 7th edition of HASOC, featuring a range of engaging shared tasks. We warmly invite you to participate in this edition. HASOC 2025 will introduce classification tasks on memes, focusing on the identification of abuse, sentiment, sarcasm, vulgarity, and target. The task will primarily include three binary classification tasks, one multi-class classification task, and one multi-label classification task on memes in Bangla, Hindi, Gujarati, and Bodo languages.

Track Description:
This task involves analyzing multimodal data (image and text) to detect abuse, identify targeted communities, assess vulgarity and sarcasm, and assign sentiment labels. The task therefore has five parts.

Sentiment Detection:
• Positive - The meme conveys a supportive, humorous, or appreciative tone.
• Neutral - The meme is neither overtly positive nor negative in tone.
• Negative - The meme expresses hostility, mockery, or criticism.

Sarcasm Detection:
• Sarcastic - The meme presents statements or visuals that imply the opposite of their literal meaning, often to mock or ridicule.
• Non-Sarcastic - The meme directly conveys its message without sarcasm or irony.

Vulgarity Detection:
• Vulgar - The meme contains explicit or offensive words, gestures, or depictions.
• Not Vulgar - The meme does not include any such content.

Abuse Detection:
• Abusive - The meme includes offensive, harmful, or derogatory language, imagery, or implications targeting an individual or a group.
• Non-abusive - The meme does not contain any offensive, harmful, or derogatory content.

Target Community Identification:
• Gender - Any reference to male, female, non-binary, or transgender identities.
• Religion - Mentions or imagery related to any religious belief, deity, or practice.
• Individual - Specifically mentions or portrays a particular person.
• Political - Targets political ideologies, parties, politicians, or policies.
• National Origin - Targets people based on their country or ethnicity.
• Social Sub-groups - Groups based on socio-economic status, occupation, cultural identity, or other affiliations.
• Others - Any target that does not fall into the above categories.
• None - If the meme does not target any specific community, no target label is assigned.

Important dates
* Registration starts: 15th May, 2025
* Hindi, Marathi and Bodo Training Data Release: 17th May, 2025
* Bangla Training Data Release: 24th May, 2025
* Release of the test set: 15th June, 2025
* Run submission deadline: 30th June, 2025
* Announcement of results: 15th July, 2025
* Working notes due: 30th August, 2025
* Camera-ready copies of notes and overview paper: 30th September, 2025

Task organizers
* Prof. Dr. Thomas Mandl: University of Hildesheim, Germany
* Prof. Dr. Utpal Garain: Indian Statistical Institute, India
* Prof. Dr. Debasis Ganguly: University of Glasgow, United Kingdom
* Prof. Dr. Sandip Modha: University of Milano-Bicocca, Italy & LDRP-ITR, Gandhinagar, India
* Prof. Dr. Animesh Mukherjee: Indian Institute of Technology, Kharagpur, India
* Dr. Koyel Ghosh: University of Hildesheim, Germany
* Dr. Mithun Das: Indian Institute of Technology, Kharagpur, India
* Shubhankar Barman: BITS Pilani, India
* Mwnthai Narzary: Central Institute of Technology, Kokrajhar, India
* Saptarshi Saha: Indian Statistical Institute, Kolkata, India

Website: https://hasocfire.github.io/hasoc/2025/call_for_participation.html

In case of queries, please send an email to hasoc(a)googlegroups.com
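
For participants planning their pipeline, the five label sets in the track description can be held in one record per meme. The following is only a minimal sketch: it reads sentiment as the multi-class task, sarcasm, vulgarity and abuse as the three binary tasks, and target as the multi-label task. The class and field names (`MemeAnnotation`, `target_vector`) are hypothetical and do not reflect the actual HASOC data format, which the call does not specify.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List


class Sentiment(Enum):
    # Multi-class task: exactly one of three labels per meme.
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


class Target(Enum):
    # Multi-label task: a meme may carry several of these, or none.
    GENDER = "gender"
    RELIGION = "religion"
    INDIVIDUAL = "individual"
    POLITICAL = "political"
    NATIONAL_ORIGIN = "national_origin"
    SOCIAL_SUBGROUPS = "social_subgroups"
    OTHERS = "others"


@dataclass(frozen=True)
class MemeAnnotation:
    # Hypothetical record layout; field names are illustrative only.
    sentiment: Sentiment
    sarcastic: bool   # binary task
    vulgar: bool      # binary task
    abusive: bool     # binary task
    targets: FrozenSet[Target] = frozenset()  # empty set = "None" label


def target_vector(annotation: MemeAnnotation) -> List[int]:
    """Encode the target set as a multi-hot vector in Enum definition order."""
    return [1 if t in annotation.targets else 0 for t in Target]


example = MemeAnnotation(
    sentiment=Sentiment.NEGATIVE,
    sarcastic=True,
    vulgar=False,
    abusive=True,
    targets=frozenset({Target.RELIGION, Target.POLITICAL}),
)
print(target_vector(example))  # [0, 1, 0, 1, 0, 0, 0]
```

Representing "None" as an empty set, rather than as an eighth class, follows the description's wording that no target label is assigned in that case.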

CFP: The 1st Workshop on Large Language Models for Cross-Temporal Research at COLM 2025
by wei.zhao＠abdn.ac.uk 25 May '25

We invite you to submit your ongoing, published or pre-reviewed works to our workshop on Large Language Models for Cross-Temporal Research (XTempLLMs) at COLM 2025. Our workshop website is available at https://xtempllms.github.io/2025/

Workshop Description:
Large language models (LLMs) have been used for a variety of time-sensitive applications such as temporal reasoning, forecasting and planning. In addition, there has been a growing number of interdisciplinary works that use LLMs for cross-temporal research in several domains, including social science, psychology, cognitive science, environmental science and clinical studies. However, LLMs are hindered in their understanding of time for several reasons, including temporal biases and knowledge conflicts in pretraining and RAG data, as well as a fundamental limitation in LLM tokenization that fragments a date into several meaningless subtokens. Such an inadequate understanding of time leads to inaccurate reasoning, forecasting and planning, and to time-sensitive findings that are potentially misleading.

Our workshop looks for (i) cross-temporal work in the NLP community and (ii) interdisciplinary work that relies on LLMs for cross-temporal studies.

Cross-temporal work in the NLP community:
* Novel benchmarks for evaluating the temporal abilities of LLMs across diverse date and time formats, culturally grounded time systems, and generalization to future contexts;
* Novel methods (e.g., neuro-symbolic approaches) for developing temporally robust, unbiased, and reliable LLMs;
* Data analysis such as the distribution of pretraining data over time and conflicting knowledge in pretraining and RAG data;
* Interpretability regarding how temporal information is processed from tokenization to embedding across different layers, and finally to model output;
* Temporal applications such as reasoning, forecasting and planning;
* Consideration of cross-lingual and cross-cultural perspectives for linguistic and cultural inclusion over time.

Interdisciplinary work that relies on LLMs for cross-temporal studies:
* Time-sensitive discoveries, such as social biases over time and personality testing over time;
* Assessment of time-sensitive discoveries to identify misleading findings, if any;
* Interdisciplinary evaluation benchmarks for LLMs’ temporal abilities, e.g., psychological time perception and episodic memory evaluation.

Submission Modes:
* Standard submissions: We invite the submission of papers that will receive up to three double-blind reviews from the XTempLLMs committee, and a final decision of acceptance from the workshop chairs.
* Pre-reviewed submissions: We invite unpublished papers that have already been reviewed either through ACL ARR, or recent AACL/EACL/ACL/EMNLP/COLING venues. These papers will not receive new reviews but will be judged together with their reviews via a meta-review from the workshop chairs.
* Published papers: We invite papers that have been published recently elsewhere to present at XTempLLMs. Please send the details of your paper (paper title, authors, publication venue, abstract, and a link to download the paper) directly to xtempllms(a)gmail.com. This allows such papers to gain more visibility from the workshop audience.

All deadlines are 11:59 pm UTC-12 (“Anywhere on Earth”):
* June 26, 2025: Submission deadline (standard and published papers)
* July 18, 2025: Submission deadline for papers with ARR reviews
* July 24, 2025: Notification of acceptance
* October 10, 2025: Workshop day

Invited Speakers:
* Jose Camacho Collados, Cardiff University, United Kingdom
* Ali Emami, Brock University, Canada
* Alexis Huet, Huawei Technologies, France

Organizing Committee:
* Wei Zhao, University of Aberdeen, United Kingdom
* Maxime Peyrard, Université Grenoble Alpes & CNRS, France
* Katja Markert, Heidelberg University, Germany
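
The tokenization limitation named in the workshop description, where a date is fragmented into subtokens that do not line up with year, month and day, can be illustrated with a toy example. This is a rough sketch only: it uses a greedy longest-match tokenizer over a made-up vocabulary (`TOY_VOCAB`), not the learned subword vocabulary of any real LLM, so the exact splits are illustrative rather than representative of a particular model.

```python
from typing import List, Set


def greedy_tokenize(text: str, vocab: Set[str]) -> List[str]:
    """Split text by repeatedly taking the longest vocabulary entry.

    Characters not covered by any vocabulary entry become single-character
    tokens. This is a toy stand-in for a subword tokenizer.
    """
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking towards one character.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry starts here: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical vocabulary, chosen only to make the fragmentation visible.
TOY_VOCAB = {"202", "505", "26", "05", "-"}

compact = greedy_tokenize("20250526", TOY_VOCAB)
iso = greedy_tokenize("2025-05-26", TOY_VOCAB)
print(compact)  # ['202', '505', '26']
print(iso)      # ['202', '5', '-', '05', '-', '26']
```

In both outputs the year 2025 is split across tokens, and in the compact form the month is merged with the last digit of the year, so no token corresponds to a calendar unit.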

Human-Computer Question Answering Competition: June 14 (in-person, DC) & June 21 (online)
by Jordan Boyd-Graber Ying 24 May '25

We’d like to invite you to take part in a machine learning / natural language competition with modest (~$500) prizes. We’re evaluating language models not just on their accuracy at answering questions but also on how well they communicate their uncertainty, both quantitatively and qualitatively.

The models will be evaluated on accuracy, but this is not the primary metric. The primary metric will be how much the models help users do a better job of answering questions. In other words, a model that has 75% accuracy but is convincingly wrong on the remaining 25% of questions will fare worse than a model that has 66% accuracy but can correctly identify the remaining questions and say “I don’t know”, since the downstream humans will trust the latter model more.

The human-computer games will be filmed and posted to YouTube so you can see how players reacted to your models.

The system submission process is designed to be beginner-friendly and intuitive. There’s a prompt-based version, as well as a Hugging Face upload for complete models. So if you’ve just wrapped up teaching an introductory NLP / AI course, we’d appreciate it if you passed this along!

Full information here: https://sites.google.com/view/qanta/2025-competition

Please contact qanta(a)googlegroups.com if you have any questions!

Best,
Jordan

--
Jordan Boyd-Graber
Professor
University of Maryland: CS, iSchool, UMIACS, LSC
Voice: 920.524.9464
jbg(a)umiacs.umd.edu
http://boydgraber.org
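
The 75% versus 66% comparison in the announcement can be made concrete with a little arithmetic. The sketch below is a toy model and not the competition's actual scoring: it assumes the human always accepts a confident model answer and, when the model says “I don’t know”, answers alone with a hypothetical accuracy `human_alone`. The function name `team_accuracy` and the value 0.5 are assumptions for illustration.

```python
def team_accuracy(model_correct: float, model_abstains: float,
                  human_alone: float) -> float:
    """Expected human+model accuracy under a toy trust model.

    model_correct  : fraction of questions the model answers correctly
    model_abstains : fraction where the model says "I don't know"
    human_alone    : assumed human accuracy on the abstained questions

    Whatever remains is answered confidently but wrongly; the human is
    assumed to accept those answers, so the team gets them wrong.
    """
    confidently_wrong = 1.0 - model_correct - model_abstains
    if confidently_wrong < -1e-9:
        raise ValueError("fractions must sum to at most 1")
    return model_correct + model_abstains * human_alone


# Model A: 75% correct, convincingly wrong on the remaining 25%.
overconfident = team_accuracy(0.75, 0.00, human_alone=0.5)
# Model B: 66% correct, abstains on the remaining 34%.
calibrated = team_accuracy(0.66, 0.34, human_alone=0.5)

print(round(overconfident, 2))  # 0.75
print(round(calibrated, 2))     # 0.83
```

Under these assumptions the abstaining model comes out ahead whenever the human alone is right on more than (0.75 - 0.66) / 0.34, roughly 26%, of the abstained questions.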

Open call for volume proposals - Phraseology and Multiword Expressions
by Lonneke van der Plas 23 May '25

Phraseology and Multiword Expressions (PMWE) (https://langsci-press.org/catalog/series/pmwe) is a book series at Language Science Press, a born-digital scholar-led open access publisher in linguistics. The series publishes high-quality books about conventionalized, idiosyncratic combinations of words. Within the field of phraseology such word combinations are sometimes called phrasemes, while the computational linguistics community uses the term multiword expressions for them. Various subtypes of such word combinations are of interest, such as multiword compounds, multiword terms, multiword named entities, light-verb constructions, phrasal verbs, idioms, collocations, formulaic speech, proverbs, etc.

The series is open to different approaches to create a forum for an interdisciplinary and cross-framework exchange of research results, including but not limited to the following subdisciplines:
- Computational linguistics and natural language processing
- Computer science
- Corpus linguistics
- Lexicography
- Psycholinguistics
- Theoretical linguistics

We welcome volume proposals addressing all topics related to theoretical, computational, and empirical approaches to phraseology including:
- Linguistic properties and typologies of multiword expressions, especially in multilingual frameworks
- Digital lexical resources including multiword expressions
- Description and processing of multiword expressions in syntactic and semantic frameworks (e.g., CCG, CxG, HPSG, LFG, TAG, UD)
- Identification and annotation of multiword expressions in corpora and treebanks
- Multiword expressions in machine translation and other end-user applications
- Multiword expressions and lexical innovation
- Diachronic studies and semantic change in multiword expressions
- Representation and evaluation of multiword expressions in language models (e.g., LLMs) and text generation systems

All contributions should be in English. To submit a volume proposal, please follow the guidelines at the series home page: https://langsci-press.org/catalog/series/pmwe

Volumes published so far:
- Voula Giouli, Verginica Barbu Mititelu (eds.) Multiword expressions in lexical resources: Linguistic, lexicographic, and computational perspectives. 2024.
- Victoria Beatrix Fendel (ed.) Support-verb constructions in the corpora of Greek: Between lexicon and grammar?. 2024.
- Aleksandar Trklja, Łukasz Grabowski (eds.) Formulaic language: Theories and methods. 2021.
- Sabine Schulte im Walde, Eva Smolka (eds.) The role of constituents in multiword expressions: An interdisciplinary, cross-lingual perspective. 2020.
- Yannick Parmentier, Jakub Waszczuk (eds.) Representation and parsing of multiword expressions: Current trends. 2019.
- Stella Markantonatou, Carlos Ramisch, Agata Savary, Veronika Vincze (eds.) Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop. 2018.
- Manfred Sailer, Stella Markantonatou (eds.) Multiword expressions: Insights from a multi-lingual perspective. 2018.

(see https://langsci-press.org/catalog/series/pmwe for links to the volumes)

2025 ACL Conference Registration
by ACL Announcements 23 May '25

Dear Colleagues,

The ACL 2025 Conference is pleased to announce that registration is now officially open. We encourage you to register early to take advantage of reduced rates.

Please note the following important deadlines for registration:
* Early Registration: Concludes on Wednesday, July 2, 2025, AOE.
* Late Registration: Will close for both In-Person and Virtual attendees on Friday, July 25, 2025, at 11:59 PM CET.
* Onsite Registration: Will be available for both In-Person and Virtual attendees from Saturday, July 26, 2025, through August 1, 2025, at 11:59 PM CET.

Detailed information regarding the registration process can be found on the official conference website: https://acl.swoogo.com/acl2025

We look forward to welcoming you to ACL 2025 in beautiful Vienna!

Sincerely,
The ACL Organization Team

CfP 12th International Conference on CMC and Social Media Corpora for the Humanities
by Fábián-Trost, Annamária 23 May '25

Dear List,

the 12th International Conference on CMC and Social Media Corpora for the Humanities (CMC-Corpora) will be held at the University of Bayreuth, Germany, on the 4th and 5th of September 2025 (CfP, extended deadline for paper/abstract submission: 28th of May 2025, 23:59 CEST).

The conference (https://www.cmc2025.uni-bayreuth.de/en/) brings together language-centered research on CMC and social media in linguistics, philologies, communication sciences, media, and social sciences with research questions from the fields of corpus and computational linguistics, language technology, text technology, and machine learning.

We adhere to a wide definition of CMC and social media, covering various media of digital communication, including email, newsgroups, forums, chat and messenger applications (e.g. WhatsApp), social networks (e.g. Facebook, Instagram, X, TikTok), gaming platforms, as well as interactions in the communication areas of video portals (YouTube), learning platforms, gaming apps, online games and virtual worlds.

Our keynote speakers are Gavin Brookes (Lancaster University) and Stephanie Evert (Friedrich-Alexander University Erlangen-Nuremberg).

We invite submissions to the 12th conference on the following topics (deadline for paper/abstract submission: 28th of May 2025, 23:59 CEST):
* Development of CMC corpora / social media corpora
* Building CMC corpora: from data collection to publication
* Open access data for CMC research: ethical and GDPR issues
* Annotating CMC data: genres, linguistic aspects, metadata
* Multimodal corpora
* Big data corpora
* Analysis of CMC corpora / social media corpora
* Sociolinguistic studies of CMC
* Discourse analysis of CMC
* Linguistic characteristics of CMC
* Multimodal (incl. visual) aspects of CMC
* Multilingualism and code-switching in CMC
* CMC in language education
* Natural language processing (NLP) of CMC data / social media data
* Normalization
* PoS tagging
* Lemmatization
* Syntactic parsing
* CMC for the benefit of digital societies
* Interdisciplinary research design and research methods in CMC for the benefit of digital societies
* Exploration of Diversity and Inclusion in CMC
* Intersection of CMC and Social Sciences
* Intersection of CMC and Human-Centered Data Science
* Intersection of CMC and Computational Social Science
* Contrastive CMC studies across different languages

The conference language is English. Submissions will consist of:
* Short papers (2-4 pages, at most 6 pages including the list of references, following the existing template) for oral presentations
* Abstracts (max. 300 words) for poster presentations

Submission and review
Authors of accepted papers are invited to present their work at the conference (30-minute timeslots: 20-minute talks, followed by 10 minutes of discussion). Authors of accepted abstracts can present their work in progress or early-stage research during the poster session. At the start of the conference, all accepted papers will be made available in online proceedings. After the conference, speakers with the best contributions will be invited to submit extended papers for one or more journal special issues or a volume publication.

Instructions for authors
All contributions will be collected through an online platform (ConfTool): https://www.cmc2025.uni-bayreuth.de/en/index.html

Templates for the submission:
Template for MSWord: https://www.cmc2025.uni-bayreuth.de/pool/dokumente/template_word.docx
Template for LaTeX: https://www.cmc2025.uni-bayreuth.de/pool/dokumente/template_latex.zip

Local organizing committee:
* Dr. Annamaria Fabian (University of Bayreuth/Bavarian Research Institute for Digital Transformation at the Bavarian Academy of Science)
* Prof. Dr. Igor Trost (Alpen-Adria University Klagenfurt/University of Passau)

For all enquiries, please contact the organizers at cmc2025(a)uni-bayreuth.de and see https://www.cmc2025.uni-bayreuth.de/en/

More information on the “International Conference Series on CMC and Social Media Corpora (cmc-corpora)”: https://cmc-corpora.org/series/

All the best,
Annamaria Fabian

Second Call for Participation Sim4IA@SIGIR 2025 - Micro Shared Task Data Available - Keynote announced
by Philipp Schaer 23 May '25

This is the second call for participation for the *2nd SIGIR 2025 Workshop on Simulations for Information Access (Sim4IA)*. The workshop will be held with SIGIR 2025 in Padua, Italy. It will provide a unique platform for researchers and practitioners to explore and discuss advancements in simulations for information access systems.

## tl;dr

- 17 July 2025, co-located with SIGIR 2025 in Padua, Italy
- Micro shared task data and framework available
- Tech and infrastructure talks/presentations welcome
- Keynote by Christine Bauer confirmed
- We are on the ACM Slack: https://acmsigir.slack.com/archives/C08STM45N90
- Website: https://sim4ia.org/sigir2025/

## Micro Shared Task Data and Framework Available

To drive a more focused discussion at the workshop, we designed a micro shared task that demonstrates how a shared task in user simulations might look. On 16 May 2025, we released the first training data set as well as a prebundled and dockerized version of SimIIR to give everyone a head start on the shared task.

Our shared task concept is based on the fundamental design principle of validating user simulations instead of measuring system effectiveness. We envision users interacting with a particular IA system, such as a traditional search engine (Task A) or a conversational system (Task B). We challenge participants to design and implement user simulators that can mimic the interactions of real users with these systems with a high degree of fidelity.

The workshop features a stripped-down version of this concept, a micro shared task. We will discuss the submissions and ideas for the next steps or evaluation measures at the workshop.

Non-binding expression of interest to take part in the micro shared tasks: https://forms.gle/ftV8cwjywHWsBhCw9

More information on the shared task, data sets, and framework: https://sim4ia.org/sigir2025/#micro-shared-task

## Keynote by Christine Bauer

We are happy to announce that Christine Bauer has agreed to give a keynote on “From toy models to tactics: What user simulation is good for”.

Christine Bauer is a Professor of Interactive Intelligent Systems at the Department of Artificial Intelligence and Human Interfaces (AIHI) at the University of Salzburg. She is involved in the EXDIGIT initiative, emphasizing interdisciplinary technologies in digital sciences. Her research lies at the intersection of human-centered computing, data science, and artificial intelligence, with a focus on context-aware recommender systems, particularly in the music and media domains. Her core interests include fairness and multi-method evaluation. Her multidisciplinary background drives her research activities.

More information on the keynote: https://sim4ia.org/sigir2025/#keynote

## Invitation of Tech/Infrastructure Talks

We have reserved a special time slot at the workshop for talks on recent technologies and/or infrastructures for (user) simulations, and invite you to submit your ideas for such talks. Send a short email with your idea in the form of a title and roughly half a page of abstract to sigir2025(a)sim4ia.org

Check out the tentative program, shared task data and description, the keynote announcement, and much more at https://sim4ia.org/sigir2025/

See you in Padua!

Sim4IA Organizers
Philipp Schaer, Christin Kreutz, Krisztian Balog, Timo Breuer, and Andreas Kruff

CfP Shifting Power in Language Learning and Applied Linguistics with GenAI - Hybrid UK/online, deadline May 31
by Rachele.De-Felice [She/her] 22 May '25

Dear all,

You are warmly invited to submit an abstract to the Shifting Power in Language Learning and Applied Linguistics with GenAI conference, which will take place in Milton Keynes, UK and online on November 13-14, 2025.

This conference will explore how power is being shifted towards, away from, and between learners and educators by AI technologies, and the new dynamics and potential changes this is bringing about in applied linguistics, languages and cultures studies.

Potential topics for the papers may include, but are not limited to:
* AI and its impact on the training and evolving roles of languages and applied linguistics educators and their relationships with learners;
* AI and its potential to support inclusive and personalised learning in languages and applied linguistics;
* AI integration into learning, teaching and assessment of languages, cultures and applied linguistics with a focus on ethical issues and sustainability challenges;
* Core concepts and theoretical frameworks guiding the integration of AI in applied linguistics;
* Core concepts and theoretical frameworks guiding the integration of AI in the learning and teaching of languages and cultures;
* Questions around the use of AI in carrying out research in languages, cultures and applied linguistics, and its impact on research processes and outputs.

Instructions for submission
We welcome submissions in the following formats:
* 20-minute presentations (online or in person)
* 40-minute facilitated discussions with up to 3 facilitators (online or in person)

Proposals should be submitted via email by May 31st, 2025, to ai-languages-conference(a)open.ac.uk

The following information will be requested during the submission process:
* Names, titles, contact info, institutional or organisational affiliation and short bio (max. 100 words) for each presenter and facilitator
* Conference topic (selected from the list above)
* Session format (selected from the list above)
* Title of the abstract
* Abstract (max. 300 words)

Kind regards,
Rachele

Dr Rachele De Felice (she/her) | Lecturer in Applied Linguistics
School of Languages and Applied Linguistics
The Faculty of Wellbeing, Education and Language Studies
The Open University
https://profiles.open.ac.uk/rachele-de-felice
