- Corpora - ELRA lists

SIGIR 2025 - Call for Workshop Proposals
by Surya Kallumadi 08 Dec '24

Call for Workshop Proposals: https://sigir2025.dei.unipd.it/call-workshops.html

The annual SIGIR conference is the major international forum for the presentation of new research results and the demonstration of new systems and techniques in the broad field of information retrieval (IR). The 48th ACM SIGIR conference will be held in person in Padua, Italy, from July 13th to 18th, 2025.

SIGIR 2025 workshops will provide a platform for presenting novel ideas and emerging areas in IR, in a less formal and more focused way than the conference itself. Researchers and practitioners from all areas of IR are invited to submit workshop proposals for review.

2025 Format

Workshops will be on-site and in person. All workshops will require at least one organizer to attend, as well as a (yet-to-be-determined) number of participants. More information is available on the SIGIR 2025 In-presence Policy <https://sigir2025.dei.unipd.it/> page.

Important Dates for Workshop Proposals

Time zone: Anywhere on Earth (AoE)

- Proposal submission: January 9, 2025
- Proposal acceptance notification: February 6, 2025
- Individual workshop paper submission: April 23, 2025
- Tentative workshop overview camera ready: April 24, 2025
- Tentative workshop paper acceptance notification: May 21, 2025
- Workshop day: July 17, 2025

Topics of Interest

Workshop topics typically match those identified in the SIGIR 2025 general call for contributions <https://sigir2025.dei.unipd.it/call-full-papers.html> (see, e.g., full papers); proposals on other topics related to IR are welcome.

We encourage prospective workshop organizers to submit proposals for highly interactive workshops (either full-day or half-day) focusing on either in-depth analysis or broad-ranging approaches to information retrieval. The format of each workshop is to be determined by the organizers. We expect workshops to contain ample time for discussion and engagement for all participants, not just those presenting papers. Workshops fostering collaboration, discussion, group problem-solving, and community-building initiatives are particularly encouraged. Workshops focused solely on the presentation of papers in a “mini conference” format are discouraged.

The organizers of approved workshops are expected to define the workshop’s focus, gather and review submissions, and decide on the final program content. Organizers (including co-organizers) are strongly encouraged to write an article for the ACM SIGIR Forum summarizing the event. At least one organizer is expected to attend the entire workshop.

Submission Guidelines

Workshop proposals, in no more than 4 pages, should include the following information:

- Title
- Motivation for the workshop, its appropriateness for SIGIR, its complementarity to the main SIGIR 2025 conference topics, and why it would be of interest to the IR community
- Theme and purpose of the workshop
- Format (half/full-day), planned activities, and a tentative schedule of events, including potential keynote or other speakers
- Planned interaction and engagement: Describe how the workshop will be structured to encourage active discussions, idea exchange, and collaboration among participants. Detail how the organizers will create an interactive format, distinct from a "mini conference," such as breakout discussions, group exercises, and panels designed to deepen engagement with the topic.
- Distinction from main conference topics: Outline how the workshop theme covers areas complementary to or not typically discussed in the main conference program, highlighting emerging, niche, or interdisciplinary topics that could provide fresh perspectives to participants.
- Audience reach: Describe the potential to attract new attendees who may not traditionally participate in SIGIR, including practitioners, academics from related fields, or industry specialists with unique insights or applications in information retrieval.
- For each organizer, an indication of their likelihood of attending onsite
- Special requirements and their importance to the workshop’s success, along with a contingency plan if requirements are unmet (such as potential last-minute organizers to ensure the workshop runs onsite)
- List of organizers with a short biographical sketch of each organizer, describing relevant qualifications and experience
- Names of potential program committee (PC) members, if a PC is required (any workshop involving written papers should have a PC)
- Selection process for participants and/or presenters
- Expected target audience and how the workshop will be advertised to reach them
- Related workshops (if applicable): if previously held at SIGIR or another conference, organizers should briefly describe past attendance, outcomes, and the need for another workshop

Workshop proposals should be prepared in the current ACM two-column conference format. Suitable LaTeX, Word, and Overleaf <https://www.overleaf.com/gallery/tagged/acm-official> templates are available from the ACM website <https://www.acm.org/publications/proceedings-template> (use the “sigconf” proceedings template). For LaTeX, the following should be used:

\documentclass[sigconf,natbib=true]{acmart}

Proposals will be reviewed based on quality, complementarity to SIGIR 2025 conference topics, likelihood of attracting enough participants, and venue hosting capacity. Submissions will be reviewed by a program committee selected for this purpose, with final decisions made at the SIGIR Program Committee meeting.

Proposals should be submitted in PDF through the EasyChair system, https://easychair.org/conferences/?conf=sigir2025, by selecting the “SIGIR 2025 Workshops” track.

Organizers of accepted workshops will be invited to submit a camera-ready summary for inclusion in the SIGIR 2025 conference proceedings.

Workshop Chairs

- Surya Kallumadi, Coursera.org, USA
- Zhaochun Ren, Leiden University, the Netherlands

Contact

For further information, please contact the SIGIR 2025 Workshop Co-chairs by email: sigir2025-workshop(a)dei.unipd.it.

COLING 2025 Tutorial: Bridging Linguistic Theory & AI - Unlocking Usage-Based Learning in Humans & Machines (Call for Participation)
by claire.n.bonial.civ＠army.mil 07 Dec '24

We’re delighted to invite you to the on-site tutorial at COLING 2025 that will discuss the latest work on bridging linguistic theory and Large Language Models: “Bridging Linguistic Theory and AI: Usage-Based Learning in Humans and Machines.”

For more information, visit: https://sites.google.com/view/linguistic-theory-and-ai/

The tutorial, which will be held in person, offers an overview of the shared and divergent aspects of human and machine usage and data-driven learning, outlined from the theoretical perspective of usage-based psycholinguistic theory, with an emphasis on how this can shed light on the capabilities and limitations of LLMs, including multimodal models. This will serve as the bedrock for guiding participants and the NLP community towards more informed evaluation of large, pre-trained models, as well as energising solutions drawing upon the multi-modal information and linguistic theory that enriches language and many dimensions of interaction.

Background: Unlike our past NLP tools, such as syntactic parsers and automatic semantic role labelling, LLMs lack grounding in linguistic theory. Instead, their development is based on the encoder-decoder architecture, which was originally designed for sequence-to-sequence tasks, specifically translation. This dichotomy impedes methods for evaluating LLMs: their performance on meta-linguistic tasks such as semantic role labelling, which previously served as benchmarks for the individual components in an NLP pipeline, is a poor predictor of their fluency on downstream applications. However, the fact that LLMs, designed primarily to meet information-theoretic needs, can capture any linguistic information at all is fascinating. Additionally, it offers a novel foundation for exploring what can be achieved through exposure to information alone.

Therefore, it has been compelling to turn to usage-based theories of language, such as Construction Grammar, to establish experimentally validated structures of language that speakers of a given language consistently recognise and are able to generalise over. We can then compare such structures to the linguistic structure that we can probe for within LLMs.

We look forward to seeing you at COLING 2025 in January.

On behalf of Claire Bonial, Harish Tayyar Madabushi, Nikhil Krishnaswamy, James Pustejovsky

Workshop on Diversity in Large Speech and Language Models
by Hillmann, Stefan, Dr. 06 Dec '24

Dear all,

Please find in the following the CoP for the workshop on

Diversity in Large Speech and Language Models

Date: 20 February 2025
Place: Humboldt-Universität Berlin, Dorotheenstraße 24, Berlin, Germany

Machine learning techniques have conquered many different tasks in speech and natural language processing, such as speech recognition, information extraction, text and speech generation, and human machine interaction using natural language or speech (chatbots). Modern techniques typically rely on large models for representing general knowledge of one or several languages (Large Language Models, LLMs), or for representing speech and general audio characteristics. These models have been trained with large amounts of speech and language data, typically including web content.

When humans interact with such technologies, the effectiveness of the interaction will be influenced by how far humans make use of the same type of language the models have been trained on or, in other words, if the models are able to generalize to the language used by humans when interacting with the technology. This may lead to some gradual forms of adaptation in human speech and language production, and users who do not adapt may be excluded from efficient use of such technologies. On top of this, as commercial model development follows market needs, under-represented languages and dialects/sociolects may decrease in terms of priorities. Furthermore, for many lesser spoken languages the necessary data is not available, which will worsen a digital divide in speech and language technology usage.

The workshop sets out to discuss this problem based on scientific contributions from the perspective of computer science and linguistics (including computational linguistics and NLP).

Topics which we aim to address include but are not limited to:

- User diversity: Which aspects of human speech and language production affect the performance of large foundation models? In which way, and for which tasks?
- Language use: How are large language models able to cope with different languages, dialects, and sociolects? How do they deal with code switching?
- Human adaptation: How does the use of large language models affect language comprehension, as well as speech and language production? Which alignment effects occur, and in which time spans?
- Model adaptation: How do models need to be designed to better cope with speech and language diversity? How do training and finetuning affect model performance?
- Inclusion: What data and technologies are necessary to better cope with diversity in large speech and language models?

The workshop will consist of a number of oral presentations and discussion panels. Accepted speakers are invited to submit a short or long paper which will be published online after the workshop.

Details and registration: https://www.tu.berlin/en/qu/about-us/news/isca-itg-workshop

Best,
Stefan Hillmann

--
Dr.-Ing. Stefan Hillmann
Wissenschaftlicher Mitarbeiter / Senior Researcher
er, ihm / he, his
Anrede / Form of address: Herr / Mr.
Technische Universität Berlin
Fakultät IV / Faculty 4
Elektrotechnik und Informatik / Electrical Engineering and Computer Science
Quality and Usability Lab
Sekr. MAR 6-7, Marchstr. 23, 10587 Berlin, GERMANY

Call for Birds of a Feather/Affinity Group Proposals at COLING 2025
by Mukund Choudhary 06 Dec '24

Dear colleagues,

We hope you are doing great. As the Diversity & Inclusion team at COLING 2025, we are excited to announce the call to organize Birds of a Feather (BoF) / Affinity Group sessions at the conference!

If you are interested in discussing a specific theme in CL, NLP, or research in general, please take a few minutes to complete the form <https://forms.gle/8JrSBH7Gc3sgqLRBA>. We would appreciate receiving your proposal by 23:59 (AoE), December 20th, 2024.

Let us know if there is anything more we can assist with at coling2025diversity(a)googlegroups.com.

Best regards,
Hawau and Mukund
COLING 2025 Social Diversity & Inclusion (SD&I) Team

P.S. All BoF hosts should be registered for COLING 2025, and the sessions will be in-person. If the link above is not clickable, please use this URL: https://forms.gle/8JrSBH7Gc3sgqLRBA

FIRST CALL FOR PAPERS: Scoping workshop “Corpus linguistics 2040: Which data, which methods, which models?”
by christian.mair＠anglistik.uni-freiburg.de 06 Dec '24

The workshop is jointly organised by the English Department of the University of Freiburg and the Institut für Deutsche Sprache (IDS) in Mannheim and, as a scoping workshop, designed to explore the major empirical, methodological and conceptual challenges facing our research community. Although the two organising institutions focus on English and German, corpus linguists working on other languages are explicitly invited to attend and contribute.

Venue: IDS, Mannheim, Germany
Date: 10 – 11 July 2025

For info on abstract submission etc. see:
https://linguistlist.org/issues/35-3417
https://www.ids-mannheim.de/fi/veranstaltungen/workshop-corpus-linguistics-…

Topics in focus include:

- Corpora of spontaneous speech – new formats, new searches
- Corpora versus AI/LLMs? Corpora and AI/LLMs?
- Multilingual and multimodal corpora
- Infrastructures for CLx and Digital Humanities

Several renowned colleagues have already made commitments to present keynotes and/or organise round tables, including Silvia Bernardini, Mark Davies, Tony McEnery and Michaela Mahlberg.

Christian Mair & Andreas Witt

[FinNLP@EMNLP-2025] Call for Shared Task Proposal
by CHUNG-CHI CHEN 06 Dec '24

Dear Colleagues,

We are excited to announce the launch of the ACL Special Interest Group on Economic and Financial Natural Language Processing (SIG-FinTech)! To learn more about SIG-FinTech, we invite you to visit our official website: https://sigfintech.github.io/

We are also excited to share that the next FinNLP workshop will be held in conjunction with EMNLP 2025, taking place from November 5–9, 2025, in Suzhou, China. Stay tuned for more details; we will share updates soon!

*As part of this event, we are now accepting shared task proposals for FinNLP@EMNLP-2025. Details about the call for proposals can be found below and on our website: https://sigfintech.github.io/fineval.html*

*Submission Deadline: January 31, 2025*

We warmly encourage you to join us as shared task organizers. Feel free to contact us if you have any questions.

Best regards,
Chung-Chi

---
陳重吉 (Chung-Chi Chen), Ph.D.
Researcher
Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Japan
E-mail: c.c.chen(a)acm.org
Website: https://nlpfin.github.io/

FinEval-Proposal-2025: Financial Information Access and Evaluation
Suzhou, China, November 5-9, 2025
Conference website: https://sigfintech.github.io/fineval.html
Submission link: https://easychair.org/conferences/?conf=finevalproposal2025

Financial Information Access and Evaluation (FinEval)
EMNLP-2025, Nov. 5th-9th, 2025, Suzhou, China

Shared tasks are collaborative initiatives where researchers and practitioners work together to address a common challenge using shared datasets and evaluation metrics. These tasks foster competition, collaboration, and advancement within the field, playing a significant role in both academic and industry communities. FinEval provides a venue for the community to share valuable insights and inspiration. Every year, we will call for proposals for the next edition of FinEval, which is collocated with the FinNLP workshop.

*Call for Shared Task Proposal*

We encourage submissions for tasks that test systems on financial text analysis, with a particular focus on cross-lingual, application-oriented tasks, and novel uses of NLP in finance. Tasks for non-English languages and cross-domain applications are welcome.

*Proposal Criteria*

Your task proposal will be evaluated on:

- *Novelty:* Is the task addressing a unique or under-explored problem in financial NLP?
- *Interest:* Will the task attract broad participation?
- *Data Quality:* Is the data collection plan robust, with high inter-annotator agreement and appropriate licensing?
- *Evaluation:* Is the evaluation methodology rigorous, and will it inspire future research?
- *Impact:* What long-term impact will this task have on financial NLP?
- *Ethics:* Data should avoid PII and adhere to ethical guidelines, including privacy compliance and ethical data use.

*Task Organization*

Organizers should be prepared to:

- Ensure data quality and licensing, addressing ethical and security concerns.
- Provide format checkers, baseline systems, and evaluation tools for participants.
- Manage a competition platform (e.g., CodaLab) and maintain communication channels.
- Write and present a task description paper at the FinEval session in the FinNLP workshop.
- Organize and review participant submissions and related documentation.

*Organizer Roles*

- *Lead Organizer:* Oversees the task, ensuring timely completion of deliverables.
- *Co-Organizers:* Assist with data preparation, evaluation, and participant communication.
- *Advisory Organizers:* Provide guidance, not necessarily engaged in daily tasks.

Note: A minimum of two organizers is required per task. Single-organizer submissions will not be accepted.

*Submission Guidelines*

Task proposals should be in PDF format, following the ACL Template <https://github.com/acl-org/acl-style-files>, and must be no longer than 4 pages (plus references). Include the following sections:

- *Overview:* Summary, community interest, and anticipated impact.
- *Data & Resources:* Data sources, copyright details, data quantity, quality assurance, and ethical considerations.
- *Pilot Task:* (recommended) Results and insights from initial studies.
- *Evaluation:* Clear evaluation methodology and criteria.
- *Task Reruns:* If a rerun, provide justification and expected impact.
- *Task Organizers:* Names, affiliations, contact details, and relevant experience.

*Important Dates*

- *Task proposals due:* 31 January 2025
- *Task selection notification:* 20 February 2025
- *Sample data ready:* 15 March 2025
- *Training data ready:* 1 May 2025
- *Evaluation data ready:* 1 June 2025
- *Evaluation start:* 10 July 2025
- *Evaluation end:* 31 July 2025
- *Paper submission:* 31 August 2025
- *Notification to authors:* 15 September 2025
- *Camera-ready papers due:* 25 September 2025
- *FinNLP Workshop:* EMNLP-2025

Call for Participation: eRisk Lab @ CLEF 2025
by ACL Announcements 06 Dec '24

-------- Original Message --------
Subject: Call for Participation: eRisk Lab @ CLEF 2025
Date: 2024-12-05 19:50
From: ACL Announcements <announcements(a)aclweb.org>
To: Announcements <announcements(a)aclweb.org>

Call for Participation: eRisk Lab @ CLEF 2025

Are you passionate about leveraging AI for societal good? Join us for eRisk 2025, the ninth edition of this lab at CLEF, where we delve into the methodologies and applications of early risk detection on the Internet. Our mission is to foster interdisciplinary research that addresses critical health and safety challenges, from identifying signs of depression to preventing online harm.

Tasks for eRisk 2025 (more info at https://erisk.irlab.org/)

Task 1: Search for Symptoms of Depression

- Objective: Rank sentences from user writings by relevance to the 21 symptoms of the BDI-II questionnaire.
- Highlights:
  - Use a TREC-formatted dataset with human-assessed relevance judgments.
  - Generate rankings for symptoms with evaluation via metrics like MAP and nDCG.
  - Create a valuable annotated corpus with broad applications beyond this task.
  - This is the third edition of the task: two years of training data.

Task 2: Contextualized Early Detection of Depression (New in 2025)

- Objective: Analyze full conversational contexts to detect early signs of depression.
- Highlights:
  - Evaluate sequential user interactions for a holistic view of conversational dynamics.
  - Train on isolated writings and test in real-world-like scenarios with chronologically ordered conversations.
  - Metrics include accuracy and timeliness, measured via ERDE and similar frameworks.
  - This is the first edition of the contextualized task: three years of un-contextualized training data.

Pilot Task: Conversational Depression Detection via LLMs (New in 2025, Interactive Task)

- Objective: Engage with LLM personas to identify depressive symptoms based on conversational exchanges.
- Highlights:
  - No training data provided: use creative and unsupervised approaches.
  - Collaborate in a limited-message dialogue setting, simulating real-world conditions.
  - Push the boundaries of AI-human interaction for mental health applications: are we able to accurately reproduce personas?
  - This is a pilot task. Participants will need to book a slot to interact with the LLM personas: register before the slots are gone!

Key Dates

- Dataset Release:
  - T1: 1st December 2024 for training collections and test dataset
  - T2: 1st December 2024 for training and 5th February 2025 for beginning of test stage (server opens)
  - T3: 5th February 2025 for beginning of test stage (server opens for interacting with the LLM)
- Submission Deadlines:
  - T1: 1st April 2025 for submitting participants’ results to FTP
  - T2: 12th April 2025 end of test stage (server closes)
  - T3: 12th April 2025 end of test stage (server closes)
- CLEF 2025 Conference: 9-12 September 2025, Madrid, Spain.

How to Participate

1. Register: Sign up through the CLEF 2025 Labs Registration site <https://clef2025-labs-registration.dei.unipd.it/>
2. Submit Agreements: Complete the user agreement form to access datasets.
3. Join the Community: Join our Google Group: https://groups.google.com/g/erisk-clef

Lab co-chairs

Javier Parapar, Univ. A Coruña, Spain
Anxo Pérez, Univ. A Coruña, Spain
Xi Wang, Univ. Sheffield, United Kingdom
Fabio Crestani, Univ. Lugano, Switzerland

More Information

Visit the eRisk website <https://erisk.irlab.org> for task details, datasets, and registration guidelines.
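
For readers unfamiliar with the ranking metrics named under Task 1 (MAP and nDCG), the following is a minimal Python sketch of how they are commonly computed. It is not the lab's official evaluation code: the function names and toy labels are assumptions made for illustration. It also assumes that every relevant item appears somewhere in the submitted ranking, and it uses the linear-gain, log2-discount variant of nDCG with the ideal ordering taken from the submitted list only.

```python
import math
from typing import Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """Average precision for one ranked list of binary labels (1 = relevant).

    Assumption: all relevant items are present in the list, so the sum of
    precisions is divided by the number of relevant items found.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """MAP: mean of average precision over several queries (here, symptoms)."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


def dcg(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k items (linear gain)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))


def ndcg(gains: Sequence[float], k: int) -> float:
    """nDCG@k: DCG divided by the DCG of the ideally reordered list."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0
```

In this sketch, each inner list passed to `mean_average_precision` would hold the relevance labels of one symptom's ranked sentences, in ranked order.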

Computational linguistics lecturer (assistant professor) vacancy, Queen Mary University of London
by Matthew Purver 05 Dec '24

Queen Mary University of London is currently advertising a Computational Linguistics faculty position at the level of Lecturer (Assistant Professor). The closing date is 5 January.

https://qmul-jobs.tal.net/vx/mobile-0/appcentre-ext/brand-4/candidate/so/pm…

This post is based in the Linguistics Department, in Humanities and Social Sciences. Faculty in the department have a number of CL-adjacent interests and collaborations. There is also a substantial Computational Linguistics group in Computer Science, with whom the department has strong ties.

The appointed candidate will enhance our teaching at the interface of Linguistics and CL/AI, for students who are interested in gaining more computational or AI-linked skills. The position is a good fit for applicants with a wide range of computational and AI-related interests, whether text or speech, and who are interested in working with students with a range of backgrounds and interests.

For further information please contact Prof Devyani Sharma <d.sharma(a)qmul.ac.uk>

--
Matthew Purver - http://www.eecs.qmul.ac.uk/~mpurver/
Computational Linguistics Lab - http://compling.eecs.qmul.ac.uk/
Cognitive Science Research Group - http://cogsci.eecs.qmul.ac.uk/
School of Electronic Engineering and Computer Science
Queen Mary University of London, London E1 4NS, UK
*My working days for QMUL are **Tuesday-Thursday**; responses to mail on other days may be delayed.*

Webinar by Javier de la Rosa - Artificial Intelligence Lab (National Library of Norway)
by HiTZ zentroa 05 Dec '24

**** We apologize for the multiple copies of this email. In case you are already registered to the next webinar, you do not need to register again. ****

Dear colleague,

We are happy to announce the next webinar in the Language Technology webinar series organized by the HiTZ Chair of AI&LT (https://hitz.eus). You can check the videos of previous webinars and the schedule for upcoming webinars here: http://www.hitz.eus/webinars

Next webinar:

Speaker: Javier de la Rosa - Artificial Intelligence Lab (National Library of Norway)
Title: The Mímir Project: Impact of copyrighted materials in LLMs
Date: Thursday, December 12, 2024 - 15:00

Summary: The Mímir Project is an initiative by the Norwegian government that aims to assess the significance and influence of copyrighted materials in the development and performance of generative large language models (LLMs) tailored to the Norwegian languages. This collaborative effort involves three leading institutions from different regions of the country: the National Library of Norway (NB), the University of Oslo (UiO), and the Norwegian University of Science and Technology (NTNU), each contributing unique expertise in language technology, corpus curation, model training, copyright law, and computational linguistics. The ultimate goal of the project was to gather empirical evidence that informed the formulation of a compensation scheme for authors whose works are utilized by these advanced artificial intelligence (AI) systems, ensuring that intellectual property rights are respected and adequately compensated.

Bio: Javier de la Rosa is a Research Scientist at the Artificial Intelligence Lab at the National Library of Norway. A former Postdoctoral Fellow in Natural Language Processing at UNED, he holds a PhD in Hispanic Studies with a specialization in Digital Humanities from the University of Western Ontario, and a Master's in Artificial Intelligence from the University of Seville. Javier has previously worked as a Research Engineer at Stanford University, and as the Technical Lead at the University of Western Ontario CulturePlex Lab. He is interested in Natural Language Processing applied to historical and literary text, with a special focus on large language models.

Upcoming webinars:

· Ekaterina Shutova (January 30, 2025)
· Sebastian Ruder (February 6, 2025)
· Christian Herff (Thursday, March 6, 2025)

If you are interested in participating, please complete this registration form: http://www.hitz.eus/webinar_izenematea

If you cannot attend this seminar, but you want to be informed of the following HiTZ webinars, please complete this registration form instead: http://www.hitz.eus/webinar_info

Best wishes,
HiTZ Zentroa

P.S: HiTZ will not grant any type of certificate for attendance at these webinars.

Reminder: 3-year postdoc position in NLP at the University of Oslo
by Lilja Øvrelid 05 Dec '24

Reminder that the closing date for this position is *December 13th*:

A position as Postdoctoral Research Fellow in Natural Language Processing is available within MediaFutures: Research Centre for Responsible Media Technology & Innovation at the Language Technology Group (LTG) at the University of Oslo (UiO), Norway. The closing date is December 13th, 2024.

For more information about the position and the research group, please see the full announcement here: https://www.jobbnorge.no/en/available-jobs/job/270966/postdoctoral-research…

Please do not hesitate to contact me for any further information.

Best regards,
Lilja
