December 2024 - Corpora

FIRST CALL FOR PAPERS: Scoping workshop “Corpus linguistics 2040: Which data, which methods, which models?”
by christian.mair＠anglistik.uni-freiburg.de 06 Dec '24

The workshop is jointly organised by the English Department of the University of Freiburg and the Institut für Deutsche Sprache (IDS) in Mannheim and, as a scoping workshop, designed to explore the major empirical, methodological and conceptual challenges facing our research community. Although the two organising institutions focus on English and German, corpus linguists working on other languages are explicitly invited to attend and contribute.

Venue: IDS, Mannheim, Germany
Date: 10 – 11 July 2025

For info on abstract submission etc. see:
https://linguistlist.org/issues/35-3417
https://www.ids-mannheim.de/fi/veranstaltungen/workshop-corpus-linguistics-…

Topics in focus include:
- Corpora of spontaneous speech – new formats, new searches
- Corpora versus AI/LLMs? Corpora and AI/LLMs?
- Multilingual and multimodal corpora
- Infrastructures for CLx and Digital Humanities

Several renowned colleagues have already made commitments to present keynotes and/or organise round tables, including Silvia Bernardini, Mark Davies, Tony McEnery and Michaela Mahlberg.

Christian Mair & Andreas Witt

[FinNLP@EMNLP-2025] Call for Shared Task Proposal
by CHUNG-CHI CHEN 06 Dec '24

Dear Colleagues,

We are excited to announce the launch of the ACL Special Interest Group on Economic and Financial Natural Language Processing (SIG-FinTech)! To learn more about SIG-FinTech, we invite you to visit our official website: https://sigfintech.github.io/

We are also excited to share that the next FinNLP workshop will be held in conjunction with EMNLP 2025, taking place from November 5–9, 2025, in Suzhou, China. Stay tuned for more details; we will share updates soon!

*As part of this event, we are now accepting shared task proposals for FinNLP@EMNLP-2025. Details about the call for proposals can be found below and on our website: https://sigfintech.github.io/fineval.html*

*Submission Deadline: January 31, 2025*

We warmly encourage you to join us as shared task organizers. Feel free to contact us if you have any questions.

Best regards,
Chung-Chi

---
陳重吉 (Chung-Chi Chen), Ph.D.
Researcher
Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Japan
E-mail: c.c.chen(a)acm.org
Website: https://nlpfin.github.io/

FinEval-Proposal-2025: Financial Information Access and Evaluation
Suzhou, China, November 5-9, 2025
Conference website: https://sigfintech.github.io/fineval.html
Submission link: https://easychair.org/conferences/?conf=finevalproposal2025

Financial Information Access and Evaluation (FinEval)
EMNLP-2025, Nov. 5th-9th, 2025, Suzhou, China

Shared tasks are collaborative initiatives where researchers and practitioners work together to address a common challenge using shared datasets and evaluation metrics. These tasks foster competition, collaboration, and advancement within the field, playing a significant role in both academic and industry communities. FinEval provides a venue for the community to share valuable insights and inspiration. Every year, we will call for proposals for the next edition of FinEval, which is co-located with the FinNLP workshop.

*Call for Shared Task Proposal*

We encourage submissions for tasks that test systems on financial text analysis, with a particular focus on cross-lingual, application-oriented tasks, and novel uses of NLP in finance. Tasks for non-English languages and cross-domain applications are welcome.

*Proposal Criteria*

Your task proposal will be evaluated on:
- *Novelty:* Is the task addressing a unique or under-explored problem in financial NLP?
- *Interest:* Will the task attract broad participation?
- *Data Quality:* Is the data collection plan robust, with high inter-annotator agreement and appropriate licensing?
- *Evaluation:* Is the evaluation methodology rigorous, and will it inspire future research?
- *Impact:* What long-term impact will this task have on financial NLP?
- *Ethics:* Data should avoid PII and adhere to ethical guidelines, including privacy compliance and ethical data use.

*Task Organization*

Organizers should be prepared to:
- Ensure data quality and licensing, addressing ethical and security concerns.
- Provide format checkers, baseline systems, and evaluation tools for participants.
- Manage a competition platform (e.g., CodaLab) and maintain communication channels.
- Write and present a task description paper at the FinEval session in the FinNLP workshop.
- Organize and review participant submissions and related documentation.

*Organizer Roles*

- *Lead Organizer:* Oversees the task, ensuring timely completion of deliverables.
- *Co-Organizers:* Assist with data preparation, evaluation, and participant communication.
- *Advisory Organizers:* Provide guidance, not necessarily engaged in daily tasks.

Note: A minimum of two organizers is required per task. Single-organizer submissions will not be accepted.

*Submission Guidelines*

Task proposals should be in PDF format, following the ACL Template <https://github.com/acl-org/acl-style-files>, and must be no longer than 4 pages (plus references). Include the following sections:
- *Overview:* Summary, community interest, and anticipated impact.
- *Data & Resources:* Data sources, copyright details, data quantity, quality assurance, and ethical considerations.
- *Pilot Task:* (recommended) Results and insights from initial studies.
- *Evaluation:* Clear evaluation methodology and criteria.
- *Task Reruns:* If a rerun, provide justification and expected impact.
- *Task Organizers:* Names, affiliations, contact details, and relevant experience.

*Important Dates*

- *Task proposals due:* 31 January 2025
- *Task selection notification:* 20 February 2025
- *Sample data ready:* 15 March 2025
- *Training data ready:* 1 May 2025
- *Evaluation data ready:* 1 June 2025
- *Evaluation start:* 10 July 2025
- *Evaluation end:* 31 July 2025
- *Paper submission:* 31 August 2025
- *Notification to authors:* 15 September 2025
- *Camera-ready papers due:* 25 September 2025
- *FinNLP Workshop:* EMNLP-2025

Call for Participation: eRisk Lab @ CLEF 2025
by ACL Announcements 06 Dec '24

-------- Original Message --------
Subject: Call for Participation: eRisk Lab @ CLEF 2025
Date: 2024-12-05 19:50
From: ACL Announcements <announcements(a)aclweb.org>
To: Announcements <announcements(a)aclweb.org>

Call for Participation: eRisk Lab @ CLEF 2025

Are you passionate about leveraging AI for societal good? Join us for eRisk 2025, the ninth edition of this lab at CLEF, where we delve into the methodologies and applications of early risk detection on the Internet. Our mission is to foster interdisciplinary research that addresses critical health and safety challenges, from identifying signs of depression to preventing online harm.

Tasks for eRisk 2025 (more info at https://erisk.irlab.org/)

Task 1: Search for Symptoms of Depression
- Objective: Rank sentences from user writings by relevance to the 21 symptoms of the BDI-II questionnaire.
- Highlights:
  - Use a TREC-formatted dataset with human-assessed relevance judgments.
  - Generate rankings for symptoms with evaluation via metrics like MAP and nDCG.
  - Create a valuable annotated corpus with broad applications beyond this task.
  - This is the third edition of the task: two years of training data.

Task 2: Contextualized Early Detection of Depression (New in 2025)
- Objective: Analyze full conversational contexts to detect early signs of depression.
- Highlights:
  - Evaluate sequential user interactions for a holistic view of conversational dynamics.
  - Train on isolated writings and test in real-world-like scenarios with chronologically ordered conversations.
  - Metrics include accuracy and timeliness, measured via ERDE and similar frameworks.
  - This is the first edition of the contextualized task: three years of un-contextualized training data.

Pilot Task: Conversational Depression Detection via LLMs (New in 2025, Interactive Task)
- Objective: Engage with LLM personas to identify depressive symptoms based on conversational exchanges.
- Highlights:
  - No training data provided: use creative and unsupervised approaches.
  - Collaborate in a limited-message dialogue setting, simulating real-world conditions.
  - Push the boundaries of AI-human interaction for mental health applications: are we able to accurately reproduce personas?
  - This is a pilot task. Participants will need to book a slot to interact with the LLM personas: register before the slots are gone!

Key Dates
- Dataset Release:
  - T1: 1st December 2024 for training collections and test dataset
  - T2: 1st December 2024 for training and 5th February 2025 for beginning of test stage (server opens)
  - T3: 5th February 2025 for beginning of test stage (server opens for interacting with the LLM)
- Submission Deadlines:
  - T1: 1st April 2025 for submitting participants’ results to FTP
  - T2: 12th April 2025 end of test stage (server closes)
  - T3: 12th April 2025 end of test stage (server closes)
- CLEF 2025 Conference: 9-12 September 2025, Madrid, Spain.

How to Participate
1. Register: Sign up through the [CLEF 2025 Labs Registration site](https://clef2025-labs-registration.dei.unipd.it/)
2. Submit Agreements: Complete the user agreement form to access datasets.
3. Join the Community: Join our Google Group at https://groups.google.com/g/erisk-clef

Lab co-chairs
Javier Parapar, Univ. A Coruña, Spain
Anxo Pérez, Univ. A Coruña, Spain
Xi Wang, Univ. Sheffield, United Kingdom
Fabio Crestani, Univ. Lugano, Switzerland

More Information
Visit the [eRisk website](https://erisk.irlab.org) for task details, datasets, and registration guidelines.

Computational linguistics lecturer (assistant professor) vacancy, Queen Mary University of London
by Matthew Purver 05 Dec '24

Queen Mary University of London is currently advertising a Computational Linguistics faculty position at the level of Lecturer (Assistant Professor). The closing date is 5 January.

https://qmul-jobs.tal.net/vx/mobile-0/appcentre-ext/brand-4/candidate/so/pm…

This post is based in the Linguistics Department, in Humanities and Social Sciences. Faculty in the department have a number of CL-adjacent interests and collaborations. There is also a substantial Computational Linguistics group in Computer Science, with whom the department has strong ties. The appointed candidate will enhance our teaching at the interface of Linguistics and CL/AI, for students who are interested in gaining more computational or AI-linked skills. The position is a good fit for applicants with a wide range of computational and AI-related interests, whether text or speech, and who are interested in working with students with a range of backgrounds and interests.

For further information please contact Prof Devyani Sharma <d.sharma(a)qmul.ac.uk>

--
Matthew Purver - http://www.eecs.qmul.ac.uk/~mpurver/
Computational Linguistics Lab - http://compling.eecs.qmul.ac.uk/
Cognitive Science Research Group - http://cogsci.eecs.qmul.ac.uk/
School of Electronic Engineering and Computer Science
Queen Mary University of London, London E1 4NS, UK

*My working days for QMUL are **Tuesday-Thursday**; responses to mail on other days may be delayed.*

Webinar by Javier de la Rosa - Artificial Intelligence Lab (National Library of Norway)
by HiTZ zentroa 05 Dec '24

**** We apologize for the multiple copies of this email. If you are already registered for the next webinar, you do not need to register again. ****

Dear colleague,

We are happy to announce the next webinar in the Language Technology webinar series organized by the HiTZ Chair of AI&LT (https://hitz.eus). You can check the videos of previous webinars and the schedule for upcoming webinars here: http://www.hitz.eus/webinars

Next webinar:

Speaker: Javier de la Rosa - Artificial Intelligence Lab (National Library of Norway)
Title: The Mímir Project: Impact of copyrighted materials in LLMs
Date: Thursday, December 12, 2024 - 15:00

Summary: The Mímir Project is an initiative by the Norwegian government that aims to assess the significance and influence of copyrighted materials in the development and performance of generative large language models (LLMs) tailored to the Norwegian languages. This collaborative effort involves three leading institutions from different regions of the country: the National Library of Norway (NB), the University of Oslo (UiO), and the Norwegian University of Science and Technology (NTNU), each contributing unique expertise in language technology, corpus curation, model training, copyright law, and computational linguistics. The ultimate goal of the project was to gather empirical evidence to inform the formulation of a compensation scheme for authors whose works are utilized by these advanced artificial intelligence (AI) systems, ensuring that intellectual property rights are respected and adequately compensated.

Bio: Javier de la Rosa is a Research Scientist at the Artificial Intelligence Lab at the National Library of Norway. A former Postdoctoral Fellow in Natural Language Processing at UNED, he holds a PhD in Hispanic Studies with a specialization in Digital Humanities from the University of Western Ontario, and a Master's in Artificial Intelligence from the University of Seville. Javier has previously worked as a Research Engineer at Stanford University, and as the Technical Lead at the University of Western Ontario CulturePlex Lab. He is interested in Natural Language Processing applied to historical and literary text, with a special focus on large language models.

Upcoming webinars:
· Ekaterina Shutova (January 30, 2025)
· Sebastian Ruder (February 6, 2025)
· Christian Herff (Thursday, March 6, 2025)

If you are interested in participating, please complete this registration form: http://www.hitz.eus/webinar_izenematea

If you cannot attend this seminar, but you want to be informed of the following HiTZ webinars, please complete this registration form instead: http://www.hitz.eus/webinar_info

Best wishes,
HiTZ Zentroa

P.S.: HiTZ will not grant any type of certificate for attendance at these webinars.

Reminder: 3-year postdoc position in NLP at the University of Oslo
by Lilja Øvrelid 05 Dec '24

Reminder that the closing date for this position is *December 13th*:

A position as Postdoctoral Research Fellow in Natural Language Processing is available within MediaFutures: Research Centre for Responsible Media Technology & Innovation at the Language Technology Group (LTG) at the University of Oslo (UiO), Norway. The closing date is December 13th, 2024.

For more information about the position and the research group, please see the full announcement here: https://www.jobbnorge.no/en/available-jobs/job/270966/postdoctoral-research…

Please do not hesitate to contact me for any further information.

Best regards,
Lilja

Interspeech 2025: Call for tutorials
by Grzegorz Chrupała 05 Dec '24

============================================
Interspeech 2025
17 - 21 August, Rotterdam, The Netherlands
https://www.interspeech2025.org/
============================================
Call for Tutorials
https://www.interspeech2025.org/call-for-tutorials
============================================

Important Dates
===============
Proposals of tutorials due: 1 February 2025
Notification of selection to organizers: 5 April 2025
Final announcement of tutorials on the website: 20 April 2025
Tutorial Day: 17 August 2025

The Tutorial Day is an important component of INTERSPEECH. It offers a unique opportunity for experts in various speech-related domains to provide conference attendees with rich learning experiences. To ensure a high-quality and diverse set of tutorials at INTERSPEECH 2025, we invite proposals that cover both introductory and advanced topics, from longstanding research challenges and current research trends to emerging areas of study. These proposals should target early-stage researchers and experienced researchers who wish to deepen their knowledge in a new area. Each tutorial will be 3 hours long. The tutorials are expected to provide an overview of an area of research rather than focus on an individual presenter’s research program and findings.

While it is not mandatory to address the theme of Interspeech 2025, “Fair and Inclusive Speech Science and Technology,” we encourage proposals to consider how their tutorials might align with or reflect this theme. We especially welcome proposals related to the four strands of Interspeech 2025: Individual Differences in Speech Processing; Under-Researched Languages, Dialects, and Accents; Inclusive Technology for Atypical Speech Communication; and Ethical Considerations.

Proposals from individuals who identify as being underrepresented in the speech science and technology community (due to factors such as geographical location, economic status, race, age, gender, sexual orientation, or any other characteristic) are particularly welcome.

Proposals Should Include (in the following order)
• Title
• Presenter(s) name and affiliation
• Contact information (email, telephone)
• Abstract (no more than 200 words) summarizing the proposed tutorial that could be used as an advertisement
• Description (1 – 2 pages; no more than 800 words), which includes a few relevant references and any webpages/material useful for reviewing the proposal
• Relevance of the proposed tutorial for Interspeech 2025 (0.5 – 1 page; no more than 400 words)
• Tutorial logistics, including
  • Duration (1 session or 2 sessions; 3 hours = 1 session). If 1 session, please indicate your preference for morning or afternoon.
  • Presenter(s) information (name(s))
  • Special equipment required for the tutorial
  • Description of accompanying material provided (handouts, storage devices with media, etc.)
• Presenter information
  • Biography of presenter(s)
  • Key publications of presenter(s) on the tutorial topic
  • List of previous tutorial experience
• Audience information
  • Target audience (e.g. new researchers to the field, research students, specialists of adjacent fields)
• Other considerations/comments

Submission Procedure
Proposals for the INTERSPEECH 2025 tutorials must be no more than 5 pages long and must conform to the format stated above; please ensure that the headings listed above are identified clearly. Proposals should be submitted by email to tutorials(a)interspeech2025.org by Feb 1, 2025. Notifications of selection will go out by April 5, 2025.

By submitting a proposal, the presenter(s) understand the ISCA policy of strongly encouraging video recording of the tutorial for education purposes if the proposal is accepted. Access to recording materials will be given through the ISCA Video Archives.

Questions? Please contact our Tutorial Chairs at tutorials(a)interspeech2025.org
• Yiya Chen - Leiden University, The Netherlands
• Daan van Esch - Google (Amsterdam)

First Call for Participation - IWSLT 2025
by Atul K. Ojha 04 Dec '24

Apologies for cross-posting.

----------------------------------------
*The International Conference on Spoken Language Translation*
*ACL – 22nd IWSLT 2025 – First Call for Participation*
*31 July-1 August 2025 - Vienna, Austria*
http://iwslt.org

The International Conference on Spoken Language Translation (IWSLT) <https://iwslt.org/> is the premier annual conference for all aspects of Spoken Language Translation. Every year, the conference organises and sponsors open evaluation campaigns around key challenges in simultaneous and consecutive translation, under real-time/low latency or offline conditions and under low-resource or multilingual constraints. System descriptions and results from participants’ systems and scientific papers related to key algorithmic advances and best practices are presented. IWSLT is the venue of SIGSLT, the Special Interest Group on Spoken Language Translation <https://iwslt.org/sigslt/> of ACL <https://www.aclweb.org/portal/>, ISCA <https://www.isca-speech.org/> and ELRA <https://www.elra.info/>. With a track record of 21 years, IWSLT benchmarks and proceedings serve as a reference for all researchers and practitioners working on speech translation and related fields.

The 22nd edition of IWSLT will be run as a hybrid ELRA <https://www.elra.info/>/ACL <https://www.aclweb.org/portal/> event, co-located with ACL 2025 <https://2025.aclweb.org/> from 31 July to 1 August 2025.

*Important Dates*
*January 1, 2025*: Release of shared task training and dev data
*March 15, 2025*: Scientific paper submission deadline
*Apr 1-15, 2025*: Evaluation period
*April 21, 2025*: System description paper submission deadline
*May 15, 2025*: Notification of acceptance
*June 1, 2025*: Camera-ready deadline (all papers)
*July 31-Aug 1, 2025*: IWSLT conference

Evaluation
IWSLT 2025 features shared tasks <https://iwslt.org/2025/#shared-tasks> that address the following focus areas:
- High-resource ST: Offline track, Simultaneous track, Subtitling track
- Low-resource ST: Low-resource and Indic (multilingual) tracks
- Instruction-following Speech Processing track: Technical domain ST, ASR, Summarization, and QA

Training and development data for each shared task will be prepared and released by the respective organisers (for further information on this initiative, please refer to the IWSLT website <https://iwslt.org/2025/>). Participants will receive instructions about how to submit their runs. In addition, participants have the opportunity to present their work through a system paper that will be published in the ACL Proceedings.

Conference
IWSLT also invites submissions of scientific papers to be published in the ACL Proceedings and presented either in oral or poster format. The conference selects high-quality, original contributions on theoretical and practical issues of spoken language translation research, technologies and applications. Submissions will be accepted directly through the IWSLT submission site (to be announced on the website <https://iwslt.org/2025/>). We will also accept commitments of submissions with reviews from the ACL Rolling Review.

Additionally, to foster cross-pollination of ideas, the conference also invites the presentation of papers on speech translation recently published elsewhere. Please note that this is for non-archival presentation of papers relevant to speech translation already published in other venues (e.g., Findings for the *ACL, speech, NLP or MT conferences). Submissions for this category will be accepted through a dedicated form (to be announced on the website <https://iwslt.org/2025/>). Papers will be checked for relevance to IWSLT, and assigned either oral or poster presentation slots if selected.

Contact
Please email iwslt-evaluation-campaign(a)googlegroups.com if you have any questions related to the shared tasks.

Thanks,
Marine, Marcello, Alex, Jan, Sebastian, Elizabeth, Atul (IWSLT organisers)

Call for online participation: Treebanks and Linguistic Theories (TLT 2024), December 5-6
by Zinsmeister, Heike 04 Dec '24

Dear all,

On 5 and 6 December, the 22nd International Workshop on Treebanks and Linguistic Theories (TLT 2024) is being hosted at the University of Hamburg. The workshop will be held in hybrid form and you are welcome to join us online!

Keynote talks:

December 5: 10:00-11:00 h
Anna Nedoluzhko (Charles University Prague)
Multilingual Coreference and Treebanking: Benefits of Interaction <https://www.korpuslab.uni-hamburg.de/en/tlt2024/program/_boxes/abstract-ann…>

December 6: 12:00-13:00 h
Marcel Bollmann (Linköping University)
Increasing language diversity in NLP: Insights from CreoleVal <https://www.korpuslab.uni-hamburg.de/en/tlt2024/program/_boxes/abstract-mar…>

On Friday, December 6, 14:30-16:30 h, there will be a discussion panel on “Treebanks and linguistic annotation in the area of LLMs”.
Panelists: Marcel Bollmann (Linköping University), Daniel Dakota (Indiana University), Sandra Kübler (Indiana University), Anna Nedoluzhko (Charles University Prague), Juri Opitz (Universität Zürich)

Find the full workshop programme on our website: https://www.korpuslab.uni-hamburg.de/en/tlt2024/program.html

If you would like to participate, please register via this form and we will send you the Zoom link in advance of the workshop: https://www.korpuslab.uni-hamburg.de/en/tlt2024/registration.html

Please note that, for security reasons, the University of Hamburg allows Zoom conferencing only via the Zoom app, so joining via browser will not work.

Do not hesitate to contact us via tlt2024.gw(a)uni-hamburg.de if you have any further questions.

The workshop is endorsed by ACL SIGPARSE <https://www.sigparse.org/> and we would like to thank SFB 1102 <https://sfb1102.uni-saarland.de/> for their financial support.

Best,
TLT 2024 Program Chairs

---------------------------
Prof. Dr. Heike Zinsmeister (sie/ihr)
Linguistik des Deutschen / Korpuslinguistik
Universität Hamburg, Institut für Germanistik, Raum C7012
Von-Melle-Park 6, Postfach #15, D-20146 Hamburg
Tel.: 040 42838-7119
heike.zinsmeister(a)uni-hamburg.de
http://www.slm.uni-hamburg.de/germanistik/personen/zinsmeister.html

PhD in Mathematical foundations for AI in London
by Haim Dubossarsky 03 Dec '24

On behalf of Prof. Omer Bobrowski and Prof. Primoz Skraba.

An exciting PhD opportunity at the intersection of Machine Learning, Mathematics and model interpretability is offered at the Centre for Probability, Statistics and Data Science at Queen Mary University of London.

Project description
This PhD position is part of the “Erlangen Programme for AI,” a prestigious multi-university initiative focused on developing a rigorous mathematical foundation for Artificial Intelligence. The project emphasizes the integration of concepts from topology, geometry, and probability, with the overarching goal of enhancing the interpretability, robustness, and generalization of AI models.

Understanding Deep Neural Networks
DNNs represent a cutting-edge approach in machine learning and AI, but there remains a significant gap in understanding the intrinsic mechanisms behind their powerful performance. This research aims to combine topological and geometric tools with probabilistic analysis to unveil hidden structures in neural networks. By investigating how these structures arise during training, how information flows through layers, and what vulnerabilities exist, we expect to gain insights that will drive future advancements in model design, optimization, and resilience.

Understanding Large Language Models
LLMs have been shown to capture (encode) both the semantics and structure (grammar) of language within their learned parameters. However, the methods used to access this knowledge (decoding) remain basic, typically involving the representation of textual objects (e.g., words, sentences) as continuous vectors in Euclidean space. This project aims to leverage geometry and topology to explore the internal representations and latent spaces within the LLMs' parameters, going beyond simple vector analysis. We will develop advanced methods for decoding meaning and structure from LLMs, enabling richer and more diverse access to the linguistic knowledge they encode, and test them in a range of linguistic tasks (polysemy, cross-lingual transfer, among others). This approach holds the potential for breakthroughs in both AI theory and practical applications.

Deadline is Wednesday, January 29, 2025.

Further details can be found here: https://www.findaphd.com/phds/project/mathematical-foundations-for-ai/?p177…
