Dear all,
Please find below the call for participation for the workshop on Diversity in Large Speech and Language Models.
Date: 20 February 2025
Place: Humboldt-Universität Berlin, Dorotheenstraße 24, Berlin, Germany
Machine learning techniques have conquered many different tasks in speech and natural language processing, such as speech recognition, information extraction, text and speech generation, and human-machine interaction using natural language or speech (chatbots). Modern techniques typically rely on large models for representing general knowledge of one or several languages (Large Language Models, LLMs), or for representing speech and general audio characteristics. These models have been trained with large amounts of speech and language data, typically including web content. When humans interact with such technologies, the effectiveness of the interaction will be influenced by the extent to which humans use the same type of language the models have been trained on or, in other words, by whether the models are able to generalize to the language used by humans when interacting with the technology. This may lead to some gradual forms of adaptation in human speech and language production, and users who do not adapt may be excluded from efficient use of such technologies. On top of this, as commercial model development follows market needs, under-represented languages and dialects/sociolects may receive lower priority. Furthermore, for many less widely spoken languages the necessary data is not available, which will widen the digital divide in speech and language technology usage.
The workshop sets out to discuss this problem based on scientific contributions from the perspective of computer science and linguistics (including computational linguistics and NLP).
Topics which we aim to address include but are not limited to:
User diversity: Which aspects of human speech and language production affect the performance of large foundation models? In which way, and for which tasks?
Language use: How are large language models able to cope with different languages, dialects, and sociolects? How do they deal with code switching?
Human adaptation: How does the use of large language models affect language comprehension, as well as speech and language production? Which alignment effects occur, and in which time spans?
Model adaptation: How do models need to be designed to better cope with speech and language diversity? How do training and finetuning affect model performance?
Inclusion: What data and technologies are necessary to better cope with diversity in large speech and language models?
The workshop will consist of a number of oral presentations and discussion panels. Accepted speakers are invited to submit a short or long paper which will be published online after the workshop.
Details and registration: https://www.tu.berlin/en/qu/about-us/news/isca-itg-workshop
Best,
Stefan Hillmann
--
Dr.-Ing. Stefan Hillmann
Wissenschaftlicher Mitarbeiter / Senior Researcher
er, ihm / he, his
Anrede / Form of address: Herr / Mr.
Technische Universität Berlin
Fakultät IV / Faculty 4
Elektrotechnik und Informatik / Electrical Engineering and Computer Science
Quality and Usability Lab
Sekr. MAR 6-7, Marchstr. 23, 10587 Berlin, GERMANY
Dear colleagues,
We hope you are doing great. As the Diversity & Inclusion team at COLING 2025, we are excited to announce the call to organize Birds of a Feather (BoF) / Affinity Group sessions at the conference!
If you are interested in discussing a specific theme in CL, NLP, or research in general, please take a few minutes to complete the form <https://forms.gle/8JrSBH7Gc3sgqLRBA>. We would appreciate receiving your proposal by 23:59 AoE on December 20th, 2024.
Let us know if there is more we can assist with at coling2025diversity(a)googlegroups.com.
Best regards,
Hawau and Mukund
COLING 2025 Social Diversity & Inclusion (SD&I) Team
P.S. All BoF hosts should be registered for COLING 2025, and the sessions will be in-person. If the link above is not clickable, please use this URL: https://forms.gle/8JrSBH7Gc3sgqLRBA
The workshop is jointly organised by the English Department of the University of Freiburg and the Institut für Deutsche Sprache (IDS) in Mannheim. As a scoping workshop, it is designed to explore the major empirical, methodological and conceptual challenges facing our research community. Although the two organising institutions focus on English and German, corpus linguists working on other languages are explicitly invited to attend and contribute.
Venue: IDS, Mannheim, Germany
Date: 10 – 11 July 2025
For info on abstract submission etc. see:
https://linguistlist.org/issues/35-3417
https://www.ids-mannheim.de/fi/veranstaltungen/workshop-corpus-linguistics-…
Topics in focus include:
- Corpora of spontaneous speech – new formats, new searches
- Corpora versus AI/LLMs? Corpora and AI/LLMs?
- Multilingual and multimodal corpora
- Infrastructures for CLx and Digital Humanities
Several renowned colleagues have already made commitments to present keynotes and/or organise round tables, including Silvia Bernardini, Mark Davies, Tony McEnery and Michaela Mahlberg.
Christian Mair & Andreas Witt
_______________________________________________________
Dear Colleagues,
We are excited to announce the launch of the ACL Special Interest Group on
Economic and Financial Natural Language Processing (SIG-FinTech)! To learn
more about SIG-FinTech, we invite you to visit our official website:
https://sigfintech.github.io/
We are also excited to share that the next FinNLP workshop will be held in
conjunction with EMNLP 2025, taking place from November 5–9, 2025, in
Suzhou, China. Stay tuned for more details—we will share updates soon!
*As part of this event, we are now accepting shared task proposals for
FinNLP@EMNLP-2025. Details about the call for proposals can be found below
and on our website: https://sigfintech.github.io/fineval.html*
*Submission Deadline: January 31, 2025*
We warmly encourage you to join us as shared task organizers. Feel free to
contact us if you have any questions.
Best regards,
Chung-Chi
---
陳重吉 (Chung-Chi Chen), Ph.D.
Researcher
Artificial Intelligence Research Center, National Institute of Advanced
Industrial Science and Technology, Japan
E-mail: c.c.chen(a)acm.org
Website: https://nlpfin.github.io/
FinEval-Proposal-2025: Financial Information Access and Evaluation
Suzhou, China, November 5-9, 2025
Conference website https://sigfintech.github.io/fineval.html
Submission link
https://easychair.org/conferences/?conf=finevalproposal2025
Financial Information Access and Evaluation (FinEval)
EMNLP-2025, Nov. 5th-9th, 2025, Suzhou, China
Shared tasks are collaborative initiatives where researchers and
practitioners work together to address a common challenge using shared
datasets and evaluation metrics. These tasks foster competition,
collaboration, and advancement within the field, playing a significant role
in both academic and industry communities. FinEval provides a venue for the
community to share valuable insights and inspiration. Every year, we will
call for proposals for the next edition of FinEval, which is collocated
with the FinNLP workshop.
*Call for Shared Task Proposal*
We encourage submissions for tasks that test systems on financial text
analysis, with a particular focus on cross-lingual, application-oriented
tasks, and novel uses of NLP in finance. Tasks for non-English languages
and cross-domain applications are welcome.
*Proposal Criteria*
Your task proposal will be evaluated on:
- *Novelty:* Is the task addressing a unique or under-explored problem
in financial NLP?
- *Interest:* Will the task attract broad participation?
- *Data Quality:* Is the data collection plan robust, with high
inter-annotator agreement and appropriate licensing?
- *Evaluation:* Is the evaluation methodology rigorous, and will it
inspire future research?
- *Impact:* What long-term impact will this task have on financial NLP?
- *Ethics:* Data should avoid PII and adhere to ethical guidelines,
including privacy compliance and ethical data use.
*Task Organization*
Organizers should be prepared to:
- Ensure data quality and licensing, addressing ethical and security
concerns.
- Provide format checkers, baseline systems, and evaluation tools for
participants.
- Manage a competition platform (e.g., CodaLab) and maintain
communication channels.
- Write and present a task description paper at the FinEval session in
the FinNLP workshop.
- Organize and review participant submissions and related documentation.
*Organizer Roles*
- *Lead Organizer:* Oversees the task, ensuring timely completion of
deliverables.
- *Co-Organizers:* Assist with data preparation, evaluation, and
participant communication.
- *Advisory Organizers:* Provide guidance, not necessarily engaged in
daily tasks.
Note: A minimum of two organizers is required per task. Single-organizer
submissions will not be accepted.
*Submission Guidelines*
Task proposals should be in PDF format, following the ACL Template
<https://github.com/acl-org/acl-style-files>, and must be no longer than 4
pages (plus references). Include the following sections:
- *Overview:* Summary, community interest, and anticipated impact.
- *Data & Resources:* Data sources, copyright details, data quantity,
quality assurance, and ethical considerations.
- *Pilot Task:* (recommended) Results and insights from initial studies.
- *Evaluation:* Clear evaluation methodology and criteria.
- *Task Reruns:* If a rerun, provide justification and expected impact.
- *Task Organizers:* Names, affiliations, contact details, and relevant
experience.
*Important Dates*
- *Task proposals due:* 31 January 2025
- *Task selection notification:* 20 February 2025
- *Sample data ready:* 15 March 2025
- *Training data ready:* 1 May 2025
- *Evaluation data ready:* 1 June 2025
- *Evaluation start:* 10 July 2025
- *Evaluation end:* 31 July 2025
- *Paper submission:* 31 August 2025
- *Notification to authors:* 15 September 2025
- *Camera-ready papers due:* 25 September 2025
- *FinNLP Workshop:* EMNLP-2025
-------- Original Message --------
Subject: Call for Participation: eRisk Lab @ CLEF 2025
Date: 2024-12-05 19:50
From: ACL Announcements <announcements(a)aclweb.org>
To: Announcements <announcements(a)aclweb.org>
Call for Participation: eRisk Lab @ CLEF 2025
Are you passionate about leveraging AI for societal good? Join us for
eRisk 2025, the ninth edition of this lab at CLEF, where we delve into the
methodologies and applications of early risk detection on the Internet.
Our mission is to foster interdisciplinary research that addresses
critical health and safety challenges, from identifying signs of
depression to preventing online harm.
Tasks for eRisk 2025 (More info at https://erisk.irlab.org/ )
Task 1: Search for Symptoms of Depression
- Objective: Rank sentences from user writings by relevance to the 21
symptoms of the BDI-II questionnaire.
- Highlights:
- Use a TREC-formatted dataset with human-assessed relevance
judgments.
- Generate rankings for symptoms with evaluation via metrics like MAP
and nDCG.
- Create a valuable annotated corpus with broad applications beyond
this task.
- This is the third edition of the task: two years of training data.
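For readers unfamiliar with the ranking metrics named in Task 1, the following is a minimal Python sketch of average precision (the per-query component of MAP) and nDCG. It is an illustration only, not the lab's official evaluation code: the function names are hypothetical, and the nDCG ideal ranking is computed from the submitted list itself, which assumes every judged item appears in that list.

```python
import math

def average_precision(ranked_relevance, num_relevant=None):
    """Average precision for one query.

    ranked_relevance: list of 0/1 labels in rank order.
    num_relevant: total relevant items for the query; if omitted, we
    assume (for illustration) that all relevant items are in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    denom = num_relevant if num_relevant is not None else hits
    return precision_sum / denom if denom else 0.0

def mean_average_precision(rankings):
    """MAP: mean of average precision over several queries."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)

def ndcg(ranked_gains, k=None):
    """nDCG with log2 discount; ideal ranking derived from the same list."""
    gains = ranked_gains[:k] if k else ranked_gains
    dcg = sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))
    ideal = sorted(ranked_gains, reverse=True)[:len(gains)]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```

In practice, participants would more likely rely on an established evaluation tool than on hand-written metric code; the sketch only shows what the numbers measure.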
Task 2: Contextualized Early Detection of Depression *(New in 2025)*
- Objective: Analyze full conversational contexts to detect early signs
of depression.
- Highlights:
- Evaluate sequential user interactions for a holistic view of
conversational dynamics.
- Train on isolated writings and test in real-world-like scenarios
with chronologically ordered conversations.
- Metrics include accuracy and timeliness, measured via ERDE and
similar frameworks.
- This is the first edition of the contextualized tasks: three years
of un-contextualized training data.
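As a rough illustration of how ERDE combines accuracy with timeliness, here is a minimal Python sketch. It assumes the ERDE formulation used in earlier early-risk-detection evaluations, where a correct positive decision is penalised by a sigmoid latency cost that grows with the number of writings seen. The function names and the default cost values are assumptions for illustration; the official eRisk 2025 scripts and parameters may differ.

```python
import math

def latency_cost(k, o):
    """Sigmoid latency cost: near 0 for early decisions, near 1 for late ones.

    k: number of user writings seen before the decision.
    o: the point at which the cost reaches 0.5 (e.g. 5 or 50).
    """
    if k - o > 700:  # avoid overflow in math.exp for very late decisions
        return 1.0
    return 1.0 - 1.0 / (1.0 + math.exp(k - o))

def erde(decision, truth, k, o, c_fp, c_fn=1.0, c_tp=1.0):
    """Per-user ERDE value (lower is better).

    decision, truth: booleans (True = at risk).
    c_fp, c_fn, c_tp: cost constants; the defaults here are placeholders.
    """
    if decision and not truth:
        return c_fp                        # false positive
    if not decision and truth:
        return c_fn                        # false negative
    if decision and truth:
        return latency_cost(k, o) * c_tp   # true positive, penalised by delay
    return 0.0                             # true negative

def mean_erde(records, o, c_fp):
    """Average ERDE over (decision, truth, k) tuples, one per user."""
    return sum(erde(d, t, k, o, c_fp) for d, t, k in records) / len(records)
```

A correct alert issued after exactly `o` writings costs half as much as a missed case under these placeholder costs, which is how the measure rewards early detection.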
Pilot Task: Conversational Depression Detection via LLMs (New in 2025,
Interactive Task)
- Objective: Engage with LLM personas to identify depressive symptoms
based on conversational exchanges.
- Highlights:
- No training data provided—use creative and unsupervised approaches.
- Collaborate in a limited-message dialogue setting, simulating
real-world conditions.
- Push the boundaries of AI-human interaction for mental health
applications: are we able to accurately reproduce personas?
- This is a pilot task. Participants will need to book a slot to
interact with the LLM personas: register before the slots are gone!
Key Dates
- Dataset Release:
- T1: 1st December 2024 for training collections and test dataset
- T2: 1st December 2024 for training and 5th February 2025 for
beginning of test stage (server opens)
- T3: 5th February 2025 for beginning of test stage (server opens for
interacting with the LLM)
- Submission Deadlines:
- T1: 1st April 2025 for submitting participants’ results to FTP
- T2: 12th April 2025 end of test stage (server closes)
- T3: 12th April 2025 end of test stage (server closes)
- CLEF 2025 Conference: 9-12 September 2025, Madrid, Spain.
How to Participate
1. Register: Sign up through the [CLEF 2025 Labs Registration
site](https://clef2025-labs-registration.dei.unipd.it/)
2. Submit Agreements: Complete the user agreement form to access
datasets.
3. Join the Community: Join our Google Groups
https://groups.google.com/g/erisk-clef !
Lab co-chairs
Javier Parapar, Univ. A Coruña, Spain
Anxo Pérez, Univ. A Coruña, Spain
Xi Wang, Univ. Sheffield, United Kingdom
Fabio Crestani, Univ. Lugano, Switzerland
More Information
Visit the [eRisk website](https://erisk.irlab.org) for task details,
datasets, and registration guidelines.
Queen Mary University of London is currently advertising a Computational
Linguistics faculty position at the level of Lecturer (Assistant
Professor). The closing date is 5 January.
https://qmul-jobs.tal.net/vx/mobile-0/appcentre-ext/brand-4/candidate/so/pm…
This post is based in the Linguistics Department, in Humanities and Social
Sciences. Faculty in the department have a number of CL-adjacent interests
and collaborations. There is also a substantial Computational Linguistics
group in Computer Science, with whom the department has strong ties. The
appointed candidate will enhance our teaching at the interface of
Linguistics and CL/AI, for students who are interested in gaining more
computational or AI-linked skills. The position is a good fit for applicants
with a wide range of computational and AI-related interests, whether text
or speech, and who are interested in working with students with a range of
backgrounds and interests.
For further information please contact Prof Devyani Sharma
<d.sharma(a)qmul.ac.uk>
--
Matthew Purver - http://www.eecs.qmul.ac.uk/~mpurver/
Computational Linguistics Lab - http://compling.eecs.qmul.ac.uk/
Cognitive Science Research Group - http://cogsci.eecs.qmul.ac.uk/
School of Electronic Engineering and Computer Science
Queen Mary University of London, London E1 4NS, UK
*My working days for QMUL are **Tuesday-Thursday**; responses to mail on
other days may be delayed.*
**** We apologize for the multiple copies of this email. In case you are
already registered to the next webinar, you do not need to register
again. ****
Dear colleague,
We are happy to announce the next webinar in the Language Technology
webinar series organized by the HiTZ Chair of AI (https://hitz.eus).
You can check the videos of previous webinars and the schedule for
upcoming webinars here: http://www.hitz.eus/webinars
Next webinar:
Speaker: Javier de la Rosa - Artificial Intelligence Lab (National
Library of Norway)
Title: The Mímir Project: Impact of copyrighted materials in LLMs
Date: Thursday, December 12, 2024 - 15:00
Summary: The Mímir Project is an initiative by the Norwegian government
that aims to assess the significance and influence of copyrighted
materials in the development and performance of generative large
language models (LLMs) tailored to the Norwegian languages. This
collaborative effort involves three leading institutions from different
regions of the country: the National Library of Norway (NB), the
University of Oslo (UiO), and the Norwegian University of Science and
Technology (NTNU); each contributing unique expertise in language
technology, corpus curation, model training, copyright law, and
computational linguistics. The ultimate goal of the project was to
gather empirical evidence that informed the formulation of a
compensation scheme for authors whose works are utilized by these
advanced artificial intelligence (AI) systems, ensuring that
intellectual property rights are respected and adequately compensated.
Bio: Javier de la Rosa is a Research Scientist at the Artificial
Intelligence Lab at the National Library of Norway. A former
Postdoctoral Fellow in Natural Language Processing at UNED, he holds a
PhD in Hispanic Studies with a specialization in Digital Humanities from
the University of Western Ontario, and a Master's in Artificial
Intelligence from the University of Seville. Javier has previously worked
as a Research Engineer at Stanford University, and as the Technical
Lead at the University of Western Ontario CulturePlex Lab. He is
interested in Natural Language Processing applied to historical and
literary text, with a special focus on large language models.
Upcoming webinars:
· Ekaterina Shutova (January 30, 2025)
· Sebastian Ruder (February 6, 2025)
· Christian Herff (Thursday, March 6, 2025)
If you are interested in participating, please complete this
registration form: http://www.hitz.eus/webinar_izenematea
If you cannot attend this seminar, but you want to be informed of the
following HiTZ webinars, please complete this registration form instead:
http://www.hitz.eus/webinar_info
Best wishes,
HiTZ Zentroa
P.S. HiTZ will not grant any type of certificate for attendance at these
webinars.
Reminder that the closing date for this position is *December 13th*:
A position as Postdoctoral Research Fellow in Natural Language Processing is available within MediaFutures: Research Centre for Responsible Media Technology & Innovation at the Language Technology Group (LTG) at the University of Oslo (UiO), Norway.
The closing date is December 13th, 2024.
For more information about the position and the research group, please see the full announcement here:
https://www.jobbnorge.no/en/available-jobs/job/270966/postdoctoral-research…
Please do not hesitate to contact me for any further information.
Best regards,
Lilja
============================================
Interspeech 2025
17 - 21 August, Rotterdam, The Netherlands
https://www.interspeech2025.org/
============================================
Call for Tutorials
https://www.interspeech2025.org/call-for-tutorials
============================================
Important Dates
===============
Proposals of tutorials due: 1 February 2025
Notification of selection to organizers: 5 April 2025
Final announcement of tutorials on the website: 20 April 2025
Tutorial Day: 17 August 2025
The Tutorial Day is an important component of INTERSPEECH. It offers a
unique opportunity for experts in various speech-related domains to
provide conference attendees with rich learning experiences. To ensure
a high-quality and diverse set of tutorials at INTERSPEECH 2025, we
invite proposals that cover both introductory and advanced topics,
from longstanding research challenges and current research trends to
emerging areas of study. These proposals should target early-stage
researchers and experienced researchers who wish to deepen their
knowledge in a new area. Each tutorial will be 3 hours long. The
tutorials are expected to provide an overview of an area of research
rather than focus on an individual presenter’s research program and
findings. While it is not mandatory to address the theme of
Interspeech 2025, "Fair and Inclusive Speech Science and Technology,"
we encourage proposals to consider how their tutorials might align
with or reflect this theme. We especially welcome proposals related to
the four strands of Interspeech 2025: Individual Differences in Speech
Processing; Under-Researched Languages, Dialects, and Accents;
Inclusive Technology for Atypical Speech Communication; and Ethical
Considerations. Proposals from individuals who identify as being
underrepresented in the speech science and technology community (due
to factors such as geographical location, economic status, race, age,
gender, sexual orientation, or any other characteristic) are
particularly welcome.
Proposals Should Include (in the following order)
• Title
• Presenter(s) name and affiliation
• Contact information (email, telephone)
• Abstract (no more than 200 words) summarizing the proposed
  tutorial that could be used as an advertisement
• Description (1 – 2 pages; no more than 800 words), which
  includes a few relevant references and any webpages/material useful
  for reviewing the proposal
• Relevance of the proposed tutorial for Interspeech 2025 (0.5 – 1
  page; no more than 400 words)
• Tutorial logistics, including
    • Duration (1 session or 2 sessions; 3 hours = 1 session). If
      1 session, please indicate your preference for morning or afternoon.
    • Presenter(s) information (name(s))
    • Special equipment required for the tutorial
    • Description of accompanying material provided (handouts,
      storage devices with media, etc.)
• Presenter information
    • Biography of presenter(s)
    • Key publications of presenter(s) on the tutorial topic
    • List of previous tutorial experience
• Audience information
    • Target audience (e.g. new researchers to the field, research
      students, specialists of adjacent fields)
• Other considerations/comments
Submission Procedure
Proposals for the INTERSPEECH 2025 tutorials must be no more than 5
pages long and must conform to the format stated above; please ensure
that the headings listed above are identified clearly.
Proposals should be submitted by email to
tutorials(a)interspeech2025.org by Feb 1, 2025. Notifications of
selection will go out by April 5, 2025. By submitting a proposal, the
presenter(s) understand the ISCA policy of strongly encouraging video
recording of the tutorial for education purposes if the proposal is
accepted. Access to recording materials will be given through the ISCA
Video Archives.
Questions? Please contact our Tutorial Chairs at tutorials(a)interspeech2025.org
• Yiya Chen - Leiden University, The Netherlands
• Daan van Esch - Google (Amsterdam)
Apologies for cross-posting.
----------------------------------------
*The International Conference on Spoken Language Translation*
*ACL – 22nd IWSLT 2025 – First Call for Participation*
*31 July-1 August 2025 - Vienna, Austria*
http://iwslt.org
The International Conference on Spoken Language Translation (IWSLT)
<https://iwslt.org/> is the premier annual conference for all aspects of
Spoken Language Translation. Every year, the conference organises and
sponsors open evaluation campaigns around key challenges in simultaneous
and consecutive translation, under real-time/low latency or offline
conditions and under low-resource or multilingual constraints. System
descriptions and results from participants’ systems and scientific papers
related to key algorithmic advances and best practices are presented.
IWSLT is the venue of SIGSLT <https://iwslt.org/sigslt/>, the Special
Interest Group on Spoken Language Translation
of ACL <https://www.aclweb.org/portal/>, ISCA <https://www.isca-speech.org/>
and ELRA <https://www.elra.info/>. With a track record of 21 years, IWSLT
benchmarks and proceedings serve as a reference for all researchers and
practitioners working on speech translation and related fields.
The 22nd edition of IWSLT will be run as a hybrid ELRA
<https://www.elra.info/>/ACL <https://www.aclweb.org/portal/> event,
co-located with ACL 2025 <https://2025.aclweb.org/> from 31 July to 1
August 2025.
*Important Dates*
*January 1, 2025*: Release of shared task training and dev data
*March 15, 2025*: Scientific paper submission deadline
*Apr 1-15, 2025*: Evaluation period
*April 21, 2025*: System description paper submission deadline
*May 15, 2025*: Notification of acceptance
*June 1, 2025*: Camera-ready deadline (all papers)
*July 31-Aug 1, 2025*: IWSLT conference
Evaluation
The IWSLT 2025 features shared tasks <https://iwslt.org/2025/#shared-tasks>
that address the following focus areas:
- High-resource ST: Offline track, Simultaneous track, Subtitling track
- Low-resource ST: Low-resource and Indic (multilingual) tracks
- Instruction-following Speech Processing track: Technical domain ST, ASR,
Summarization, and QA
Training and development data for each shared task will be prepared and
released by the respective organisers (for further information on this
initiative, please refer to the IWSLT website <https://iwslt.org/2025/>).
Participants will receive instructions about how to submit their runs. In
addition, participants have the opportunity to present their work through a
system paper that will be published in the ACL Proceedings.
Conference
IWSLT also invites submissions of scientific papers to be published in the
ACL Proceedings and presented either in oral or poster format. The
conference selects high-quality, original contributions on theoretical and
practical issues of spoken language translation research, technologies and
applications. Submissions will be accepted directly through the IWSLT
submission site (to be announced on the website <https://iwslt.org/2025/>).
We will also accept commitments of submissions with reviews from the ACL
Rolling Review.
Additionally, to foster cross-pollination of ideas, the conference also
invites the presentation of papers on speech translation recently published
elsewhere. Please note that this is for non-archival presentation of papers
relevant to speech translation already published in other venues (e.g.,
Findings for the *ACL, speech, NLP or MT conferences). Submissions for this
category will be accepted through a dedicated form (to be announced on the
website <https://iwslt.org/2025/>). Papers will be checked for relevance to
IWSLT, and assigned either oral or poster presentation slots if selected.
Contact
Please email iwslt-evaluation-campaign(a)googlegroups.com if you have any
questions related to the shared tasks.
Thanks,
Marine, Marcello, Alex, Jan, Sebastian, Elizabeth, Atul
(IWSLT organisers)