Dear colleagues,
We are pleased to announce that the submission deadline for our upcoming workshop:
The First Workshop on Natural Language Processing and Language Models for Digital Humanities (LM4DH 2025)
(co-located with RANLP 2025) has been extended to 27 July 2025!
This interdisciplinary workshop invites contributions at the intersection of computational methods and the humanities, including work on:
* Text analysis and genre detection
* Interpretability of LLM outputs
* Historical and low-resource language processing
* Dataset creation and curation
* Emotion analysis, authorship attribution, and more
We welcome standard papers (up to 8 pages) and short papers (4–6 pages). Submissions must follow the RANLP 2025 ACL-style template.
* New Submission Deadline: 27 July 2025
* Notification of Acceptance: 2 August 2025
* Camera-Ready Deadline: 20 August 2025
* Workshop Date: 11-13 September 2025 (To be confirmed)
Don’t miss the chance to be part of this exciting event that brings together researchers across linguistics, cultural heritage, NLP, history, and more. Submit your paper and join the conversation on the future of AI in the Digital Humanities!
For more information, visit https://www.clarin.eu/content/call-papers-first-workshop-natural-language-p…
Best regards,
CLARIN ERIC
/[Apologies for multiple postings]/
We are happy to announce that 2 new speech databases are available in
our catalogue.
*Chinese Kids Speech database (Lower Grade)*
<https://catalog.elra.info/en-us/repository/browse/ELRA-S0496/>
ISLRN: 369-011-475-593-5 <http://www.islrn.org/resources/369-011-475-593-5>
The Chinese Kids Speech database (Lower Grade) contains recordings of
184 Chinese children (98 male and 86 female) aged 6 to 10, recorded in
quiet rooms using smartphones. A set of 1,426 sentences was used. The
audio data are stored in .wav files as 16 kHz, mono, 16-bit linear PCM.
*Chinese Kids Speech database (Upper Grade)*
<https://catalog.elra.info/en-us/repository/browse/ELRA-S0497/>
ISLRN: 993-024-988-227-0 <http://www.islrn.org/resources/993-024-988-227-0>
The Chinese Kids Speech database (Upper Grade) contains recordings of
161 Chinese children (71 male and 90 female) aged 10 to 12, recorded in
quiet rooms using smartphones. A set of 1,859 sentences was used. The
audio data are stored in .wav files as 16 kHz, mono, 16-bit linear PCM.
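For users who want to check downloaded files against the format stated
above, here is a minimal sketch using Python's standard-library `wave`
module. The function names and the in-memory test file are illustrative
assumptions, not part of the ELRA distribution. It also spells out the
implied raw data rate: 16,000 samples/s x 1 channel x 2 bytes = 32,000
bytes of audio per second.

```python
import io
import wave

# Format stated in the catalogue entries: 16 kHz, mono, 16-bit linear PCM.
EXPECTED_RATE = 16_000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample (16 bits)


def matches_catalogue_format(fileobj) -> bool:
    """Return True if a .wav file (path or file object) is 16 kHz mono 16-bit PCM."""
    with wave.open(fileobj, "rb") as w:
        return (
            w.getframerate() == EXPECTED_RATE
            and w.getnchannels() == EXPECTED_CHANNELS
            and w.getsampwidth() == EXPECTED_SAMPLE_WIDTH
            and w.getcomptype() == "NONE"  # uncompressed (linear) PCM
        )


def bytes_per_second() -> int:
    # 16,000 samples/s x 1 channel x 2 bytes per sample
    return EXPECTED_RATE * EXPECTED_CHANNELS * EXPECTED_SAMPLE_WIDTH


if __name__ == "__main__":
    # Build a one-second silent file in memory to exercise the check.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16_000)
        w.writeframes(b"\x00\x00" * 16_000)
    buf.seek(0)
    print(matches_catalogue_format(buf))  # True
    print(bytes_per_second())  # 32000
```

For a file on disk, pass its path instead, e.g.
matches_catalogue_format("speaker001_0001.wav") (a hypothetical file name).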
For more information on the catalogue or if you would like to enquire
about having your resources distributed by ELRA, please *contact us*
<mailto:contact@elda.org>.
_________________________________________
Visit the *ELRA Catalogue of Language Resources* <http://catalog.elra.info>
*Archives*
<https://www.elra.info/catalogues/language-resources-announcements/> of
ELRA Language Resources Catalogue Updates
/Please note that you receive this email because you are or have been a
customer or a provider of ELRA Language Resources./
/ELRA Privacy Policy is available here/
<https://www.elra.info/elra-privacy-policy/>
/If you do not want to receive such e-mails in the future, contact us/
<mailto:privacy@elda.org>
Dear colleagues,
The July edition of the CLARIN Newsflash is out!
This summer, CLARIN is turning up the heat at major conferences and summer schools — from the Corpus Linguistics 2025 conference in Birmingham, to DH2025 in Lisbon, and the ESU summer school in Besançon, with ACL and Interspeech just around the corner. We’re proud to showcase our most popular and newly developed tools and resources, strengthening visibility, fostering meaningful connections, and collaborating with local CLARIN communities and researchers worldwide.
It’s all in this month’s newsflash — take a look!
https://www.clarin.eu/content/clarin-newsflash-july-2025
Wishing you a relaxing summer break — we’ll be back in September with a CLARIN2025 special edition!
CLARIN ERIC
First CFP: CHOMPS – Confabulation, Hallucinations, & Overgeneration in Multilingual & Precision-critical Settings
(with our apologies for cross-posting)
Venue: IJCNLP-AACL 2025 (https://2025.aaclnet.org/), Mumbai, India
Date: 23/24th December 2025 (TBC)
Workshop website: https://chomps2025.github.io/
* Description *
Despite rapid advances, LLMs continue to "make things up", a phenomenon that manifests as hallucination, confabulation, and overgeneration: they produce unsupported and unverifiable text that sounds deceptively plausible. These outputs pose real risks in settings where accuracy and accountability are non-negotiable, including healthcare, legal systems, and education. The aim of the CHOMPS workshop is to find ways to mitigate one of the major hurdles that currently prevent the adoption of Large Language Models in real-world scenarios: their tendency to hallucinate.
The workshop will explore hallucination mitigation in practical situations where it is crucial: in particular, precision-critical applications (such as those in the medical, legal and biotech domains) and multilingual settings (given the lack of resources needed to reproduce, for other languages, what can be done for English). We invite work on topics including, but not limited to, the following:
* Workshop topics *
- Metrics, benchmarks and tools for hallucination detection
- Factuality challenges in mission-critical and domain-specific settings (e.g., medical, legal, biotech) and their consequences
- Mitigation strategies during inference or model training
- Studies of hallucinatory and confabulatory behaviors of LLMs in cross-lingual and multilingual scenarios
- Confabulations in language & multimodal (vision, text, speech) models
- Perspectives and case studies from other disciplines
- …
* Invited speakers *
- Anna ROGERS, IT University of Copenhagen
- Danish PRUTHI, IISc Bangalore
- Abhilasha RAVICHANDER, University of Washington
* Submission details *
The workshop is designed with a widely inclusive submission policy so as to foster as vibrant a discussion as possible.
Archival or non-archival submissions may consist of up to 8 pages (long) or 4 pages (short) of content. Dissemination submissions may consist of up to 1 page of content. On acceptance, authors may add one additional page to accommodate changes suggested by the reviewers.
Please use the ACL style templates available here: https://github.com/acl-org/acl-style-files
Submissions must be made in PDF format, either (a) via direct submission (https://openreview.net/group?id=aclweb.org/AACL-IJCNLP/2025/Workshop/CHOMPS) or (b) via ARR commitment (https://openreview.net/group?id=aclweb.org/AACL-IJCNLP/2025/Workshop/CHOMPS…)
* Important dates *
Paper submission deadline: September 29, 2025
Direct ARR commitment: October 27, 2025
Author notification: November 3, 2025
Camera-Ready due: November 11, 2025
Workshop date: December 23-24, 2025 (TBC)
* Contact *
For questions, please send an email to chomps-aacl2025(a)googlegroups.com or contact one of the workshop chairs:
- Aman Sinha, Université de Lorraine, aman.sinha(a)univ-lorraine.fr
- Raúl Vázquez, University of Helsinki, raul.vazquez(a)helsinki.fi
- Timothee Mickus, University of Helsinki, timothee.mickus(a)helsinki.fi
Call for Papers: CASE 2025 @ RANLP (8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text)
Dear Colleagues,
We are pleased to announce the 8th edition of the Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text, held in conjunction with RANLP 2025 (https://ranlp.org/ranlp2025/)!
CASE is a leading venue for research, resources, and practical advances in automated event extraction and analysis, focusing on social and political event data. It has consistently been held at top venues such as ACL, EMNLP, and EACL.
We invite submissions of research papers, resource papers, and position papers addressing (but not limited to) the following topics:
• Event extraction at the sentence, document, or cross-document level, including event coreference.
• Creation and annotation of datasets for event extraction.
• Modeling event-event relations such as subevents, causal, temporal, and spatial links.
• Evaluation of event datasets: reliability, validity, and coverage.
• Event schemas and ontologies: population, definition, and enrichment.
• Tools, pipelines, and infrastructure for event annotation and analysis.
• Linguistic aspects of event representation: lexical, syntactic, semantic, discursive, and pragmatic.
• Applications of event data in conflict prediction, early warning, and policy support.
• Detection of new event types, including protests, public health crises, and cyber activism.
• Bias, fairness, and misinformation in event extraction systems and datasets.
• Legal, ethical, and privacy considerations in dataset creation and dissemination.
• Cross-lingual, multilingual, and multimodal event extraction.
• Use of LLMs and generative AI for event extraction, analysis, and dataset generation.
• Release of new benchmarks, datasets, or annotation resources.
All accepted papers will be published in the ACL Anthology.
Website: https://emw.ku.edu.tr/case-2025/ (being updated! please get in touch with ahurriyetoglu(a)ku.edu.tr for any questions)
Link for submission: https://softconf.com/ranlp25/CASE2025/user/
Important dates:
Submission Deadline: 25 July 2025
Notification: August 17, 2025
Camera-ready deadline: August 30, 2025
Workshop date: September 11-13, 2025
Shared task
Multimodal detection of hate speech, humor, and stance in LGBTQ+ socio-political discourse
To know more and participate, please visit: https://github.com/therealthapa/case2025-multimodal/blob/main/README.md
All shared task papers will also be published in the ACL anthology.
Organizers: Surendrabikram Thapa, Siddhant Bikram Shah, Shuvam Shiwakoti, Kritesh Rauniyar, Surabhi Adhikari, Kristy Johnson, Ali Hürriyetoğlu, Hristo Tanev, Usman Naseem
Organizing committee:
Ali Hürriyetoğlu
Hristo Tanev
Surendrabikram Thapa
Vanni Zavarella
Erdem Yörük
Hi, good morning
This is to let you know that our research group has made publicly available the beta version of
*Evaristo.ai* <https://evaristo.ai/>, a chatbot for the Portuguese language based on open LLMs.
One of these LLMs is Gervásio 8B <https://huggingface.co/PORTULAN/gervasio-8b-portuguese-ptpt-decoder>, which we developed for Portuguese and are also releasing now;
it is the LLM that is active by default when you arrive at the chatbot.
Even if you are not proficient in Portuguese, on-the-fly translators will likely help
you grasp its content and basic functioning. *We invite you, and your colleagues
and students, to visit it and try out this first test version. We welcome any help
you can give us in testing it, and any feedback and suggestions you can share with us.*
Among its distinctive features: it is an open AI chatbot for
the Portuguese language; it is multi-model and multi-heteronym, as well as being agentic,
multi-tool and multi-modal; it does not track its users or pass on their content to third parties,
safeguarding user privacy and ownership of their content.
You will find its motivation presented in this press release <https://evaristo.ai/assets/pressRelease_EvaristoAI.pdf> (in English),
which is complemented by the more complete description in the About section <https://evaristo.ai/about> (in Portuguese).
Currently available open LLMs are typically between 10 and 100 times smaller than
the top-of-the-range closed LLMs used in commercial chatbots, so the costs of
training and operating them are much lower. Their performance, however, is much
better than this size gap alone would suggest. Open LLMs therefore offer an
excellent ratio of performance quality to cost, and are a viable option for
fully autonomous generative AI services focused on concrete use cases.
In this context, we see this chatbot as a milestone in the democratization of
generative technology for the Portuguese language, through open LLMs, by encouraging
more and more organizations to move forward with their own AI services,
rooted in their own computers, and focused on their concrete use cases.
Have a nice day,
António
Dear colleagues,
We are pleased to announce the last call for participation in the first Shared Task on Language Identification for Web Data at WMDQS/COLM 2025.
Important information:
🗓️ Registration Deadline: July 23 (AoE)
📍 Montréal, Canada
🌐 https://wmdqs.org/shared-task/
Registration:
To register, please submit a one-page document with a title, a list of authors, a list of provisional languages that you want to focus on, and a brief description of your approach. This document should be sent to wmdqs-pcs(a)googlegroups.com. You can change the list of languages or the system description during the shared task. This document's only purpose is to register your participation in the shared task. The shared task will run until the last week of September.
Motivation:
The lack of training data—especially high-quality data—is the root cause of poor language model performance for many languages. One obstacle to improving the quantity and quality of available text data is language identification (LangID or LID). LangID remains far from solved for many languages. Several of the commonly used LangID models were introduced in 2017 (e.g. fastText and CLD3). The aim of this shared task is to encourage innovation in open-source language identification and improve accuracy on a broad range of languages.
All participants will be invited to contribute to a larger paper, which will be submitted to a high-impact NLP venue.
Description:
The main shared task is to submit LangID models that work well on a wide variety of languages on web data. We encourage participants to employ a range of approaches, including the development of new architectures and the curation of novel high-quality annotated datasets.
We recommend using the GlotLID corpus as a starting point for training data. Access to the data will be managed through the Hugging Face repository. Please note that this data should not be redistributed. We will use the same language label format as those used by GlotLID: an ISO 639-3 language code plus an ISO 15924 script code, separated by an underscore.
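To make the label format concrete, here is a minimal Python sketch that splits and builds labels of the form <ISO 639-3 code>_<ISO 15924 code>, such as por_Latn. The helper names are hypothetical, and the regular expression is an assumption that follows the usual conventions (three lowercase letters for ISO 639-3, four letters with an initial capital for ISO 15924). Stripping the "__label__" prefix is a convenience for predictions from fastText-format models, where that is fastText's default label prefix; it is not a requirement of the shared task.

```python
import re

# Assumed pattern: ISO 639-3 code (3 lowercase letters) + "_" +
# ISO 15924 script code (4 letters, initial capital), e.g. "eng_Latn".
LABEL_RE = re.compile(r"([a-z]{3})_([A-Z][a-z]{3})")

# fastText's default label prefix; stripped so raw model output can be parsed.
FASTTEXT_PREFIX = "__label__"


def parse_label(label: str) -> tuple[str, str]:
    """Split a label such as 'eng_Latn' into ('eng', 'Latn')."""
    label = label.removeprefix(FASTTEXT_PREFIX)
    match = LABEL_RE.fullmatch(label)
    if match is None:
        raise ValueError(f"not an <ISO 639-3>_<ISO 15924> label: {label!r}")
    return match.group(1), match.group(2)


def make_label(lang: str, script: str) -> str:
    """Join a language code and a script code, validating the result."""
    label = f"{lang}_{script}"
    parse_label(label)  # raises ValueError if the parts are malformed
    return label


if __name__ == "__main__":
    print(parse_label("por_Latn"))           # ('por', 'Latn')
    print(parse_label("__label__rus_Cyrl"))  # ('rus', 'Cyrl')
    print(make_label("arb", "Arab"))         # arb_Arab
```

Keeping the script code in the label matters for languages written in more than one script, since the same language code then maps to several distinct labels.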
Although all systems will be evaluated on the full range of languages in our test set, we encourage submissions that focus on a particular language or set of languages, especially if those language(s) present particular challenges for language identification.
The shared task will take place in rounds. The first round will only include data from existing datasets; subsequent rounds will include data annotated by the community as it is collected and processed. More languages will also be added in subsequent rounds.
Organizers:
For any questions, please email wmdqs-pcs(a)googlegroups.com
Program Chairs:
Pedro Ortiz Suarez (Common Crawl Foundation)
Sarah Luger (MLCommons)
Laurie Burchell (Common Crawl Foundation)
Kenton Murray (Johns Hopkins University)
Catherine Arnett (EleutherAI)
Organizing Committee:
Thom Vaughan (Common Crawl Foundation)
Sara Hincapié (Factored)
Rafael Mosquera (MLCommons)
Dear colleagues,
My name is Alessandra Teresa Cignarella, and I am a postdoctoral researcher in the Language and Translation Technology Team (LT3) at Ghent University in Belgium. My research project is called RAINBOW, and I am currently studying stereotypes about LGBTQIA+ people, particularly on social media, in online discourse, and in AI systems.
We have developed a brief questionnaire to gather diverse perspectives from those who experience or recognize these stereotypes. Your participation will support the creation of a multilingual dataset (Italian, Dutch, and Farsi) aimed at making AI technologies more inclusive and reducing the harm they cause to queer communities. Whether you identify as LGBTQIA+, are an ally, or are interested in this research area, your input is highly valued.
Please find the questionnaire here:
* ITALIAN: https://lnkd.in/dfPuyT6j
* DUTCH: https://lnkd.in/d-3Di7WY
* FARSI: https://lnkd.in/dfvWzWCu
Should you have any questions, please do not hesitate to contact me at: alessandrateresa.cignarella(a)ugent.be<mailto:alessandrateresa.cignarella@ugent.be>
I would greatly appreciate it if you could share this survey with your contacts who speak any of these three languages.
Thank you very much for your support!
Best regards,
Alessandra
Alessandra Teresa Cignarella (she/her)
MSCA postdoctoral fellow
LT3, Language and Translation Technology Team
Department of Translation, Interpreting and Communication
Ghent University
UCCTS 2025 - Call for Participation
The eighth edition of the UCCTS conference (www.uni-hildesheim.de/uccts2025) will be held on 8-10 September 2025 in Hildesheim, Germany.
The UCCTS conference series brings together researchers who collect, annotate, and analyze corpora and/or use them to inform contrastive linguistics and translation theory and/or develop corpus-informed tools (in foreign language teaching, language testing and quality assessment, translation pedagogy, computer-aided/machine translation or other related NLP domains). We invite original submissions on a range of topics within empirical contrastive linguistics and translation studies. We welcome interdisciplinary contributions that combine corpus data with other types of empirical data (e.g. experimental data) and allow for an interplay between different methods and data types. We also encourage contributions applying information and computational technologies, including Large Language Models (LLMs).
Keynote speakers
* Elke Teich, Saarland University, Germany
* Dylan Glynn, Université Paris 8, Vincennes - St Denis
* Christian Hardmeier, IT University of Copenhagen
Programme details: https://www.uni-hildesheim.de/fb3/institute-1/institut-fuer-uebersetzungswi…
Information on registration: https://www.uni-hildesheim.de/fb3/institute-1/institut-fuer-uebersetzungswi…
The UCCTS conference in Hildesheim precedes KONVENS, the annual conference series on computational linguistics, which will also take place in Hildesheim, on 10-12 September.
Please send questions and inquiries to uccts2025(at)uni-hildesheim.de
***********************************
***** 2nd Call for Abstracts *****
***********************************
*** NARNiHS 2026
*** North American Research Network in Historical Sociolinguistics
*** Eighth Annual Meeting
*** 100% IN PERSON
*** Co-Located with the Linguistic Society of America (LSA) Annual Meeting
*** New Orleans, Louisiana USA
*** 8-11 January 2026
This event offers an opportunity for historical sociolinguistics scholars from all over the world to gather and share leading research. We encourage our fellow historical sociolinguists and scholars in related fields from our global scholarly community to **join us in New Orleans** for our Eighth Annual Meeting.
Consult this Call for Abstracts on the web: https://narnihs.org/?page_id=3135 .
--------------- Call for Abstracts ---------------.
Abstract submission online: https://easyabs.linguistlist.org/conference/NARNiHS_26/ .
Deadline: Friday, 15 August 2025, 11:59 PM US Eastern Time.
Late abstracts will not be considered.
The North American Research Network in Historical Sociolinguistics (NARNiHS) is accepting abstracts for its Eighth Annual Meeting in New Orleans, Thursday, January 8 -- Sunday, January 11, 2026. The 8th edition of this inclusive NARNiHS event seeks to provide a collaborative environment where presenters bring fully developed work for presentation and enrichment. We see the NARNiHS Annual Meeting as a place for showcasing excellent projects in historical sociolinguistics, seeking feedback from peers, and engaging in productive development of the field’s enduring questions.
NARNiHS welcomes papers in all areas of historical sociolinguistics, which is understood as the application and/or development of sociolinguistic theories, methods, and models for the study of historical language variation and change over time, or more broadly, the study of the interaction of language and society in historical periods and from historical perspectives. Thus, a wide range of linguistic areas, subdisciplines, methodologies, and adjacent disciplines easily find their place within historical sociolinguistics, and we encourage submission of abstracts that reflect this broad scope.
Abstracts will be accepted for both 20-minute papers and posters. Please note that, at the NARNiHS annual meeting, poster presentations are an integral part of the conference (not second-tier presentations). Abstracts will be assigned a paper or a poster presentation based on determinations in the review process about the most effective format for the submission. However, if you prefer that your submission be considered primarily for poster presentation, please specify this in your abstract.
Successful abstracts will demonstrate *thorough grounding* in historical sociolinguistics, *scientific rigor* in the formulation of research questions, and promise for rich discussion of ideas. Successful abstracts will be explicit about which *theoretical frameworks*, *methodological protocols*, and *analytical strategies* are being applied or critiqued. *Data sources and examples* should be sufficiently presented, so as to allow reviewers a full understanding of the scope and claims of the research. Please note that the *connection of your research to the field of historical sociolinguistics* should be explicitly outlined in your abstract. Failure to adhere to these criteria will likely result in rejection.
*** Abstract Format Guidelines ***.
- Abstracts must be submitted in PDF format.
- Abstracts must fit on one 8.5x11 inch page, with margins no smaller than 1 inch and a font style and size no smaller than Times New Roman 12 point. You are encouraged to use the entire page, providing a full and robust description of the research. All additional supporting content (visualizations, trees, tables, figures, captions, examples, and references) must fit on a single (1) additional page. No exceptions to these requirements are allowed; abstracts longer than one page or with more than one additional page of supporting content will be rejected without review.
- Specify if you prefer your submission be considered primarily for a poster presentation.
- Anonymize your abstract. We realize that sometimes complete anonymity is not attainable, but there is a difference between the nature of the research creating an inability to anonymize and careless non-anonymizing (in citations, references, file names, etc.). Be sure to anonymize your PDF file (you may do so in Adobe Acrobat Reader by clicking on "File", then "Properties", removing your name if it appears in the "Author" line of the "Description" tab, and re-saving the file before submission). Do not use your name when saving your PDF (e.g. Smith_Abstract.pdf); file names will not be automatically anonymized by the EasyAbs system. Rather, use non-identifying information in your file name (e.g. HistSoc4Lyfe.pdf). Your name should only appear in the online form accompanying your abstract submission. Papers that are not sufficiently anonymized wherever possible will be rejected without review.
*** General Requirements ***.
- Abstracts must be submitted electronically using the following link: https://easyabs.linguistlist.org/conference/NARNiHS_26/ .
- Authors may submit a maximum of two abstracts: one single-author abstract and one co-authored abstract.
- Authors may not submit identical abstracts for presentation at the NARNiHS annual meeting and the LSA annual meeting or another LSA sister society meeting (ADS, ANS, NAHoLS, SCiL, SPCL, or SSILA).
- After submission, no changes of author, title, or wording of the abstract may occur. If your abstract is accepted, adjustment of typographical errors is permitted before a final version of the abstract is printed in the conference booklet.
- Papers and posters must be delivered as described in the abstract or represent bona fide developments of the same research.
- Authors are expected to attend the conference in-person and present their own papers and posters. This will not be a hybrid event.
Contact us at NARNiHistSoc(a)gmail.com with any questions.