Dear colleagues,
EUSKORPORA, a newly created Linguistic Data Center for Basque digital technologies based in San Sebastián (Donostia), Spain, is seeking candidates for two key roles in its Technology area:
1) Senior AI and Language Technologies Specialist
2) Junior AI and Language Technologies Specialist
Both positions are part of the Center's mission to position the Basque language in the global digital space through open-source development and cutting-edge research.
=== SENIOR AI AND LANGUAGE TECHNOLOGIES SPECIALIST ===
EUSKORPORA, the Linguistic Data Center for Basque Digital Technologies, a new association based in Donostia/San Sebastián, is seeking an experienced senior expert in AI technologies applied to natural language processing to lead key tasks related to language technologies for the Basque language.
The selected person will be part of an interdisciplinary team and will participate in projects involving the collection, analysis, and annotation of linguistic data, as well as the development of open-source foundational language models (ASR, TTS, MT, NLP) oriented to Basque, in a research and development context closely connected to industry.
Responsibilities:
- Supervise and optimize processes for linguistic corpus collection, annotation, and management
- Lead the design and development of foundational language models applied to Basque (speech recognition, synthesis, translation, text processing, etc.)
- Contribute to the technological architecture of the Center
- Coordinate internal and external teams and mentor junior staff
- Identify innovation opportunities and contribute to proposals, reports, and dissemination
- Establish strategic relationships with ecosystem stakeholders
Requirements:
- Advanced degree (Master's or PhD) in Computational Linguistics, NLP, AI, Computer Engineering, Data Science, or a related field
- Minimum 5 years of experience in language or speech technologies
- Proven experience with ASR, TTS, MT, or NLP models
- Strong programming skills in Python and familiarity with frameworks such as Hugging Face, PyTorch, TensorFlow, spaCy, Kaldi, ESPnet, Fairseq
- Knowledge of MLOps, Git, and data science best practices
- Familiarity with open repositories and licensing
Languages:
- Basque: desirable, intermediate level (B2 or higher)
- Spanish: fluent
- English: high level (especially technical)
We offer:
- Participation in strategic national and international projects
- Competitive salary according to experience
- Interdisciplinary environment and opportunities for professional growth
=== JUNIOR AI AND LANGUAGE TECHNOLOGIES SPECIALIST ===
EUSKORPORA, the Linguistic Data Center for Basque Digital Technologies, a new association based in Donostia/San Sebastián, is seeking young professionals at the beginning of their careers to support key tasks related to the creation of linguistic resources and language technologies for the Basque language.
Selected individuals will join an interdisciplinary team and participate in projects involving the collection, annotation, and analysis of linguistic data, as well as the development of open-source foundational language models (ASR, TTS, MT, NLP) oriented to Basque, in a research and development context closely connected to industry.
Responsibilities:
- Support the collection, cleaning and annotation of linguistic corpora (text and audio)
- Assist in the training and evaluation of language and speech models
- Collaborate in the documentation and maintenance of language resources
- Contribute to the integration of open-source NLP tools and libraries
- Assist in reports and dissemination activities
- Work in coordination with technical, linguistic and project management profiles
Requirements:
- Bachelor's or Master's degree in Computational Linguistics, Computer Engineering, Data Science, or a similar field
- Basic knowledge of NLP, language models, or speech technologies
- Python programming (basic/intermediate level)
- Familiarity with linguistic annotation or text processing tools
- Experience with Git and frameworks like Hugging Face or spaCy is a plus
Languages:
- Basque: high level (B2 or higher)
- Spanish: fluent
- English: high level (B2 or higher)
We offer:
- Dynamic and innovative environment based in San Sebastián
- Continuous training in cutting-edge technologies
- Real opportunities for growth within the team
- Competitive salary according to training and experience
For further information or to apply, please contact:
info(a)euskorpora.eus
Best regards,
EUSKORPORA
https://www.euskorpora.eus/
info(a)euskorpora.eus
+(34) 611 02 81 72
We are pleased to invite submissions for the first Interdisciplinary
Workshop on Observations of Misunderstood, Misguided and Malicious Use of
Language Models (OMMM 2025). The workshop will be held in conjunction with the RANLP 2025
conference in Varna, Bulgaria, on 11-13 September 2025.
Overview
The use of Large Language Models (LLMs) pervades scientific practices in
multiple disciplines beyond the NLP/AI communities. Alongside benefits for
productivity and discovery, widespread use often entails misuse due to
misalignment of values, lack of knowledge, or, more rarely, malice. LLM
misuse has the potential to cause real harm in a variety of settings.
Through this workshop, we aim to gather researchers interested in
identifying and mitigating inappropriate and harmful uses of LLMs. These
include misunderstood usage (e.g., misrepresentation of LLMs in the
scientific literature); misguided usage (e.g., deployment of LLMs without
adequate training or privacy safeguards); and malicious usage (e.g.,
generation of misinformation and plagiarism). Sample topics are listed
below, but we welcome submissions in any area related to the scope of the
workshop.
Important Dates
Submission deadline *[NEW]*: *15 July 2025*, at 23:59 Anywhere on Earth
Notification of acceptance: 01 August 2025
Camera-ready papers due: 30 August 2025
Workshop dates: September 11, 12, or 13, 2025
Submission Guidelines
Submissions will be accepted as short papers (4 pages) and as long papers
(8 pages), plus additional pages for references. All submissions undergo a
double-blind review, so they should not include any identifying
information. Submissions should conform to the RANLP guidelines; for
further information and templates, please see
https://ranlp.org/ranlp2025/index.php/submissions/
We welcome submissions from diverse disciplines, including NLP and AI,
psychology, HCI, and philosophy. We particularly encourage reports on
negative results that provide interesting perspectives on relevant topics.
In-person presenters will be prioritised when selecting submissions to be
presented at the workshop, but the workshop will take place in a hybrid
format. Accepted papers will be included in the workshop proceedings in the
ACL Anthology.
Papers should be submitted on the RANLP conference system at
https://softconf.com/ranlp25/OMMM2025/
Keynote Speaker
We are excited to have Dr. Stefania Druga as the keynote speaker for the
inaugural OMMM workshop. Dr. Druga is a Research Scientist at Google
DeepMind, where she designs novel multimodal AI applications.
Topics of Interest
We welcome paper submissions on all topics related to inappropriate and
harmful uses of LLMs, including but not limited to:
- Misunderstood use (and how to improve understanding):
  - Misrepresentation of LLMs (e.g., anthropomorphic language)
  - Attribution of consciousness
  - Interpretability
  - Overreliance on LLMs
- Misguided use (and how to find alternatives):
  - Underperformance and inappropriate applications
  - Structural limitations and ethical considerations
  - Deployment without proper training or safeguards
- Malicious use (and how to mitigate it):
  - Adversarial attacks, jailbreaking
  - Detection and watermarking of machine-generated content
  - Generation of misinformation or plagiarism
  - Bias mitigation and trust design
For more information, please refer to the workshop website:
https://ommm-workshop.github.io/2025/. For any questions, please contact
the organisers at ommm-workshop(a)googlegroups.com.
The organisers,
Piotr Przybyła, Universitat Pompeu Fabra
Matthew Shardlow, Manchester Metropolitan University
Clara Colombatto, University of Waterloo
Nanna Inie, IT University of Copenhagen
[Apologies for cross-posting]
Terminology Translation Task at WMT2025 - Call for Participation
We are excited to announce the third Shared Task on Terminology Translation<https://www2.statmt.org/wmt25/terminology.html>, which will run as part of the 10th Conference on Machine Translation (WMT2025) in Suzhou, China.
TL;DR:
- We test sentence-level and document-level translation of texts in the finance and IT domains, given explicit terminology.
- The language pairs are: English -> {Spanish, German, Russian, Chinese}, Chinese -> English.
- We evaluate overall translation quality, terminology success rate, and terminology consistency. We also compare system performance when given no terminology, proper terminology, and random terms.
- The task starts on 20th June 2025 AOE; the submission deadline is 20th July 2025 AOE.
- Please pre-register via Google Forms here: https://forms.gle/ZSn2pNJkQJAzHFnA6
OVERVIEW
Advances in neural MT and LLM-assisted translation over the last decade have brought near-human quality to general-domain translation, at least for high-resource languages. However, in specialized domains such as science, finance, or law, where the correct and consistent use of special terms is crucial, the task is far from solved. The Terminology Shared Task aims to assess the extent to which machine translation models can use additional information about how terms should be translated. Compared to the two previous editions (2021 and 2023), the new test data contain more varied test cases, are more consistent in domain within each translation direction, and cover more languages.
TASK DESCRIPTION
Track №1: Sentence/Paragraph-Level Translation
You will be provided with a sequence of input sentences or paragraphs, together with small terminology dictionaries that contain only the terms present in the given sentence.
Language Pairs:
* en-de (English → German)
* en-ru (English → Russian)
* en-es (English → Spanish)
Domain: information technology
Track №2: Document-Level Translation
The setup is similar to Track №1, with two exceptions: each input text is now a whole document, and the dictionaries correspond to the whole set of input texts (i.e. they are corpus-level). This brings the task close to the real-life setup (where dictionaries exist independently of the texts), although it may complicate implementation (solutions that store the whole dictionary will need more memory). In addition, in the document-level setup, consistent use of terms becomes more important.
Language Pairs:
* en-zh-Hant (English → Traditional Chinese)
* zh-Hant-en (Traditional Chinese → English)
Domain: finance
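Because Track №2 dictionaries are corpus-level, a system may first want to narrow the dictionary down to the entries relevant to each document. The snippet below is a minimal sketch of that step, not part of the official task tooling: `filter_dictionary` is a hypothetical helper, the dictionary entries are invented for illustration, and plain substring matching is an assumption (it suits Chinese source text, which has no spaces, but can over-match inside longer English words).

```python
from typing import Dict


def filter_dictionary(document: str, corpus_dict: Dict[str, str]) -> Dict[str, str]:
    """Return the corpus-level entries whose source term occurs in `document`.

    Plain substring matching is an illustrative assumption; a real system
    would likely need tokenization or normalization.
    """
    return {src: tgt for src, tgt in corpus_dict.items() if src in document}


if __name__ == "__main__":
    # Hypothetical en -> zh-Hant finance entries, invented for illustration.
    corpus_dict = {"interest rate": "利率", "bond": "債券", "dividend": "股息"}
    doc = "The central bank raised the interest rate, and bond yields rose."
    print(filter_dictionary(doc, corpus_dict))
    # {'interest rate': '利率', 'bond': '債券'}
```

Filtering per document keeps the terminology passed to the model small, but the full dictionary still has to be scanned for every document; with very large dictionaries, an index such as a trie or an Aho-Corasick automaton would avoid checking every entry.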
EVALUATION
Terminology Modes:
You are expected to compare your system’s performance under three modes:
1. No terminology: the system is only provided with input sentences/documents.
2. Proper terminology: the system is provided with input texts (same as 1.) and dictionaries of the format {source_term: target_term}.
3. Random terminology: the system is provided with input texts and translation dictionaries of the same format as in 2. The difference is that the dictionary items are not special terms but words randomly drawn from the input texts. This mode is of special interest because we want to measure to what extent proper term translations improve system performance (mode 2), as opposed to arbitrary additional input that does not contain domain-specific terminology.
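Since the three modes differ only in the dictionary that accompanies each input, a system can keep a single interface and pass no dictionary, the proper dictionary, or the random dictionary. The sketch below assumes an LLM-based system driven by a text prompt; `build_prompt`, the prompt wording, and the example entries are all hypothetical and are not prescribed by the task.

```python
from typing import Dict, Optional


def build_prompt(source: str, terms: Optional[Dict[str, str]] = None,
                 src_lang: str = "English", tgt_lang: str = "German") -> str:
    """Build a hypothetical LLM translation prompt for one input text.

    terms=None -> mode 1 (no terminology); the proper dictionary -> mode 2;
    the random dictionary -> mode 3. Modes 2 and 3 share the same
    {source_term: target_term} format, so the code path is identical.
    """
    lines = [f"Translate the following {src_lang} text into {tgt_lang}."]
    if terms:  # an empty or missing dictionary behaves like mode 1
        lines.append("Use these term translations:")
        lines.extend(f"- {src} -> {tgt}" for src, tgt in terms.items())
    lines.append(f"Text: {source}")
    return "\n".join(lines)


if __name__ == "__main__":
    source = "Restart the server after updating the firmware."
    # Invented example dictionaries, not task data.
    proper_terms = {"server": "Server", "firmware": "Firmware"}
    random_terms = {"after": "nach", "updating": "Aktualisieren"}
    for mode, terms in [("no terminology", None),
                        ("proper terminology", proper_terms),
                        ("random terminology", random_terms)]:
        print(f"=== {mode} ===")
        print(build_prompt(source, terms))
```

Keeping one code path for all three modes means that any difference in scores can be attributed to the dictionary content rather than to how the system was invoked.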
Metrics:
1. Overall Translation Quality: we will evaluate general aspects of machine translation output such as fluency, adequacy and grammaticality, using general automatic MT metrics such as BLEU or COMET. In addition, we will pay special attention to the grammaticality of the translated terms.
2. Terminology Success Rate: this metric assesses the system's ability to translate technical terms accurately given the specialized vocabulary. It will be computed by comparing the correct term translations (i.e. the ones present in the dictionary) with the terms that occur in the output. A higher success rate indicates closer adherence to the dictionary translations.
3. Terminology Consistency: for domains such as science or legal texts, the consistent use of an introduced term throughout the text is crucial. In other words, we want a system not only to pick the correct term in the target language but also to use it consistently once it is chosen. This will be evaluated by comparing all translations of a given source term in a text and measuring the percentage of deviations from the most consistent translation. This metric is more important for the Document-Level track, but it will be used for both tracks.
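Metrics 2 and 3 can be approximated roughly as follows. This is an illustrative sketch under stated assumptions, not the organizers' scorer: the success rate uses exact substring matching, and the consistency function assumes that the target renderings of each source term have already been extracted (e.g. by a word aligner) and treats the most frequent rendering as the "most consistent translation". Function names and example data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def term_success_rate(output: str, terms: Dict[str, str]) -> float:
    """Share of dictionary entries whose target term appears in the output.

    Assumption: exact, case-sensitive substring matching, each entry counted
    once per text (the official scorer may differ, e.g. for inflected terms).
    """
    if not terms:
        return 0.0  # nothing to check; 0.0 is an arbitrary convention here
    hits = sum(1 for tgt in terms.values() if tgt in output)
    return hits / len(terms)


def term_deviation_rate(renderings: Dict[str, List[str]]) -> float:
    """Percentage of term occurrences that deviate from the majority rendering.

    `renderings` maps each source term to the target strings it was
    translated as throughout one document (assumed to come from an aligner).
    """
    total = deviations = 0
    for translations in renderings.values():
        if not translations:
            continue
        majority_count = Counter(translations).most_common(1)[0][1]
        total += len(translations)
        deviations += len(translations) - majority_count
    return 100.0 * deviations / total if total else 0.0


if __name__ == "__main__":
    # Invented example data, not task data.
    output = "Der Server wurde nach dem Firmware-Update neu gestartet."
    terms = {"server": "Server", "firmware": "Firmware", "reboot": "Neustart"}
    print(round(term_success_rate(output, terms), 3))  # 0.667
    renderings = {"bond": ["債券", "債券", "公債"], "dividend": ["股息", "股息"]}
    print(term_deviation_rate(renderings))  # 20.0
```

In the example, "Neustart" is missed because the output says "neu gestartet", which shows why exact matching undercounts morphological variants; a lower deviation rate means more consistent term use.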
IMPORTANT DATES
All dates are end of Anywhere on Earth (AoE).
Data snippets released: 7th May 2025
Dev data released: 22nd May 2025
Test data release, task starts: 20th June 2025 (postponed)
Submission deadline: 20th July 2025 (postponed)
Paper submission to WMT25: in-line with WMT25
Camera-ready submission to WMT25: in-line with WMT25
Conference in Suzhou, China: 05-09 November 2025
SUBMISSION GUIDELINES
0. Please notify us of your participation before submitting. This is optional, but it will help us estimate our workload after the submission deadline. Please use this Google Form: https://forms.gle/ZSn2pNJkQJAzHFnA6
1. Check your submission files with the validation script, which will be published together with the test data.
2. Write a description of your system (optional).
3. Submit your system via Google Forms. The Google Form with all necessary submission details will be published when the test set is released.
All submission details and an FAQ can be found on the shared task webpage.
ORGANIZERS
* Kirill Semenov (University of Zurich), main contact: FirstNаmе [dоt] LаstNаmе {аt} uzh /dоt/ ch
* Nathaniel Berger (Heidelberg University)
* Pinzhen Chen (University of Edinburgh & Aveni.ai)
* Xu Huang (Nanjing University)
* Arturo Oncevay (JP Morgan)
* Dawei Zhu (Amazon)
* Vilém Zouhar (ETH Zurich)
WEBSITE: https://www2.statmt.org/wmt25/terminology.html
For any queries, please email Kirill Semenov (see the email address above).
Call for papers: The First Workshop on Natural Language Processing and Language Models for Digital Humanities
(LM4DH_2025) @ RANLP_2025
Date: 11-13 September 2025 (TBC)
Venue: Varna, Bulgaria
Website: https://www.clarin.eu/event/2025/clarin-workshop-ranlp-2025
Submissions Portal: https://softconf.com/ranlp25/LM4DH2025/
Digital Humanities has emerged as an interdisciplinary field of research at the intersection of computer science and many other fields such as linguistics, social sciences, history, and psychology. With the development of Large Language Models (LLMs), performance on Natural Language Processing (NLP) tasks such as entity recognition, sentiment analysis, and text summarisation has improved significantly. These advances offer powerful tools for analysing and interpreting complex historical, cultural, and social data, including oral histories, archival documents, and literary texts, enabling researchers to identify patterns, extract meaningful relationships, and generate interpretations at unprecedented scale and precision.
This workshop aims to provide a common platform for researchers, practitioners, and students from diverse disciplines to collaboratively explore and apply AI-driven techniques in the Digital Humanities. Through interdisciplinary discussion, the event aims to generate creative approaches, exchange best practices, and build a community committed to furthering AI-based research on human culture and history. The focus of the workshop is on applying natural language processing techniques to digital humanities research; topics may address any question of digital humanities interest that involves a natural language processing or LLM-based application. We expect contributions related (but not limited) to the following topics:
* Text analysis and processing related to the humanities using computational methods
* Using the interpretability of large language model outputs for DH-related tasks
* Dataset creation and curation for NLP (e.g. digitisation, datafication, and data preservation)
* Automatic error detection, correction, and normalisation of textual data
* Generation and analysis of literary works such as poetry and novels
* Analysis and detection of text genres
* Emotion analysis for the humanities and literature
* Modelling of information and knowledge in the Humanities, Social Sciences, and Cultural Heritage
* Low-resource and historical language processing
* Search for scientific and/or scholarly literature
* Profiling and authorship attribution
Submission & Publication
All papers must represent original and unpublished work that is not currently under review. Papers will be evaluated according to their significance, originality, technical content, style, clarity, and relevance to the workshop.
Submissions must follow the RANLP 2025 submission guidelines<https://ranlp.org/ranlp2025/index.php/submissions/>, using ACL-style templates (LaTeX or MS Word).
Papers must be submitted using SoftConf at https://softconf.com/ranlp25/LM4DH2025/
All papers will be double-blind peer reviewed. Authors of accepted papers will present their work in either an oral or a poster session. All accepted papers will appear in the workshop proceedings, which will be published in the ACL Anthology.
Important Dates
* Paper submission deadline: 20th July 2025
* Notification of acceptance: 2nd August 2025
* Camera-ready paper: 20th August 2025
* Workshop date: 11th September 2025
Organising Committee
* Isuri Anuradha, Lancaster University, UK
* Francesca Frontini, CNR-ILC, Italy & CLARIN ERIC
* Paul Rayson, Lancaster University, UK
* Ruslan Mitkov, Lancaster University, UK
* Deshan Sumanathilake, Swansea University, UK
This workshop has been organised with the generous support and coordination of CLARIN-EU.
Email: dhranlp2(a)gmail.com
*Call for Participation in Tracks*
*FIRE 2025: 17th meeting of the Forum for Information Retrieval Evaluation*
Indian Institute of Technology (BHU) Varanasi
17th - 20th December
Website: fire.irsi.org.in <http://fire.irsi.org.in/>
FIRE 2025 offers the following exciting tracks this year:
* Cross-Lingual Mathematical Information Retrieval (CLMIR)
<https://clmir2025.github.io/>
* Code-Mixed Information Retrieval from Social Media Data (CMIR)
<https://cmir-iitbhu.github.io/cmir/index.html>
* Hate Speech and Offensive Content Identification in Memes in
Bengali, Hindi, Gujarati and Bodo (HASOC-meme)
<https://hasocfire.github.io/hasoc/2025/>
* Information Retrieval in Software Engineering (IRSE)
<https://sites.google.com/view/irse-2025/home>
* Misinformation Detection and Prompt Recovery (PROMID)
<https://promid.github.io/index.html>
* Multilingual Story Illustration: Bridging Cultures through AI
Artistry (MUSIA) <https://cse-iitbhu.github.io/MUSIA/index.html>
* Offensive Language Identification in Dravidian Languages
(DravidianCodeMix)
<https://dravidian-codemix.github.io/2025/dataset.html>
* Opinion Extraction and Question Answering from
CryptoCurrency-Related Tweets and Reddit posts (CryptOQA)
<https://sites.google.com/view/cryptoqa-2025/>
* Research Highlight Generation from Scientific Papers (SciHigh)
<https://sites.google.com/jadavpuruniversity.in/scihigh2025/home>
* Spoken-Query Cross-Lingual Information Retrieval for the Indic
Languages (SqCLIR) <https://sites.google.com/view/sqclir-2025>
* Varanasi Tourism in Question Answer System (VATIKA)
<https://sites.google.com/view/vatika-2025/>
* Word-Level Identification of Languages in Dravidian Languages (WILD)
<https://www.codabench.org/competitions/7902/>
Research groups are invited to participate in the experiments. Please
register directly with the organizers.
FIRE 2025 is the 17th edition of the annual meeting of the Forum for
Information Retrieval Evaluation (fire.irsi.org.in). Since its inception
in 2008, FIRE has had a strong focus on shared tasks similar to those
offered at evaluation forums like TREC, CLEF, and NTCIR. The shared
tasks focus on solving specific problems in the area of information
access and, more importantly, help generate evaluation datasets for the
research community.
Visit fire.irsi.org.in <http://fire.irsi.org.in>
The 2nd Workshop on DHOW: Diffusion of Harmful Content on Online Web
The workshop will be conducted in a *hybrid* format to ensure maximum
participation, accommodating attendees both *online* and in person.
Submission deadline: *July 11 2025 AOE*
*Workshop site*: https://dhow-workshop.github.io/2025/
*Co-located with ACMMM 2025*
https://acmmm2025.org/
Dublin, Ireland, 27-31 October 2025
*Important Dates*
Submission deadline: extended to *July 11, 2025*
Notification of acceptance: August 01, 2025
Camera-ready papers due: August 11, 2025
Workshop date: October 27/28, 2025
*Workshop Description*
With the advancement of digital technologies and devices, online content
is easily accessible. At the same time, harmful content also spreads.
Different kinds of harmful content circulate on different platforms and
in multiple languages. The topic of harmful content is broad and covers
multiple research directions, but users are affected by all of them.
Harmful content is often studied one type at a time, such as
misinformation or hate speech, and research has typically focused on a
single platform, a single language, and a particular issue. Meanwhile,
spreaders of harmful content switch platforms and languages to reach
their audience. Harmful content is not limited to social media but also
appears in news media: spreaders share it in posts, news articles,
comments, and hyperlinks. There is therefore a need to study harmful
content by combining cross-platform, multilingual, and multimodal data
and topics.
We will bring research on harmful content under one umbrella so that
work on different topics (hate speech, misinformation, disinformation,
self-harm, offensive content, etc.) can yield novel methods and
recommendations for users, leveraging text analysis together with image,
audio, and video recognition to detect harmful content in diverse
formats. The workshop will also cover ongoing issues such as wars and
elections in 2025.
We believe this workshop will provide a unique opportunity for
researchers and practitioners to exchange ideas, share the latest
developments, and collaborate on addressing the challenges associated
with the spread of harmful content across the Web. We expect that the
workshop will generate insights and discussions that help advance the
field of societal artificial intelligence (AI) for a safer internet. In
addition to attracting high-quality research contributions, one of the
aims of the workshop is to bring together researchers working in related
areas to form a community.
*Submissions Topics*
• Studying different types of harmful content
• Computational fact-checking & Misinformation Detection
• Role of Generative AI in Mitigating Harmful Content
• Harassment, Bullying, and Hate Speech Detection
• Explainable AI for Harmful Content Analysis
• Multimodal and Multilingual Harmful Content Detection, such as fake
  news, spam, and troll detection
• Deepfake and Synthetic Media
• Ethical & Societal Implications of AI in Content Moderation
• Qualitative and quantitative studies of harmful content
• Psychological effects of harmful content, e.g., on mental health
• Approaches for data collection or data annotation using multimodal
  large models on harmful content
• User studies on the effects of harmful content on people
*Submissions*
- Submission Instructions: https://dhow-workshop.github.io/2025/#call
- Submission Link: https://openreview.net/group?id=acmmm.org/ACMMM/2025/Workshop/DHOW
*Workshop organizers*
• Thomas Mandl (University of Hildesheim, Germany)
• Haiming Liu (University of Southampton, United Kingdom)
• Gautam Kishore Shahi (University of Duisburg-Essen, Germany)
• Amit Kumar Jaiswal (University of Surrey, United Kingdom)
• Durgesh Nandini (University of Bayreuth, Germany)
DHOW 2025
Ethical LLMs 2025: The first Workshop on Ethical Concerns in Training, Evaluating and Deploying Large Language Models<https://sites.google.com/view/ethical-llms-2025> @ RANLP2025<https://ranlp.org/ranlp2025/>
Call for papers:
Scope
Large Language Models (LLMs) represent a transformative leap in Artificial Intelligence (AI), delivering remarkable language-processing capabilities that are reshaping how we interact with technology in our daily lives. With their ability to perform tasks such as summarisation, translation, classification, and text generation, LLMs have demonstrated unparalleled versatility and power. Drawing from vast and diverse knowledge bases, these models hold the potential to revolutionise a wide range of fields, including education, media, law, psychology, and beyond. From assisting educators in creating personalised learning experiences to enabling legal professionals to draft documents or supporting mental health practitioners with preliminary assessments, the applications of LLMs are both expansive and profound.
However, alongside their impressive strengths, LLMs also face significant limitations that raise critical ethical questions. Unlike humans, these models lack essential qualities such as emotional intelligence, contextual empathy, and nuanced ethical reasoning. While they can generate coherent and contextually relevant responses, they do not possess the ability to fully understand the emotional or moral implications of their outputs. This gap becomes particularly concerning when LLMs are deployed in sensitive domains where human values, cultural nuances, and ethical considerations are paramount. For example, biases embedded in training data can lead to unfair or discriminatory outcomes, while the absence of ethical reasoning may result in outputs that inadvertently harm individuals or communities. These limitations highlight the urgent need for robust research in Natural Language Processing (NLP) to address the ethical dimensions of LLMs. Advancements in NLP research are crucial for developing methods to detect and mitigate biases, enhance transparency in model decision-making, and incorporate ethical frameworks that align with human values. By prioritising ethics in NLP research, we can better understand the societal implications of LLMs and ensure their development and deployment are guided by principles of fairness, accountability, and respect for human dignity. This workshop will dive into these pressing issues, fostering a collaborative effort to shape the future of LLMs as tools that not only excel in technical performance but also uphold the highest ethical standards.
Submission Guidelines
We follow the RANLP 2025 standards for submission format and guidelines. EthicalLLMs 2025 invites the submission of long papers, up to eight pages in length, and short papers, up to six pages in length. These page limits apply only to the main body of the paper. At the end of the paper (after the conclusions but before the references), papers must include a mandatory section discussing the limitations of the work and, optionally, a section discussing ethical considerations. Papers can include unlimited pages of references and an unlimited appendix.
To prepare your submission, please make sure to use the RANLP 2025 style files available here:
* LaTeX<https://ranlp.org/ranlp2025/wp-content/uploads/2025/05/ranlp2025-LaTeX.zip>
* Word<https://ranlp.org/ranlp2025/wp-content/uploads/2025/05/ranlp2025-word.docx>
Papers should be submitted through Softconf/START using the following link: https://softconf.com/ranlp25/EthicalLLMs2025/
Topics of interest
The workshop invites submissions on a broad range of topics related to the ethical development and evaluation of LLMs, including but not limited to the following.
1. Bias Detection and Mitigation in LLMs
Research focused on identifying, measuring, and reducing social, cultural, and algorithmic biases in large language models.
2. Ethical Frameworks for LLM Deployment
Approaches to integrating ethical principles, such as fairness, accountability, and transparency, into the development and use of LLMs.
3. LLMs in Sensitive Domains: Risks and Safeguards
Case studies or methodologies for deploying LLMs in high-stakes fields such as healthcare, law, and education, with an emphasis on ethical implications.
4. Explainability and Transparency in LLM Decision-Making
Techniques and tools for improving the interpretability of LLM outputs and understanding model reasoning.
5. Cultural and Contextual Understanding in NLP Systems
Strategies for enhancing LLMs’ sensitivity to cultural, linguistic, and social nuances in global and multilingual contexts.
6. Human-in-the-Loop Approaches for Ethical Oversight
Collaborative models that involve human expertise in guiding, correcting, or auditing LLM behaviour to ensure responsible use.
7. Mental Health and Emotional AI: Limits of LLM Empathy
Discussions on the role of LLMs in mental health support, highlighting the boundary between assistive technology and the need for human empathy.
Organisers
Damith Premasiri – Lancaster University, UK
Tharindu Ranasinghe – Lancaster University, UK
Hansi Hettiarachchi – Lancaster University, UK
Contact
If you have any questions regarding the workshop, please contact Damith: d.dolamullage(a)lancaster.ac.uk
Dear all,
We are currently working on a project that aims to make querying syntactically annotated corpora easier and more accessible.
For this purpose, we want to know what researchers are actually searching for.
If you have a minute, please fill out this form:
https://forms.office.com/e/a8DgETSabB
Feel free to reach out to ekavol(a)chalmers.se or nikdew(a)chalmers.se if you have any further questions.
Best regards
Niklas Deworetzki & Katja Voloshina
PhD Students
Department of Computer Science and Engineering
Chalmers University of Technology | University of Gothenburg
SE-412 96 Göteborg, Sweden
www.gu.se
www.chalmers.se
CLEF 2025 – Registration Open
Conference and Labs of the Evaluation Forum
We are pleased to announce CLEF 2025, taking place 9–12 September 2025 in Madrid, Spain at UNED. This peer‑reviewed conference and associated labs foster research in multilingual, multimodal, and cross‑language information access https://clef2025.clef-initiative.eu/.
Register now – Early‑bird registration is open! Standard registration opened earlier this year, and early‑bird rates are currently available.
Why attend?
* Present and discuss original research at the main conference.
* Engage in innovative labs and challenges, including LifeCLEF, ImageCLEF, EXIST, eRisk, CheckThat!, and more (https://clef2025.clef-initiative.eu/index.php?page=Pages/labs.html).
* Benefit from rich networking with academic and industry experts in IR, NLP, multimedia retrieval, and evaluation sciences.
For details on conference and lab registration, deadlines, and pricing, please visit the official site: https://clef2025.clef-initiative.eu/index.php?page=Pages/registrationConfer…
Important Dates
* Early-bird registration: ongoing
* Registration closes: 31 August 2025
* Conference & labs: 9–12 September 2025, Madrid, Spain
We look forward to welcoming participants from across the global community. See you this September in Madrid at CLEF 2025!
Jorge Carrillo-de-Albornoz
On behalf of the CLEF 2025 Organising Committee
LEGAL NOTICE. This message may contain privileged and confidential information. If you are not the intended recipient, you are not authorized to copy, reproduce, or distribute this message or its content. If you have received this message in error, please notify the sender.
Please be advised that any personal data of yours contained in this message will be processed by the UNIVERSIDAD NACIONAL DE EDUCACIÓN A DISTANCIA (UNED), c/ Bravo Murillo, 38, 28015-MADRID-, as data controller, for the purpose of maintaining contact with you. The legal basis for this processing is your consent, legitimate interest, or the need to manage a contractual or similar relationship. You may at any time exercise your rights of access, rectification, erasure, objection, restriction of processing, or data portability before UNED, through its Data Protection Office (https://www.uned.es/dpj) or through the University's Electronic Office (https://sede.uned.es/).
For more information, see our Privacy Policy (https://descargas.uned.es/publico/pdf/Politica_privacidad_UNED.pdf).
Apologies for cross-posting.
---------------------------------------------------------------------------
*CALL FOR PAPERS: Language Resources and Evaluation Journal – Special Issue
on Machine Translation for Low-Resource Languages*
https://link.springer.com/collections/gbdgacbgbg
*Guest Editors:*
- Atul Kr. Ojha (Insight Research Ireland Centre for Data Analytics,
DSI, University of Galway, Ireland)
- Chao-Hong Liu (Industrial Technology Research Institute, Potamu
Research Ltd.)
- Ekaterina Vylomova (University of Melbourne, Australia)
- Flammie Pirinen (UiT The Arctic University of Norway, Tromsø)
- Jonathan Washington (Swarthmore College, USA)
- Nathaniel Oco (De La Salle University, Philippines)
- Xiaobing Zhao (Minzu University of China)
Machine translation (MT) technologies have improved significantly over
the last decade thanks to neural MT (NMT) approaches. However, most of these
methods rely on the availability of large parallel corpora for training the
MT systems, resources that are not available for the majority of language
pairs. As a result, current technologies often cannot be applied
effectively to low-resource languages. Developing MT technologies using
relatively small corpora still presents a major challenge for the MT
community. In addition, many methods for developing MT systems still rely
on several natural language processing (NLP) tools to pre-process texts in
source languages and post-process MT outputs in target languages. The
performance of these tools often has a great impact on the quality of the
resulting translation. The availability of MT technologies and NLP tools
can facilitate equal access to information for the speakers of a language
and determine on which side of the digital divide they will end up. The
lack of these technologies for many of the world's languages provides
opportunities both for the field to grow and for making tools available for
speakers of low-resource languages.
In the past few years, several workshops and evaluations have been
organized to promote research on low-resource languages. NIST conducted
Low Resource Human Language Technology (LoReHLT) evaluations annually
from 2016 to 2019. In LoReHLT evaluations, there is no training data in
the evaluation language. Participants receive training data in related
languages but need to bootstrap systems in the surprise evaluation
language at the start of the evaluation. Methods for this include pivoting
approaches and taking advantage of linguistic universals. The evaluations
were supported by DARPA's Low Resource Languages for Emergent Incidents
(LORELEI) program, which seeks to advance technologies that are less
dependent on large data resources and that can be quickly pivoted to new
languages within a very short time, so that information from any
language can be extracted in a timely manner to provide situational
awareness in emergent incidents. Other venues include the Workshop on
Technologies for MT of Low-Resource Languages (LoResMT), the Special
Interest Group on Under-resourced Languages (SIGUL), the Workshop on
Resources and Technologies for Indigenous, Endangered and Lesser-resourced
Languages in Eurasia (EURALI), the Workshop on Deep Learning Approaches for
Low-Resource Natural Language Processing (DeepLo), AfricaNLP, TurkLang, the
Conference on Machine Translation (WMT), and the International Conference
on Spoken Language Translation (IWSLT), all of which provide a forum for
sharing and advancing research and development in this field.
This topical collection solicits original research papers on MT
systems/methods and related NLP tools for low-resource languages in
general. LoReHLT, LORELEI, LoResMT, SIGUL, EURALI, DeepLo, WMT, and IWSLT
participants are very welcome to submit their work to the special issue.
Summary papers on MT research for specific low-resource languages, as well
as extended versions (>40% difference) of published papers from relevant
conferences/workshops, are also welcome.
Topics of the special issue include, but are not limited to:
* Research and review papers on MT systems/methods for low-resource
languages
* Research and review papers on pre-processing and/or post-processing NLP
tools for MT
* Word tokenizers/de-tokenizers for low-resource languages
* Word/morpheme segmenters for low-resource languages
* Use of morphological analyzers and/or morpheme segmenters in MT
* Multilingual/cross-lingual NLP tools for MT
* Review of available corpora of low-resource languages for MT
* Pivot MT for low-resource languages
* Zero-shot MT for low-resource languages
* Fast building of MT systems for low-resource languages
* Re-usability of existing MT systems and/or NLP tools for low-resource
languages
* Machine translation for language preservation
* Techniques that work across many languages and modalities
* Techniques that are less dependent on large data resources
* Use of language-universal resources
* Bootstrap-trained resources for short development cycles
* Entity, relation, and event extraction
* Sentiment detection in MT
* MT summarisation
* Processing diverse languages, genres (news, social media, etc.) and
modalities (text, speech, video, etc.)
* Speech translation for low-resource languages
* Multimodal MT for low-resource languages
* MT models using LLMs for low-resource languages
* Generative AI models for low-resource languages
* Evaluation metrics and datasets for low-resource languages
For further information on this initiative, please refer to
https://link.springer.com/collections/gbdgacbgbg
*IMPORTANT DATES*
* August 26, 2025: Paper submission deadline
* December 05, 2025: Revised papers due
* March 2026: Publication
*SUBMISSION GUIDELINES*
Authors should follow the "Instructions for Authors"
(https://link.springer.com/journal/10579/submission-guidelines or Overleaf:
https://link.springer.com/journal/10579/updates/17234296) on the LRE
journal website (https://link.springer.com/journal/10579).
Thanks,