Ethical LLMs 2025: The first Workshop on Ethical Concerns in Training, Evaluating and Deploying Large Language Models<https://sites.google.com/view/ethical-llms-2025> @ RANLP2025<https://ranlp.org/ranlp2025/>
Call for papers:
Scope
Large Language Models (LLMs) represent a transformative leap in Artificial Intelligence (AI), delivering remarkable language-processing capabilities that are reshaping how we interact with technology in our daily lives. With their ability to perform tasks such as summarisation, translation, classification, and text generation, LLMs have demonstrated unparalleled versatility and power. Drawing from vast and diverse knowledge bases, these models hold the potential to revolutionise a wide range of fields, including education, media, law, psychology, and beyond. From assisting educators in creating personalised learning experiences to enabling legal professionals to draft documents or supporting mental health practitioners with preliminary assessments, the applications of LLMs are both expansive and profound.
However, alongside their impressive strengths, LLMs also face significant limitations that raise critical ethical questions. Unlike humans, these models lack essential qualities such as emotional intelligence, contextual empathy, and nuanced ethical reasoning. While they can generate coherent and contextually relevant responses, they do not possess the ability to fully understand the emotional or moral implications of their outputs. This gap becomes particularly concerning when LLMs are deployed in sensitive domains where human values, cultural nuances, and ethical considerations are paramount. For example, biases embedded in training data can lead to unfair or discriminatory outcomes, while the absence of ethical reasoning may result in outputs that inadvertently harm individuals or communities. These limitations highlight the urgent need for robust research in Natural Language Processing (NLP) to address the ethical dimensions of LLMs. Advancements in NLP research are crucial for developing methods to detect and mitigate biases, enhance transparency in model decision-making, and incorporate ethical frameworks that align with human values. By prioritising ethics in NLP research, we can better understand the societal implications of LLMs and ensure their development and deployment are guided by principles of fairness, accountability, and respect for human dignity. This workshop will dive into these pressing issues, fostering a collaborative effort to shape the future of LLMs as tools that not only excel in technical performance but also uphold the highest ethical standards.
Submission Guidelines
We follow the RANLP 2025 standards for submission format and guidelines. EthicalLLMs 2025 invites the submission of long papers, up to eight pages in length, and short papers, up to six pages in length. These page limits only apply to the main body of the paper. At the end of the paper (after the conclusions but before the references) papers need to include a mandatory section discussing the limitations of the work and, optionally, a section discussing ethical considerations. Papers can include unlimited pages of references and an unlimited appendix.
To prepare your submission, please make sure to use the RANLP 2025 style files available here:
* LaTeX<https://ranlp.org/ranlp2025/wp-content/uploads/2025/05/ranlp2025-LaTeX.zip>
* Word<https://ranlp.org/ranlp2025/wp-content/uploads/2025/05/ranlp2025-word.docx>
Papers should be submitted through Softconf/START using the following link: https://softconf.com/ranlp25/EthicalLLMs2025/
Topics of interest
The workshop invites submissions on a broad range of topics related to the ethical development and evaluation of LLMs, including but not limited to the following.
1. Bias Detection and Mitigation in LLMs
Research focused on identifying, measuring, and reducing social, cultural, and algorithmic biases in large language models.
2. Ethical Frameworks for LLM Deployment
Approaches to integrating ethical principles, such as fairness, accountability, and transparency, into the development and use of LLMs.
3. LLMs in Sensitive Domains: Risks and Safeguards
Case studies or methodologies for deploying LLMs in high-stakes fields such as healthcare, law, and education, with an emphasis on ethical implications.
4. Explainability and Transparency in LLM Decision-Making
Techniques and tools for improving the interpretability of LLM outputs and understanding model reasoning.
5. Cultural and Contextual Understanding in NLP Systems
Strategies for enhancing LLMs’ sensitivity to cultural, linguistic, and social nuances in global and multilingual contexts.
6. Human-in-the-Loop Approaches for Ethical Oversight
Collaborative models that involve human expertise in guiding, correcting, or auditing LLM behaviour to ensure responsible use.
7. Mental Health and Emotional AI: Limits of LLM Empathy
Discussions on the role of LLMs in mental health support, highlighting the boundary between assistive technology and the need for human empathy.
Organisers
Damith Premasiri – Lancaster University, UK
Tharindu Ranasinghe – Lancaster University, UK
Hansi Hettiarachchi – Lancaster University, UK
Contact
If you have any questions regarding the workshop, please contact Damith: d.dolamullage(a)lancaster.ac.uk
Dear all,
We are currently working on a project that aims to make querying syntactically annotated corpora easier and more accessible.
For this purpose, we want to know what researchers are actually searching for.
If you have a minute, please fill out this form:
https://forms.office.com/e/a8DgETSabB
Feel free to reach out to ekavol(a)chalmers.se or nikdew(a)chalmers.se if you have any further questions.
Best regards
Niklas Deworetzki & Katja Voloshina
PhD Students
Department of Computer Science and Engineering
Chalmers University of Technology | University of Gothenburg
SE-412 96 Göteborg, Sweden
www.gu.se<http://www.gu.se/>
www.chalmers.se<http://www.chalmers.se/>
CLEF 2025 – Registration Open
Conference and Labs of the Evaluation Forum
We are pleased to announce CLEF 2025, taking place 9–12 September 2025 in Madrid, Spain at UNED. This peer‑reviewed conference and associated labs foster research in multilingual, multimodal, and cross‑language information access https://clef2025.clef-initiative.eu/.
Register now – Early‑bird registration is open! Standard registration opened earlier this year, and early-bird rates are currently available.
Why attend?
* Present and discuss original research at the main conference.
* Engage in innovative labs and challenges, including LifeCLEF, ImageCLEF, EXIST, eRisk, CheckThat!, and more https://clef2025.clef-initiative.eu/index.php?page=Pages/labs.html.
* Benefit from rich networking with academic and industry experts in IR, NLP, multimedia retrieval, and evaluation sciences.
For detailed conference and lab registration, registration deadlines, and pricing, please visit the official site: https://clef2025.clef-initiative.eu/index.php?page=Pages/registrationConfer…
Important Dates
* Early‑bird registration ongoing
* Registration closes: 31 August 2025
* Conference & labs: 9–12 September 2025, Madrid, Spain
We look forward to welcoming participants from across the global community — see you this September in Madrid at CLEF 2025!
Jorge Carrillo-de-Albornoz
On behalf of the CLEF 2025 Organising Committee
Apologies for cross-posting.
---------------------------------------------------------------------------
*CALL FOR PAPERS: Language Resources and Evaluation Journal- Special Issue
on Machine Translation for Low-Resource Languages*
https://link.springer.com/collections/gbdgacbgbg
*Guest Editors:*
- Atul Kr. Ojha (Insight Research Ireland Centre for Data Analytics,
DSI, University of Galway, Ireland)
- Chao-Hong Liu (Industrial Technology Research Institute, Potamu
Research Ltd.)
- Ekaterina Vylomova (University of Melbourne, Australia)
- Flammie Pirinen (UiT The Arctic University of Norway, Tromsø)
- Jonathan Washington (Swarthmore College, USA)
- Nathaniel Oco (De La Salle University, Philippines)
- Xiaobing Zhao (Minzu University of China)
Machine translation (MT) technologies have improved significantly in
the last decade using neural MT (NMT) approaches. However, most of these
methods rely on the availability of large parallel data for training the MT
systems, resources which are not available for the majority of language
pairs. Hence, current technologies often fall short in their ability to be
applied to low-resource languages. Developing MT technologies using
relatively small corpora still presents a major challenge for the MT
community. In addition, many methods for developing MT systems still rely
on several natural language processing (NLP) tools to pre-process texts in
source languages and post-process MT outputs in target languages. The
performance of these tools often has a great impact on the quality of the
resulting translation. The availability of MT technologies and NLP tools
can facilitate equal access to information for the speakers of a language
and determine on which side of the digital divide they will end up. The
lack of these technologies for many of the world's languages provides
opportunities both for the field to grow and for making tools available for
speakers of low-resource languages.
In the past few years, several workshops and evaluations have been
organized to promote research on low-resource languages. NIST conducted
Low Resource Human Language Technology evaluations (LoReHLT)
annually from 2016 to 2019. In LoReHLT evaluations, there is no training
data in the evaluation language. Participants receive training data in
related languages but need to bootstrap systems in the surprise evaluation
language at the start of the evaluation. Methods for this include pivoting
approaches and taking advantage of linguistic universals. The evaluations
are supported by DARPA's Low Resource Languages for Emergent Incidents
(LORELEI) program, which seeks to advance technologies that are less
dependent on large data resources and that can be quickly pivoted to new
languages within a very short amount of time so that information from any
language can be extracted in a timely manner to provide situation awareness
to emergent incidents. There are also the Workshop on Technologies for MT
of Low-Resource Languages (LoResMT), the Special Interest Group on
Under-resourced Languages (SIGUL), the Workshop on Resources and Technologies
for Indigenous, Endangered and Lesser-resourced Languages in Eurasia
(EURALI), the Workshop on Deep Learning Approaches for Low-Resource Natural
Language Processing (DeepLo), AfricaNLP, TurkLang, the Conference on Machine
Translation (WMT), and the International Conference on Spoken Language
Translation (IWSLT), all of which provide venues for sharing research
and development work in this field.
This topical collection solicits original research papers on MT
systems/methods and related NLP tools for low-resource languages in
general. LoReHLT, LORELEI, LoResMT, SIGUL, EURALI, DeepLo, WMT, and IWSLT
participants are very welcome to submit their work to the special issue.
Summary papers on MT research for specific low-resource languages, as well
as extended versions (>40% difference) of published papers from relevant
conferences/workshops, are also welcome.
Topics of the special issue include, but are not limited to:
* Research and review papers on MT systems/methods for low-resource
languages
* Research and review papers on pre-processing and/or post-processing NLP
tools for MT
* Word tokenizers/de-tokenizers for low-resource languages
* Word/morpheme segmenters for low-resource languages
* Use of morphological analyzers and/or morpheme segmenters in MT
* Multilingual/cross-lingual NLP tools for MT
* Review of available corpora of low-resource languages for MT
* Pivot MT for low-resource languages
* Zero-shot MT for low-resource languages
* Fast building of MT systems for low-resource languages
* Re-usability of existing MT systems and/or NLP tools for low-resource
languages
* Machine translation for language preservation
* Techniques that work across many languages and modalities
* Techniques that are less dependent on large data resources
* Use of language-universal resources
* Bootstrap-trained resources for the short development cycle
* Entity, relation- and event-extraction
* Sentiment detection in MT
* MT Summarisation
* Processing diverse languages, genres (news, social media, etc.) and
modalities (text, speech, video, etc.)
* Speech Translation for low-resource languages
* Multimodal MT for low-resource languages
* MT models using LLMs for low-resource languages
* Generative AI models for low-resource languages
* Evaluation metrics and datasets for low-resource languages
For further information on this initiative, please refer to
https://link.springer.com/collections/gbdgacbgbg
*IMPORTANT DATES*
*August 26, 2025: Paper submission deadline*
*December 05, 2025: Revised papers due*
*March 2026: Publication*
*SUBMISSION GUIDELINES*
Authors should follow the "Instructions for Authors"
(https://link.springer.com/journal/10579/submission-guidelines or Overleaf
<https://link.springer.com/journal/10579/updates/17234296>) on the LRE
journal website <https://link.springer.com/journal/10579>.
Thanks,
In this newsletter:
LDC data and commercial technology development
New publications:
Chinese Sentence Pattern Structure Treebank<https://catalog.ldc.upenn.edu/LDC2025T06>
IWSLT 2022-2023 Shared Task Training, Development and Test Set<https://catalog.ldc.upenn.edu/LDC2025S05>
KAIROS Schema Learning Complex Event Annotation<https://catalog.ldc.upenn.edu/LDC2025T07>
________________________________
LDC data and commercial technology development
For-profit organizations are reminded that an LDC membership is a pre-requisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations on the use of certain corpora. Visit the Licensing<https://www.ldc.upenn.edu/data-management/using/licensing> page for further information.
________________________________
New publications:
Chinese Sentence Pattern Structure Treebank<https://catalog.ldc.upenn.edu/LDC2025T06> was developed at Beijing Normal University<https://english.bnu.edu.cn/> and Peking University<https://english.pku.edu.cn/>. It contains 5,016 sentences and 119,627 tokens syntactically annotated following the concept of sentence constituent analysis which emphasizes sentence pattern structure. The source data consists of 27 chapters extracted from modern Mandarin and ancient Chinese works. There are three annotation layers: lexical sense and structural mode for dynamic words; syntactic structure for clauses; and inter-clause relation within complex sentence and sentence clusters. These structures can be visualized using the Jbw-viewer tool<https://github.com/bnucip/jbwviewer> which is included in the release.
2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
IWSLT 2022 - 2023 Shared Task Training, Development and Test Set<https://catalog.ldc.upenn.edu/LDC2025S05> was developed by LDC and contains 210 hours of Tunisian<https://catalog.ldc.upenn.edu/LDC2025S05> Arabic conversational telephone speech, transcripts, English translations, speaker metadata, and documentation. This material constitutes the training, development, and test data used in the International Conference on Spoken Language Translation (IWSLT) Dialectal Speech Translation task (2022)<https://iwslt.org/2022/dialect> and the Dialectal and Low-resource track (2023)<https://iwslt.org/2023/low-resource>.
The telephone speech was collected by LDC in 2016-2017 from native speakers of Tunisian Arabic in Tunis. Speakers were recruited to make telephone calls to people in their social networks from a variety of noise conditions and handsets. Transcripts are orthographic following Buckwalter<https://catalog.ldc.upenn.edu/LDC2004L02> transliteration and cover 175 hours of the collected speech. IPA transcripts were added to a subset of the data. All transcribed segments were translated into English.
2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
KAIROS Schema Learning Complex Event Annotation<https://catalog.ldc.upenn.edu/LDC2025T07> was developed by LDC to support the DARPA KAIROS program. It contains English and Spanish text, audio, video, and image data labeled for 93 real-world complex events with event, relation, and argument annotations linking to document provenance. Source data was collected from the web; 3431 root web pages were collected and processed, yielding 1919 text data files, 24019 image files, 1472 video files, and 16 audio files.
The DARPA KAIROS (Knowledge-directed Artificial Intelligence Reasoning Over Schemas) program aimed to build technology capable of understanding and reasoning about complex real-world events in order to provide actionable insights to end users. KAIROS systems utilized formal event representations in the form of schema libraries that specified the steps, preconditions, and constraints for an open set of complex events; schemas were then used in combination with event extraction to characterize and make predictions about real-world events in a large, multilingual, multimedia corpus.
2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
Dear CLIN enthusiasts
We are extending the submission deadline for CLIN abstracts by one week. The new, final deadline is June 20th. Below you can find the original call for abstracts with a modified date.
Website: https://clin35.ccl.kuleuven.be/
We invite submissions for CLIN35, the 35th edition of the Computational Linguistics in the Netherlands (CLIN) conference, which will take place in Leuven on September 12th, 2025.
Abstracts describing theoretical or applied research in any area of computational linguistics and natural language processing are welcome. We especially encourage submissions related to the Dutch language, but contributions on other languages and multilingual approaches are equally welcome. Abstracts must be written in English and should not exceed 500 words.
Submissions should include:
* Name and affiliation of each author
* Contact details
* Presentation title and short abstract (max. 500 words)
* Keywords
* Your presentation format preference (We will do our best to accommodate your preference but may need to make changes to provide a well-balanced program)
Abstracts must be submitted via the form on the website<https://clin35.ccl.kuleuven.be/call-for-abstracts> by Friday, 20th of June 2025. Notifications of acceptance will be sent out by Friday, 4th of July 2025. Accepted abstracts will be presented at the conference as oral or poster presentations. Authors with accepted abstracts will also have the opportunity to submit a full paper after the conference for publication in the CLIN Journal<https://www.clinjournal.org/clinj/>.
Please share this call with your interested colleagues and network! For any questions you can reach us at this email address (clin35(a)kuleuven.be<mailto:clin35@kuleuven.be>).
We look forward to your submissions and to welcoming you to CLIN35!
CLIN35 local organizers
________________________________
******************************************************
********* EVALITA 2026: Call for tasks *********
******* NEW DEADLINES and TIMELINE ******
******************************************************
EVALITA 2026 is an initiative of AILC (Associazione Italiana di Linguistica
Computazionale, https://www.ai-lc.it/).
As in previous editions (https://www.evalita.it/), EVALITA 2026 will be
organized around a few selected tasks, which provide participants with
opportunities to discuss and explore both emerging and traditional
areas of Natural Language Processing and Speech. Participation is
encouraged for teams working in both academic institutions and industrial
organizations.
TASK PROPOSAL SUBMISSION
Task proposals should be no longer than 4 pages and should include:
- task title and acronym;
- names and affiliation of the organizers (minimum 2 organizers);
- brief task description, including motivations and state of the art;
- explanation of the international relevance of the task;
- description and examples of the data, including information about their
  availability, development stage, and issues concerning privacy and data
  sensitivity. The examples are mandatory because they are intended to give
  potential participants an idea of what the task data will look like, how
  it’ll be formatted, etc.
- expected number of participants and attendees;
- names and contact information of the organizers.
We also accept the re-annotation/expansion of datasets from previous years
and previous challenges with new annotation levels, and texts from publicly
available corpora. However, test annotations must be new and unpublished,
as participants must not have access to the test data annotations until the
end of EVALITA campaign. For new tasks, organizers must specify in the
proposal why it would attract a reasonable number of participants, and why
it is needed. For re-runs, organizers must describe the element of novelty
from previous challenges.
In submitting your proposal, please bear in mind that we strongly encourage:
- tasks that pose non-trivial challenges and stimulate the creation of
  innovative systems (i.e., that integrate linguistic insights or external
  knowledge sources), rather than being easily addressed by off-the-shelf LLM
  prompting techniques;
- tasks focused on multimodality, e.g., considering both textual and
  visual or any other modality;
- tasks characterized by different levels of complexity, e.g., with a
  straightforward main subtask and one or more sophisticated additional
  subtasks;
- competitive baselines (e.g., small-scale LLMs in zero-shot setups),
  which participants are expected to improve upon, in order to encourage
  the design of advanced solutions;
- application-oriented tasks, that is, tasks that showcase a clearly
  defined end-user application;
- multilingual tasks, i.e. with data both in Italian and in other
  languages;
- industrial tasks, i.e. tasks with real data provided by companies.
The organizers of the accepted tasks should take care of planning,
according to the scheduled deadlines (see below):
- the development and distribution of datasets needed for the contest,
  i.e. data for training and development, and data for testing; the scorer to
  be used to evaluate the submitted systems should be included in the release
  of development data;
- the development of task guidelines, where all the instructions for
  participation are made clear, together with a detailed description of data
  and evaluation metrics applied for the evaluation of the participants'
  results;
- the collection of participants' results;
- the evaluation of participants' results according to standard metrics
  and baseline(s);
- the solicitation of participation and submissions;
- the reviewing process of the papers describing the participants'
  approach and results (according to the template to be made available by the
  EVALITA 2026 chairs);
- the production of a paper describing the task (according to the template
  to be made available by the EVALITA 2026 chairs).
*** Email your proposal in PDF format to evalitacampaign(a)gmail.com with
"EVALITA 2026 TASK Proposal" as the subject line by the submission
deadline: July 28th 2025. ***
Please feel free to contact the EVALITA 2026 chairs at
evalitacampaign(a)gmail.com in case of any questions or suggestions.
Deadlines of the task proposal:
- July 28th 2025 (previously July 21st 2025): submission of task proposals
- August 7th 2025 (previously July 31st 2025): notification of task proposal acceptance
Timelines of EVALITA 2026:
- 22nd September 2025: development data available to participants
- 3rd - 17th November 2025: evaluation windows
- 28th November 2025: assessments returned to participants
- 15th December 2025: final reports (from participants) due to task
  organizers
- 22nd December 2025: final reports (from task organizers) due to EVALITA
  chairs
- 19th January 2026: review deadline
- 2nd February 2026: camera-ready version deadline
- 26th - 27th February 2026: final workshop in Bari
EVALITA 2026 CHAIRS
Francesco Cutugno (Università di Napoli)
Alessio Miaschi (Istituto di Linguistica Computazionale “A. Zampolli” - CNR)
Alessio Palmero Aprosio (Università di Trento)
Giulia Rambelli (Università di Bologna)
Lucia Siciliani (Università di Bari)
Marco Antonio Stranisci (Università di Torino)
FURTHER INFORMATION
Website: https://www.evalita.it/campaigns/evalita-2026/call-for-tasks/
Mail: evalitacampaign(a)gmail.com
Marco,
UNITO <https://www.unito.it/persone/mstranis> and aequa-tech
<https://aequa-tech.com/>
The UKP Lab at the Department of Computer Science, Technical University Darmstadt, Germany, is looking for
*** two fully funded PhD Students and/or Postdocs ***
for an exciting project in machine-generated text detection. This is a unique opportunity to join the UKP Lab at the intersection of AI Safety, Natural Language Processing and Machine Learning. If you're excited about shaping the future of Large Language Models, AI Agents, human-AI interaction, building novel prototypes, and publishing at top-tier venues of NLP, ML and AI, we’d love to hear from you.
🔗 More information:
https://www.informatik.tu-darmstadt.de/ukp/ukp_home/jobs_ukp/2025_phd_ukp.e…
📩 Apply here:
https://careers.ukp.informatik.tu-darmstadt.de/ukprecruitment
📅 Application deadline: June 29th, 2025
--------------------------------------------------------------------
Prof. Dr. Iryna Gurevych
UKP Lab
Technical University Darmstadt, Germany
http://www.ukp.tu-darmstadt.de/
Third call for papers: Sixth Workshop on Resources for African
Indigenous Languages (RAIL)
Co-located with DHASA 2025
https://sadilar.org/rail-2025/
RAIL Workshop date: 10 November 2025
DHASA Conference dates: 10-14 November 2025
Venue: CSIR International Convention Centre.
The sixth RAIL workshop website: https://sadilar.org/rail-2025/
DHASA website: https://digitalhumanities.org.za/
The sixth Resources for African Indigenous Languages (RAIL) workshop
will be co-located with the Digital Humanities Association of Southern
Africa (DHASA) 2025 conference at the CSIR International Convention
Centre in Pretoria, South Africa, on 10 November 2025. The RAIL
workshop is an interdisciplinary platform for researchers working on
African indigenous language resources such as natural language
processing (NLP) tools, Human Language Technologies (HLT), data
collections, and annotations. This workshop aims to foster a
scientific community of practice that focuses on computational
linguistic tools and data that are designed for or applied to the
indigenous languages of Africa.
Many African languages are under-resourced while only a few are
considered to be somewhat better resourced. These languages often share
interesting properties such as writing systems, making them different
from most high-resourced languages. From a computational perspective,
these languages lack enough corpora to undertake high level development
of NLP and HLT tools, which in turn impedes the development of African
languages in these areas. During previous workshops, it was noted that
the problems and solutions presented were not only applicable to
African languages but were also relevant to many other low-resource
languages across the world. Because these languages share similar
challenges, this workshop provides researchers with opportunities to
work collaboratively on issues of language resource development and
learn from each other.
The RAIL workshop has several aims. First, the workshop brings together
researchers who work on African indigenous languages, forming a
community of practice for people working on indigenous languages.
Second, the workshop aims to reveal currently unknown or unpublished
existing resources (corpora, NLP tools, and applications), resulting in
a better overview of the current state-of-the-art, and also allows for
discussions on novel, desired resources for future research in this
area. Third, it enhances sharing of knowledge on the development of
low-resource languages. Finally, it enables discussions on how to
improve the quality as well as availability of the resources.
The workshop has “Language resources in the age of large language
models” as its theme, but submissions on any topic related to
properties of African indigenous languages (including related non-
African languages) may be accepted. Suggested topics include (but are
not limited to) the following:
* Digital representations of linguistic structures
* Descriptions of corpora or other data sets of African indigenous
languages
* Building resources for (under-resourced) African indigenous languages
* Developing and using African indigenous languages in the digital age
* Effectiveness of digital technologies for the development of African
indigenous languages
* Revealing unknown or unpublished existing resources for African
indigenous languages
* Developing desired resources for African indigenous languages
* Improving quality, availability and accessibility of African
indigenous language resources
Submission requirements:
We invite papers on original, unpublished work related to the topics of
the workshop. Submissions, presenting completed work, may consist of up
to eight (8) pages of content plus additional pages of references. The
final camera-ready versions of accepted long papers are allowed one
additional page of content (up to 9 pages) so that reviewers’ feedback
can be incorporated. Papers should be formatted according to the DHASA
style sheet which is provided on the Journal of the Digital Humanities
Association of Southern Africa website
(https://upjournals.up.ac.za/index.php/dhasa/about). Reviewing is
double-blind, so make sure to anonymise your submission (e.g., do not
provide author names, affiliations, project names, etc.). Limit the
number of self-citations (anonymised citations should not be used). The
RAIL workshop follows the DHASA submission requirements.
Please submit papers in PDF format (the submission link will be
available soon). Accepted papers will be published in proceedings
linked to the DHASA conference.
Important dates:
Submission deadline: 14 July 2025
Date of notification: 16 September 2025
Camera ready copy deadline: 24 October 2025
Workshop: 10 November 2025
DHASA conference: 10 November 2025-14 November 2025
Organising Committee
Rooweither Mabuya, South African Centre for Digital Language Resources
(SADiLaR), South Africa
Muzi Matfunjwa, South African Centre for Digital Language Resources
(SADiLaR), South Africa
Mmasibidi Setaka, South African Centre for Digital Language Resources
(SADiLaR), South Africa
Menno van Zaanen, South African Centre for Digital Language Resources
(SADiLaR), South Africa
--
Prof Menno van Zaanen menno.vanzaanen(a)nwu.ac.za
Professor in Digital Humanities
South African Centre for Digital Language Resources
https://www.sadilar.org