In the past few years, several workshops and evaluations have been organized to promote research on low-resource languages. NIST conducted the Low Resource Human Language Technology (LoReHLT) evaluations annually from 2016 to 2019. In LoReHLT evaluations, no training data is provided in the evaluation language. Participants receive training data in related languages but must bootstrap systems for the surprise evaluation language at the start of the evaluation. Methods for this include pivoting approaches and taking advantage of linguistic universals. The evaluations are supported by DARPA's Low Resource Languages for Emergent Incidents (LORELEI) program, which seeks to advance technologies that are less dependent on large data resources and that can be pivoted to new languages within a very short time, so that information in any language can be extracted in a timely manner to provide situational awareness of emergent incidents. Other venues for sharing research and development in this field include the Workshop on Technologies for MT of Low-Resource Languages (LoResMT), the Special Interest Group on Under-resourced Languages (SIGUL), the Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia (EURALI), the Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing (DeepLo), AfricaNLP, TurkLang, the Conference on Machine Translation (WMT), and the International Conference on Spoken Language Translation (IWSLT).
This topical collection solicits original research papers on MT systems/methods and related NLP tools for low-resource languages in general. LoReHLT, LORELEI, LoResMT, SIGUL, EURALI, DeepLo, WMT, and IWSLT participants are very welcome to submit their work to the special issue. Summary papers on MT research for specific low-resource languages, as well as extended versions (>40% difference) of published papers from relevant conferences/workshops, are also welcome.
Topics of the special issue include, but are not limited to:
* Research and review papers on MT systems/methods for low-resource languages
* Research and review papers on pre-processing and/or post-processing NLP tools for MT
* Word tokenizers/de-tokenizers for low-resource languages
* Word/morpheme segmenters for low-resource languages
* Use of morphological analyzers and/or morpheme segmenters in MT
* Multilingual/cross-lingual NLP tools for MT
* Review of available corpora of low-resource languages for MT
* Pivot MT for low-resource languages
* Zero-shot MT for low-resource languages
* Fast building of MT systems for low-resource languages
* Re-usability of existing MT systems and/or NLP tools for low-resource languages
* Machine translation for language preservation
* Techniques that work across many languages and modalities
* Techniques that are less dependent on large data resources
* Use of language-universal resources
* Bootstrap-trained resources for the short development cycle
* Entity, relation, and event extraction
* Sentiment detection in MT
* MT Summarisation
* Processing diverse languages, genres (news, social media, etc.) and modalities (text, speech, video, etc.)
* Speech translation for low-resource languages
* Multimodal MT for low-resource languages
* MT models using LLMs for low-resource languages
* Generative AI models for low-resource languages
* Evaluation metrics and datasets for low-resource languages
For further information on this initiative, please refer to https://link.springer.com/collections/gbdgacbgbg
IMPORTANT DATES
May 26, 2025: Expression of interest (EOI) via this form: https://forms.gle/QqeqxZgGfsxP6rZ77