We invite submissions for the 3rd Shared Task of the Open Language Data Initiative (OLDI), held as part of WMT at EMNLP 2026 (Budapest, 28–29 October).
*Key dates (AoE):*
- Paper and data submission: 7 August 2026
- Notification of acceptance: 2 September 2026
- Camera-ready: 11 September 2026
*Background*
Foundational datasets such as FLORES and NTREX have played a key role in enabling progress in language technologies for some under-served languages. OLDI aims to empower language communities to contribute to the key datasets that expand language technology to more language varieties.
Additionally, machine translation depends increasingly on automatic quality evaluation for tasks such as filtering training data, ranking candidates, reinforcement learning, and benchmarking. However, human-annotated quality data exists for only a handful of languages, and how well evaluation methods generalise to new ones remains an open question.
*Scope*
The task's primary goal is to expand OLDI's open datasets to more languages. We solicit contributions to:
- the MT evaluation dataset FLORES+ (https://huggingface.co/datasets/openlanguagedata/flores_plus)
- the OLDI Seed dataset (https://huggingface.co/datasets/openlanguagedata/oldi_seed)
- other high-quality, massively parallel, open-source datasets (e.g. SMOL, WMT24++, BOUQuET)
Contributions may add new languages, varieties or dialects; substantially improve existing datasets; or create entirely new massively multilingual open translation datasets.
We also welcome new or extended datasets of *translation quality annotations* for under-served languages: source texts, their machine or human translations, and human judgements of translation quality.
*Submissions*
To help us gauge interest and coordinate efforts, please email the organisers before submitting: info@oldi.org. Full contribution guidelines and the submission format are on the shared task website (https://www2.statmt.org/wmt26/open-data.html) and OLDI's homepage (https://oldi.org/).