[Corpora-List] [2nd CfP] 3rd Shared Task of the Open Language Data Initiative @ WMT26

24 Jun 2026


We invite submissions for the 3rd Shared Task of the Open Language Data
Initiative (OLDI), held as part of WMT at EMNLP 2026 (Budapest, 28–29
October).
*Key dates (AoE):*
 - Paper and data submission: 7 August 2026
 - Notification of acceptance: 2 September 2026
 - Camera-ready: 11 September 2026
*Background*
Foundational datasets such as FLORES and NTREX have played a key role in
enabling progress in language technologies for some under-served
languages. OLDI aims to empower language communities to contribute to the
key datasets that expand language technology to more language varieties.
Additionally, machine translation depends increasingly on automatic
quality evaluation for tasks such as filtering training data, ranking
candidates, reinforcement learning, and benchmarking. However,
human-annotated quality data exists for only a handful of languages, and
how well evaluation methods generalise to new ones remains an open
question.
*Scope*
The task's primary goal is to expand OLDI's open datasets to more
languages. We solicit contributions to:
- the MT evaluation dataset FLORES+
  (https://huggingface.co/datasets/openlanguagedata/flores_plus)
- the OLDI Seed dataset
  (https://huggingface.co/datasets/openlanguagedata/oldi_seed)
- other high-quality, massively-parallel, open-source datasets (e.g. SMOL,
WMT24++, BOUQuET)
Contributions may add new languages, varieties or dialects; substantially
improve existing datasets; or create entirely new massively multilingual
open translation datasets.
We also welcome new or extended datasets of *translation quality
annotations* for under-served languages: source texts, their machine or
human translations, and human judgements of translation quality.
*Submissions*
To help us gauge interest and coordinate efforts, please email the
organisers before submitting: info@oldi.org. Full contribution guidelines
and the submission format are on the shared task website
(https://www2.statmt.org/wmt26/open-data.html) and OLDI's homepage
(https://oldi.org/).
