Dear all,
We are pleased to invite you to participate in the MultiLexNorm 2026 shared task, which will be hosted at EMNLP 2026.
Our shared task focuses on lexical normalization, that is, transforming an utterance into its standard form at the word level (e.g., ppl → people). It also includes one-to-many (1-to-n) and many-to-one (n-to-1) replacements. Participants will develop systems for lexical normalization across 17 languages.
Building on the previous task, which focused on Indo-European languages written in the Latin script, we now focus on languages written in other scripts and provide new benchmarks for Indonesian, Japanese, Korean, Thai, and Vietnamese.
The data and more information about the task can be found at: https://noisy-text.github.io/2026/multi-lexnorm.html
Dates:
21-Jul  Test data
01-Aug  Final evaluation
20-Aug  Paper deadline
05-Sep  Paper reviewed
15-Sep  Camera ready
TBA     Workshop
Best,
The organizers:
Rob van der Goot
Weerayut Buaphet