Dear all,
After a successful first edition in 2021, we are glad to invite you to the second MultiLexNorm shared task! The shared task will be hosted at WNUT 2025.
As defined in the previous iteration, lexical normalization is the task of transforming an utterance into its standard form, word by word, including both one-to-many (1-n) and many-to-one (n-1) replacements.
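To make the definition concrete, here is a minimal Python sketch of a word-aligned normalization. The sentence, the (raw, normalized) pair representation, and the convention of giving an absorbed word an empty normalization are illustrative assumptions, not taken from the shared task data or its official format.

```python
# Toy, hand-written example (an assumption, not shared task data).
# Each pair is (raw word, normalized form), aligned word by word.
aligned = [
    ("i", "i"),             # unchanged
    ("wanna", "want to"),   # one-to-many (1-n): one word becomes two
    ("see", "see"),         # unchanged
    ("some", "someone"),    # many-to-one (n-1): merged form on the first word
    ("one", ""),            # ... and the absorbed word gets an empty form
    ("tmrw", "tomorrow"),   # one-to-one replacement
]


def raw_text(pairs):
    """Join the raw words back into the original utterance."""
    return " ".join(raw for raw, _ in pairs)


def normalized_text(pairs):
    """Join the normalized forms, skipping words absorbed by a merge."""
    return " ".join(norm for _, norm in pairs if norm)


print(raw_text(aligned))
print(normalized_text(aligned))
```

Because the alignment is kept per input word, the number of pairs always equals the number of raw words, even when the normalized utterance has more or fewer words.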
Building on the previous task, which focused on Indo-European languages written in the Latin script, we have extended the benchmark to include languages written in other scripts. We now include data for Thai, Vietnamese, and Indonesian. The data and more information about the task can be found at:
https://noisy-text.github.io/2025/multi-lexnorm.html#
Dates:
Data available: Nov 15, 2024
Data freeze: Jan 07, 2025
Test data: Jan 25, 2025
Final evaluation: Feb 07, 2025
Paper deadline: Feb 25, 2025
Paper reviewed: Mar 01, 2025
Camera ready: Mar 10, 2025
Workshop: May 03, 2025 (TBD)
Best,
The organizers:
Rob van der Goot
Weerayut Buaphet
Peerat Limkonchotiwat
Thanh-Nhi Nguyen
Thanh-Phong Le