Call for participation: MultiLexNorm 2: Multilingual Lexical Normalization - Corpora

26 Nov 2024


Dear all,
After a successful first edition in 2021, we are glad to invite you to the
second MultiLexNorm shared task! The shared task will be hosted at WNUT 2025.
As defined in the previous iteration, lexical normalization is the task of
transforming an utterance into its standard form, word by word, including
both one-to-many (1-n) and many-to-one (n-1) replacements.
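To make the definition concrete, here is a minimal Python sketch of word-level normalization covering one-to-one, one-to-many (1-n), and many-to-one (n-1) replacements. The lexicon entries and the `normalize` function are hypothetical and made up for illustration; they are not taken from the shared task data, baselines, or evaluation scripts.

```python
# Toy illustration only: these lexicon entries are invented for this example
# and do not come from the MultiLexNorm data.
SPLITS = {"gonna": ["going", "to"]}         # one-to-many (1-n)
MERGES = {("some", "one"): "someone"}       # many-to-one (n-1)
SUBS = {"u": "you", "2morrow": "tomorrow"}  # one-to-one


def normalize(tokens):
    """Normalize a list of tokens word by word using the toy lexicons."""
    out = []
    i = 0
    while i < len(tokens):
        # Try a two-token merge first (n-1 replacement).
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if pair in MERGES:
            out.append(MERGES[pair])
            i += 2
            continue
        word = tokens[i].lower()
        if word in SPLITS:
            # 1-n replacement: one input token becomes several output tokens.
            out.extend(SPLITS[word])
        else:
            # 1-1 replacement, or keep the token unchanged if it is unknown.
            out.append(SUBS.get(word, tokens[i]))
        i += 1
    return out


print(normalize("u gonna meet some one 2morrow".split()))
# ['you', 'going', 'to', 'meet', 'someone', 'tomorrow']
```

A real system would learn these replacements from annotated data rather than use a fixed lookup table; the sketch only shows what the three replacement types look like.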
Building on the previous task, which focused on Indo-European languages
written in the Latin script, we have extended the benchmark to include
languages written in other scripts. We now include data for Thai, Vietnamese,
and Indonesian. The data and more information about the task can be found at:
https://noisy-text.github.io/2025/multi-lexnorm.html#
Dates:
Data available:    Nov 15, 2024
Data freeze:       Jan 07, 2025
Test data:         Jan 25, 2025
Final evaluation:  Feb 07, 2025
Paper deadline:    Feb 25, 2025
Paper reviewed:    Mar 01, 2025
Camera ready:      Mar 10, 2025
Workshop:          May 03, 2025 (TBD)
Best,
The organizers:
Rob van der Goot
Weerayut Buaphet
Peerat Limkonchotiwat
Thanh-Nhi Nguyen
Thanh-Phong Le