AmericasNLP 2023 Shared Task on Machine Translation into Indigenous Languages
First Call for Participation
The AmericasNLP 2023 Shared Task on Machine Translation into Indigenous Languages (https://turing.iimas.unam.mx/americasnlp/2023_st.html) is a competition aimed at encouraging the development of machine translation (MT) systems for Indigenous languages of the Americas. Participants will build systems that translate between Spanish and an Indigenous language. Systems submitted to the shared task will be presented at the Third Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP) on July 14, 2023. The workshop is co-located with the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023) in Toronto, Canada.
Why?
Many of the Indigenous languages of the Americas are so-called low-resource languages: the parallel data with other languages that is needed to train MT systems is limited. This means that many approaches designed for translating between high-resource languages, such as English and Chinese, are not directly applicable or perform poorly. Additionally, many Indigenous languages exhibit linguistic properties that are uncommon among the languages frequently studied in natural language processing (NLP). For instance, many are polysynthetic, which makes translation more difficult still. The goal of AmericasNLP is to motivate researchers to take on the challenge of developing MT systems for Indigenous languages.
How?
AmericasNLP invites the submission of MT results obtained by systems built for Indigenous languages. Participants can use the training and development data we provide, and there are no limits on what other resources they can use. If participants want to translate additional data to improve their systems, that's great! If they want to use pretrained models, that's great, too! The only limitation is that we ask participants not to have the test input translated by hand and not to train on the development or test sets.
The main metric of the shared task is ChrF++ (Popović, 2017). Participants can enter the competition with as many language pairs as they like, and systems for every language pair will be evaluated separately. We provide an evaluation script and a baseline MT system to help participants get started quickly. If you are interested in this shared task, please register here https://forms.gle/ZMVWCxoFunHF3bjNA.
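To give a feel for what a ChrF++-style score measures, here is a minimal sketch in Python. It is not the evaluation script provided by the organizers, and its numbers may differ from that script. The function name `chrf_pp_sketch`, the n-gram orders (characters up to 6, words up to 2), and beta = 2 are assumptions chosen to mirror commonly used ChrF++ settings. Precision and recall are averaged over all n-gram orders and then combined into an F-beta score.

```python
from collections import Counter


def ngram_counts(items, n):
    """Count contiguous n-grams of length n in a sequence."""
    return Counter(tuple(items[i:i + n]) for i in range(len(items) - n + 1))


def chrf_pp_sketch(hypothesis, reference, char_order=6, word_order=2, beta=2.0):
    """Simplified sentence-level ChrF++-style score in [0, 100].

    Illustrative only: parameter defaults are assumptions, and the
    official shared-task script may tokenize and average differently.
    """
    # Character n-grams ignore whitespace; word n-grams use whitespace tokens.
    views = [
        (list("".join(hypothesis.split())), list("".join(reference.split())), char_order),
        (hypothesis.split(), reference.split(), word_order),
    ]
    precisions, recalls = [], []
    for hyp_items, ref_items, max_order in views:
        for n in range(1, max_order + 1):
            hyp = ngram_counts(hyp_items, n)
            ref = ngram_counts(ref_items, n)
            if not hyp or not ref:
                continue  # this order is undefined for very short strings
            overlap = sum((hyp & ref).values())  # clipped n-gram matches
            precisions.append(overlap / sum(hyp.values()))
            recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta = 2 weights recall more heavily than precision.
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


# Placeholder English strings, used only to exercise the function.
print(chrf_pp_sketch("the cat sat on the mat", "the cat sits on the mat"))
```

Because the score is built largely from character n-grams, a hypothesis that gets most of a long word right still earns partial credit, which is one reason character-level metrics are often preferred for morphologically rich languages.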
Which languages?
The following language pairs are featured in the AmericasNLP 2023 shared task:
- Hñähñu–Spanish
- Wixarika–Spanish
- Nahuatl–Spanish
- Guaraní–Spanish
- Bribri–Spanish
- Rarámuri–Spanish
- Quechua–Spanish
- Aymara–Spanish
- Shipibo-Konibo–Spanish
- Asháninka–Spanish
- 👻Surprise language👻–Spanish
Spanish is always the source language: as the name of the shared task indicates, systems are evaluated on translating from Spanish into an Indigenous language.
Important Dates
- Release of initial languages and evaluation script: March 16, 2023
- Release of baseline system and baseline results: March 20, 2023
- Release of surprise language data: April 21, 2023
- Submission of translations (shared task deadline): May 07, 2023
- Announcements of results: May 09, 2023
- Submission of system description papers: May 16, 2023
- Notification of acceptance: May 20, 2023
- Camera-ready papers due: May 26, 2023
- Workshop: July 14, 2023
All deadlines are 11:59 pm UTC -12h (AoE).
Organizers
Abteen Ebrahimi, Manuel Mager, Arturo Oncevay, Enora Rice, John Ortega, Shruti Rijhwani, Ivan Vladimir Meza Ruiz, Alexis Palmer, Katharina Kann
Contact: americas.nlp.workshop@gmail.com
Website: https://turing.iimas.unam.mx/americasnlp/2023_st.html