Begin forwarded message:

From: Philipp Koehn <phi@jhu.edu>
Date: April 24, 2021 at 00:03:43 UTC+2
To: wmt-tasks@googlegroups.com, Moses Support <moses-support@mit.edu>, "corpora@uib.no" <CORPORA@uib.no>
Subject: Call for Participation: WMT 2021 Machine Translation using Terminologies
Reply-To: wmt-tasks@googlegroups.com

WMT 2021 Shared Task: 

Machine Translation using Terminologies

November 10-11, 2021
Punta Cana, Dominican Republic

Language domains that require very careful use of terminology are abundant. The need to translate adequately within such domains is undeniable, as shown, for example, by the various WMT shared tasks on biomedical translation.

More interestingly, as the abundance of research on domain adaptation shows, (a) such language domains are not adequately covered by existing data and models, and (b) new (or “surge”) domains keep arising, to which models need to be adapted, often with significant downstream implications. Consider the new COVID-19 domain and the large efforts to translate critical information on pandemic handling and infection prevention strategies.

In the case of newly developed domains, while parallel data are hard to come by, it is fairly straightforward to create word- or phrase-level terminologies, which can be used to guide professional translators and ensure both accuracy and consistency.

This shared task replicates such a scenario and invites participants to explore methods for incorporating terminologies into either the training or the inference process, in order to improve both the accuracy and consistency of MT systems on a new domain.

IMPORTANT DATES

Release of training data and terminologies    April 2021
Surprise languages announced                  June 28, 2021
Test set available                            July 19, 2021
Submission of translations                    July 23, 2021
System descriptions due                       August 5, 2021
Camera-ready for system descriptions          September 15, 2021
Conference in Punta Cana                      November 10-11, 2021

SETTINGS

In this shared task, we will distinguish between submissions that use the terminology only at inference time (e.g., for constrained decoding or something similar) and submissions that use the terminology at training time (e.g., for data selection, data augmentation, explicit training, etc.). Note that basic linguistic tools such as taggers, parsers, or morphological analyzers are allowed in the constrained condition.

The submission report should highlight how participants’ methods and data differ from the standard MT approach. It should make clear which tools and which training sets were used.

LANGUAGE PAIRS

The shared task will focus on four language pairs, with systems evaluated on:
  • English to French
  • English to Chinese
  • Two surprise language pairs English-X (announced 3 weeks before the start of the evaluation period)
We will provide training/development data and terminologies for the above language pairs. Test sets will be released at the beginning of the evaluation period. The goal of this setting (with both development and surprise language pairs) is to avoid approaches that overfit on language selection, and instead to evaluate the more realistic scenario of having to tackle a new domain in a new language in a limited amount of time. The surprise language pairs will be announced 3 weeks before the start of the evaluation period. At the same time, we will provide training data and terminologies for the surprise language pairs.

You may participate in any or all of the language pairs.

ORGANIZERS

Antonis Anastasopoulos, George Mason University
Md Mahfuz ibn Alam, George Mason University
Laurent Besacier, NAVER
James Cross, Facebook
Georgiana Dinu, AWS
Marcello Federico, AWS
Matthias Gallé, NAVER
Philipp Koehn, Facebook / Johns Hopkins University
Vassilina Nikoulina, NAVER
Kweon Woo Jung, NAVER
