CoCo4MT Shared Task: First Call for Participation
We are excited to introduce a new shared task for this year’s CoCo4MT workshop! Our aim is to encourage and facilitate research on corpus construction for low-resource machine translation.
Corpus creation for machine translation is typically constrained by the cost and availability of human translators. When a new dataset needs to be created for a low-resource language or a specialized domain, the annotation budget should be used efficiently and any sentences chosen for translation should be of high quality and as useful for machine translation system training as possible.
In this shared task, we ask participants to come up with ways in which such examples can be identified for a target language without any existing data. Specifically, given a parallel corpus between high-resource languages, the goal is to choose a good subset of the high-resource corpus to be translated into the low-resource language, in order to obtain a good training set for a machine translation system. The shared task winner will be the team whose instances result in the best final system after training.
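To make the selection problem concrete, here is a minimal sketch of one possible heuristic: greedily picking the source sentences that add the most unseen word types, up to a fixed translation budget. This is only an illustration and not the organizers' baseline or evaluation procedure. The function name `select_for_translation`, the whitespace tokenization, and the sentence-count budget are all assumptions made for the example.

```python
from typing import List, Set


def select_for_translation(sentences: List[str], budget: int) -> List[int]:
    """Greedily pick up to `budget` sentence indices that add the most unseen word types.

    Hypothetical helper for illustration; tokenization is plain lowercased
    whitespace splitting, which is an assumption and not part of the task.
    """
    tokenized: List[Set[str]] = [set(s.lower().split()) for s in sentences]
    covered: Set[str] = set()
    chosen: List[int] = []
    remaining = set(range(len(sentences)))
    while remaining and len(chosen) < budget:
        # Pick the sentence contributing the most new word types;
        # ties go to the lowest index because max() keeps the first maximum.
        best = max(sorted(remaining), key=lambda i: len(tokenized[i] - covered))
        chosen.append(best)
        covered |= tokenized[best]
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    corpus = ["the cat sat", "the cat sat", "a dog barked loudly", "the dog"]
    for idx in select_for_translation(corpus, budget=2):
        print(idx, corpus[idx])
```

The indices returned would identify the high-resource sentences to send for translation into the low-resource language. A real submission would likely use a stronger criterion than vocabulary coverage.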
Detailed information: https://sites.google.com/view/coco4mt/shared-task
Registration: https://forms.gle/jfKSPQMKEmaaXFHy5
Important Dates
- May 19, 2023: Release of train, dev and test data
- May 30, 2023: Release of baselines
- July 12, 2023: Deadline to submit results
- July 20, 2023: System description papers due
Organizers (listed alphabetically)
- Ananya Ganesh, University of Colorado Boulder
- Constantine Lignos, Brandeis University
- John E. Ortega, Northeastern University
- Jonne Sälevä, Brandeis University
- Katharina Kann, University of Colorado Boulder
- Marine Carpuat, University of Maryland
- Rodolfo Zevallos, Universitat Pompeu Fabra
- Shabnam Tafreshi, University of Maryland
- William Chen, Carnegie Mellon University