The deadline for submission has been extended to July 20th.
Upload submissions now:
https://cmt3.research.microsoft.com/AMTA2022

The First Workshop on Corpus Generation and Corpus Augmentation for 
Machine Translation (CoCo4MT)
https://sites.google.com/view/coco4mt
@ AMTA 2022
The 15th biennial conference of the Association for Machine Translation 
in the Americas
12-16 September 2022, Orlando, Florida, USA

INVITED TALKS

Jörg Tiedemann     University of Helsinki
Julia Kreutzer     Google Research
Maria Nadejde     Amazon

SCOPE

It is well known that machine translation systems, especially those 
that use deep learning, require massive amounts of data. Several kinds 
of language resources are not available in human-created form. The 
types of resources that are available include monolingual and 
multilingual corpora, translation memories, and lexicons. Parallel 
resources are generally created for formal purposes, such as 
parliamentary collections, while monolingual resources tend to come 
from more informal settings. The quality and abundance of resources, 
including corpora, created for formal purposes are generally higher 
than those of resources created for informal purposes. Additionally, 
corpora for low-resource languages (languages with fewer digital 
resources available) tend to be less abundant and of lower quality.

CoCo4MT sets out to be the first workshop centered on research that 
focuses on corpus creation, cleansing, and augmentation techniques 
specifically for machine translation. We accept work that covers any 
spoken language (including high-resource languages), but we are 
specifically interested in submissions on languages with limited 
existing resources (low-resource languages).

The goal of this workshop is to begin to close the gap in the corpora 
available for low-resource translation systems and to promote 
high-quality data. Data for online systems that can be used by native 
speakers of low-resource languages is of particular interest. 
Therefore, it will be beneficial if the techniques presented in 
research papers include their impact on the quality of MT output and 
how they can be used in the real world.

CoCo4MT aims to encourage research on new and undiscovered techniques. 
We hope that submissions will provide high-quality corpora that are 
publicly available for download and can be used to improve machine 
translation performance, thus encouraging new dataset creation for 
multiple languages. This will, in turn, make the workshop a general 
venue to consult for corpora needs in the future. The workshop’s 
success will be measured by the following key performance indicators:

- Promotes the ongoing increase in quality of machine translation 
systems when measured by standard measurements,
- Provides a meeting place for collaboration from several research areas 
to increase the availability of commonly used corpora and new corpora,
- Drives innovation to address the need for higher quality and abundance 
of low-resource language data.

TOPICS

We are highly interested in original research papers on the topics 
below; however, we welcome all novel ideas that cover research on 
corpora techniques.

- Difficulties with using existing corpora (e.g., political 
considerations or domain limitations) and their effects on final MT 
systems,
- Strategies for collecting new MT datasets (e.g., via crowdsourcing),
- Data augmentation techniques,
- Data cleansing and denoising techniques,
- Quality control strategies for MT data,
- Exploration of datasets for pretraining or auxiliary tasks for 
training MT systems.

SUBMISSION INFORMATION

The workshop accepts one type of submission: research, review, and 
position papers. Each paper should be at least four (4) and at most 
ten (10) pages long, plus unlimited pages for references. 
Submissions should be formatted according to the official AMTA 2022 
style templates (PDF, LaTeX, Word). Accepted papers will be published 
online in the AMTA 2022 proceedings, which will be included in the ACL 
Anthology, and will be presented at the conference either orally or as 
a poster.

Submissions must be anonymized and submitted through the official 
conference management system 
(https://cmt3.research.microsoft.com/AMTA2022). Papers that have been 
or will be submitted to other venues must be declared as such, and 
must be withdrawn from the other venues if accepted and published at 
CoCo4MT. The review will be double-blind.

We would like to encourage authors to cite papers written in ANY 
language that are related to the topics, as long as both original 
bibliographic items and their corresponding English translations are 
provided.

Registration will be handled by the main conference (details to be 
announced).

IMPORTANT DATES

June 1, 2022 – Call for papers released
June 15, 2022 – Second call for papers
June 29, 2022 – Third and final call for papers
July 20, 2022 – Paper submissions due (updated extension!)
July 27, 2022 – Notification of acceptance
August 7, 2022 – Camera-ready due
August 31, 2022 – Video recordings due
September 16, 2022 – CoCo4MT workshop

CONTACT

CoCo4MT Workshop Organizers
coco4mt2022@googlegroups.com

ORGANIZING COMMITTEE (listed alphabetically)

Constantine Lignos     Brandeis University
John E. Ortega     New York University and University of Santiago de 
Compostela (CITIUS)
Katharina Kann     University of Colorado Boulder
Maja Popović     ADAPT Centre at Dublin City University
Marine Carpuat     University of Maryland
Shabnam Tafreshi     University of Maryland
William Chen     Carnegie Mellon University

PROGRAM COMMITTEE (tentative, listed alphabetically)

Abteen Ebrahimi     University of Colorado Boulder
David Adelani     Saarland University
Ananya Ganesh     University of Colorado Boulder
Alberto Poncelas     ADAPT Centre at Dublin City University
Amirhossein Tebbifakhr     University of Trento
Anna Currey     Amazon
Arturo Oncevay     University of Edinburgh
Atul Kr. Ojha     National University of Ireland Galway
Bharathi Raja Chakravarthi     National University of Ireland Galway
Beatrice Savoldi     University of Trento
Bogdan Babych     Heidelberg University
Eleftheria Briakou     University of Maryland
Bonaventure Dossou     Mila Quebec AI Institute
Duygu Ataman     New York University
Eleni Metheniti     Université Toulouse - Paul Sabatier
Francis Tyers     Indiana University
Jasper Kyle Catapang     University of Birmingham
John E. Ortega     New York University and University of Santiago de Compostela (CITIUS)
José Ramom Pichel Campos     Universidade de Santiago de Compostela - CITIUS
Kalika Bali     Microsoft
Koel Dutta Chowdhury     Saarland University
Liangyou Li     Huawei
Manuel Mager     University of Stuttgart
Maria Art Antonette Clariño     University of the Philippines Los Baños
Mathias Müller     University of Zurich
Nathaniel Oco     De La Salle University
Xing Niu     Amazon
Pablo Gamallo     Universidade de Santiago de Compostela - CITIUS
Rodolfo Joel Zevallos Salazar     Universitat Pompeu Fabra
Rico Sennrich     University of Zurich
Sangjee Dondrub     Qinghai Normal University
Santanu Pal     Saarland University
Sardana Ivanova     University of Helsinki
Shantipriya Parida     Silo AI
Surafel Melaku Lakew     Amazon
Tommi A Pirinen     University of Tromsø
Valentin Malykh     Moscow Institute of Physics and Technology

--
Shabnam Tafreshi, PhD
Assistant Research Scientist
Computational Linguistics, NLP
UMD: ARLIS @ College Park
