MultiGEC-2025 shared task: system submission deadline extended to November 29, 2024

We renew our invitation to participate in the MultiGEC-2025 shared task on Multilingual Grammatical Error Correction, covering 12 languages: Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Russian, Slovene, Swedish and Ukrainian.

The system submission deadline has been extended to November 29, 2024. System output is to be submitted via CodaLab (https://codalab.lisn.upsaclay.fr/competitions/20500).
The results will be presented on March 5, 2025, at the NLP4CALL workshop, co-located with the NoDaLiDa conference, which will be held in Tallinn, Estonia, on 2--5 March 2025. https://spraakbanken.gu.se/en/research/themes/icall/nlp4call-workshop-series... The publication venue for system descriptions will be the proceedings of the NLP4CALL workshop, which are also co-published in the ACL Anthology.
To register for or express interest in the shared task and get access to the data, please fill in this form (https://forms.gle/nTPfARVqy1XmqT4t6). Note that you will be prompted to sign Terms of Use for the data at https://forms.gle/VLJ18WbwsxitEBYi7. Data access is personal, so each participant needs to fill in the form individually. You are also welcome to join the MultiGEC-2025 Google group (https://groups.google.com/g/multigec-2025) to ask questions, hold discussions and browse previously answered questions about the shared task.
The task description below, as well as general information, is also available on our website (https://spraakbanken.gu.se/en/compsla/multigec-2025) and in our GitHub repository (https://github.com/spraakbanken/multigec-2025/).

* TASK DESCRIPTION
In this shared task, your goal is to rewrite learner-written texts to make them either grammatically correct, or both grammatically correct and idiomatic; that is, to either adhere to the "minimal correction" principle or apply fluency edits. For instance, the text
My mother became very sad, no food. But my sister better five months later.
can be corrected minimally as
My mother became very sad, and ate no food. But my sister felt better five months later.
or with fluency edits as
My mother was very distressed and refused to eat. Luckily, my sister recovered five months later.
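As a rough illustration of how the two correction styles differ, the sketch below counts how many whitespace-separated tokens change between the original sentence and each correction. The helper name count_token_edits is hypothetical and the counting scheme is an assumption made for this example only; it is not one of the shared task's evaluation metrics.

```python
from difflib import SequenceMatcher


def count_token_edits(source: str, corrected: str) -> int:
    """Count tokens touched when turning `source` into `corrected`.

    Hypothetical helper for illustration: tokens are split on whitespace,
    and each non-matching span counts as the larger of its two sides.
    """
    src_tokens = source.split()
    cor_tokens = corrected.split()
    matcher = SequenceMatcher(a=src_tokens, b=cor_tokens, autojunk=False)
    edits = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An insertion, deletion or replacement span.
            edits += max(i2 - i1, j2 - j1)
    return edits


original = "My mother became very sad, no food. But my sister better five months later."
minimal = ("My mother became very sad, and ate no food. "
           "But my sister felt better five months later.")
fluent = ("My mother was very distressed and refused to eat. "
          "Luckily, my sister recovered five months later.")

print("minimal correction:", count_token_edits(original, minimal))
print("fluency edits:     ", count_token_edits(original, fluent))
```

On these sentences the minimal correction changes fewer tokens than the fluency-edited version, which is the intuition behind scoring the two approaches with different metrics.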
For fair evaluation of both approaches to the correction task, we will provide two evaluation metrics: one favoring minimal correction and one suited for fluency-edited output (read more under Evaluation). We particularly encourage the development of multilingual systems that can process all (or several) languages using a single model, but this is not a requirement for participating in the task.

* DATA
We provide training, development and test data for each of the languages. The training and development splits are available through GitHub. Evaluation will be performed on a separate test set. See the repository for more detailed information: https://github.com/spraakbanken/multigec-2025/
Note: The English data is expected a bit later.
* EVALUATION
During the shared task, evaluation will be based on cross-lingually applicable automatic metrics:
- reference-based:
-- GLEU score
-- Precision, Recall, F0.5 score
- reference-free: Scribendi score

After the shared task, we also plan on carrying out a human evaluation experiment on a subset of the submitted results.

* TIMELINE
- June 18, 2024 - first call for participation ✓
- September 20, 2024 - second call for participation ✓
- October 20, 2024 - third call for participation. Training and validation data released ✓
- October 31, 2024 - reminder. CodaLab opens for team registrations, validation phase starts ✓
- November 13, 2024 - test phase starts ✓
- November 29, 2024 (extended) - system submission deadline (system output)
- December 2, 2024 - results announced
- December 16, 2024 - paper submission deadline with system descriptions
- January 20, 2025 - paper reviews sent to the authors
- February 3, 2025 - camera-ready deadline
- March 5, 2025 - presentations of the systems at the NLP4CALL workshop
* PUBLICATION
We encourage you to submit a paper with your system description to the NLP4CALL workshop special track. We follow the same requirements for paper submissions as the NLP4CALL workshop, i.e. we use the same template and apply the same page limit. All papers will be reviewed by the organizing committee. Upon paper publication, we encourage you to share models, code, fact sheets, extra data, etc. with the community through GitHub or other repositories.

* ORGANIZERS
- Arianna Masciolini, University of Gothenburg, Sweden
- Andrew Caines, University of Cambridge, UK
- Orphée De Clercq, Ghent University, Belgium
- Joni Kruijsbergen, Ghent University, Belgium
- Murathan Kurfali, Stockholm University, Sweden
- Ricardo Muñoz Sánchez, University of Gothenburg, Sweden
- Elena Volodina, University of Gothenburg, Sweden
- Robert Östling, Stockholm University, Sweden
* DATA PROVIDERS
- Czech:
-- Alexandr Rosen, Charles University, Prague
- English:
-- Diane Nicholls, ELiT, Cambridge University Press & Assessment
-- Andrew Caines, University of Cambridge
-- Paula Buttery, University of Cambridge
- Estonian:
-- Mark Fishel, University of Tartu, Estonia
-- Kais Allkivi, Tallinn University, Estonia
-- Kristjan Suluste, Eesti Keele Instituut, Estonia
- German:
-- Andrea Horbach, IPN / CAU Kiel, Germany
-- Josef Ruppenhofer, FernUniversität in Hagen, Germany
-- Katrin Wisniewski, Universität Leipzig
-- Torsten Zesch, FernUniversität in Hagen, Germany
- Greek:
-- Alex Tantos, Aristotle University of Thessaloniki
-- Konstantinos Tsiotskas, Aristotle University of Thessaloniki
-- Vassilis Varsamopoulos, Aristotle University of Thessaloniki
-- Pinelopi Kikilintza, Aristotle University of Thessaloniki
-- Elena Drakonaki, Aristotle University of Thessaloniki
-- Eleni Tsourilla, Aristotle University of Thessaloniki
-- Despoina-Ourania Touriki, Aristotle University of Thessaloniki
- Icelandic:
-- Isidora Glisič, University of Iceland
- Italian:
-- Jennifer-Carmen Frey, Eurac Research Bolzano, Italy
-- Lionel Nicolas, Eurac Research Bolzano, Italy
- Latvian:
-- Roberts Darģis, University of Latvia
-- Ilze Auzina, University of Latvia
- Russian:
-- Alla Rozovskaya, City University of New York (CUNY), USA
- Slovene:
-- Špela Arhar Holdt, University of Ljubljana, Slovenia
-- Aleš Žagar, University of Ljubljana, Slovenia
- Swedish:
-- Arianna Masciolini, University of Gothenburg, Sweden
- Ukrainian:
-- Oleksiy Syvokon, Microsoft
-- Mariana Romanyshyn, Grammarly
* CONTACT
Please join the MultiGEC-2025 Google group (https://groups.google.com/g/multigec-2025) to ask questions, hold discussions and browse previously answered questions.