Important UPDATES/EXTENSION: ClinSpEn sub-track (Biomedical WMT Task, EMNLP 2022)
Machine Translation of Clinical cases, ontologies & medical entities: Spanish - English
The evaluation period has been extended: test and background data are available on Zenodo, and submissions are open on CodaLab.
The ClinSpEn track of the Biomedical WMT 2022 shared task addresses a pressing need and emerging research topic: the development and exploitation of multilingual clinical NLP and text mining applications.
Recent advances in neural machine translation (MT) approaches adapted to specific domains and text genres have yielded promising results that facilitate the processing of healthcare and clinical data beyond language silos.
The ClinSpEn sub-track promotes the use of advanced machine translation technologies in three high-impact healthcare application scenarios:
(1) automatic translation of clinical case documents, which is important for examining how MT could be further applied to clinical records;
(2) automatic translation of clinical terms and entity mentions extracted directly from medical records and literature, to improve multilingual semantic annotation technologies;
(3) automatic translation of ontologies and controlled vocabulary concepts, which is of utmost importance for multilingual data and concept normalization.
These three scenarios will be addressed by three specific benchmark data collections used for evaluation purposes by the ClinSpEn biomedical WMT track:
ClinSpEn-CC (Clinical Cases): EN>ES translation of clinical case documents.
ClinSpEn-CT (Clinical Terms): ES>EN translation of clinical terms and entity mentions extracted from records and literature.
ClinSpEn-OC (Ontology Concepts): EN>ES translation of highly used open clinical controlled vocabularies and ontology concepts.
Important links:
- ClinSpEn web: https://temu.bsc.es/clinspen/
- Biomedical WMT web: https://statmt.org/wmt22/biomedical-translation-task.html
- WMT2022: https://statmt.org/wmt22/
- EMNLP conference: https://2022.emnlp.org/
- Data (NEW!):
  - Clinical Cases: https://doi.org/10.5281/zenodo.6497350
  - Clinical Terms: https://doi.org/10.5281/zenodo.6497372
  - Ontology Concepts: https://doi.org/10.5281/zenodo.6497388
- CodaLab: https://codalab.lisn.upsaclay.fr/competitions/6696
- Team Registration (mandatory): https://temu.bsc.es/clinspen/registration/
For the ClinSpEn track, Gold Standard manual translations have been produced by professional medical translators to evaluate participating teams. The primary evaluation metric for this track will be SacreBLEU.
Participants will also have access to a larger background collection to promote scalability and robustness assessment of machine translation technology.
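As a rough illustration of what a BLEU-style score measures, the sketch below computes a simplified corpus-level BLEU in plain Python. It is not the SacreBLEU implementation and not the official evaluation script: it assumes whitespace tokenization, a single reference per segment and no smoothing, whereas SacreBLEU applies its own standardized tokenization. The function names and the example sentences are made up for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumptions (illustrative only): whitespace tokenization, one
    reference per segment, no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Counter intersection clips each n-gram to its reference count.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    # Without smoothing, any order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output shorter than the reference.
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


# Made-up Spanish segments, standing in for system output and gold standard.
system_output = ["el paciente presenta fiebre y tos"]
gold_standard = ["el paciente presenta fiebre y tos seca"]
score = simple_corpus_bleu(system_output, gold_standard)
print(f"Simplified BLEU: {score:.2f}")
```

For actual scoring, the `sacrebleu` Python package exposes a `corpus_bleu(hypotheses, [references])` function; scores from the sketch above will generally differ from SacreBLEU's because of tokenization and smoothing.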
Updated schedule:
- Participant Predictions Due: August 30th, 2022 (UPDATED EXTENSION!)
- Paper Submission: September 7th, 2022
- Acceptance notification: October 9th, 2022
- Camera-ready version: October 16th, 2022
- WMT workshop at EMNLP: December 7th and 8th, 2022
Publications and workshop
Participating teams will be invited to contribute a system description paper to the WMT 2022 Working Notes proceedings. The workshop will be part of the prestigious EMNLP 2022 conference. More information on paper specifications, formatting guidelines and the review process is available at: https://statmt.org/wmt22/index.html.
ClinSpEn Track Organizers
- Salvador Lima-López (BSC)
- Darryl Johan Estrada (BSC)
- Eulàlia Farré-Maduell (BSC)
- Martin Krallinger (BSC)
Biomedical WMT Organizers
- Rachel Bawden (University of Edinburgh, UK)
- Giorgio Maria Di Nunzio (University of Padua, Italy)
- Darryl Johan Estrada (Barcelona Supercomputing Center, Spain)
- Eulàlia Farré-Maduell (Barcelona Supercomputing Center, Spain)
- Cristian Grozea (Fraunhofer Institute, Germany)
- Antonio Jimeno Yepes (University of Melbourne, Australia)
- Salvador Lima-López (Barcelona Supercomputing Center, Spain)
- Martin Krallinger (Barcelona Supercomputing Center, Spain)
- Aurélie Névéol (Université Paris Saclay, CNRS, LISN, France)
- Mariana Neves (German Federal Institute for Risk Assessment, Germany)
- Roland Roller (DFKI, Germany)
- Amy Siu (Beuth University of Applied Sciences, Germany)
- Philippe Thomas (DFKI, Germany)
- Federica Vezzani (University of Padua, Italy)
- Maika Vicente Navarro (Maika Spanish Translator, Melbourne, Australia)
- Dina Wiemann (Novartis, Switzerland)
- Lana Yeganova (NCBI/NLM/NIH, USA)