Call for Participation ClinSpEn @ Biomedical WMT Shared Task (WMT/EMNLP 2022)
Automatic Translation of Clinical cases, ontologies & medical entities: Spanish - English
ClinSpEn is part of the Biomedical WMT 2022 shared task. It aims to promote the development and evaluation of machine translation systems adapted to the medical domain through three highly relevant sub-tracks: clinical cases, medical controlled vocabularies/ontologies, and clinical terms and entities extracted from medical content.
Key information:
- ClinSpEn sub-track: https://temu.bsc.es/clinspen/
- Biomedical WMT: https://statmt.org/wmt22/biomedical-translation-task.html
- Main WMT: https://statmt.org/wmt22/
- EMNLP conference: https://2022.emnlp.org/
- Sample/Training Data:
  - Clinical Cases: https://doi.org/10.5281/zenodo.6497350
  - Clinical Terms: https://doi.org/10.5281/zenodo.6497372
  - Ontology Concepts: https://doi.org/10.5281/zenodo.6497388
- Registration/support: https://temu.bsc.es/clinspen/registration/
Motivation
Machine translation applied to the clinical domain is an especially challenging task due to the complexity of medical language and the heavy use of health-related technical terms and medical expressions. This is why there is a large community of specialized medical translators able to deal with medical narratives, terminologies and ambiguous abbreviations and acronyms.
Taking into account the relevance, impact and diversity of health-related content, as well as the rapidly growing number of publications, electronic health records (EHRs), clinical trials, informed consent documents and medical terminologies, there is a pressing need to generate more robust medical machine translation resources, together with independent quality evaluation scenarios.
Recent advances in machine translation technologies, together with the use of other NLP components, are showing promising results; thus, domain adaptation of MT approaches can have a significant impact on unlocking key information from medical content.
The ClinSpEn sub-task of Biomedical WMT proposes three sub-tracks, each associated with a highly relevant medical machine translation application scenario:
- ClinSpEn-CC (Clinical Cases): translation of clinical case documents from English to Spanish, a document type relevant both for processing medical literature and for clinical records.
- ClinSpEn-CT (Clinical Terms): translation of clinical terms and entity mentions from Spanish to English. The terms were extracted directly from medical literature and clinical records, with a particular focus on diseases, symptoms, findings, procedures and professions.
- ClinSpEn-OC (Ontology Concepts): translation of clinical controlled vocabularies and ontology concepts from English to Spanish. Ontologies and structured vocabularies are a key resource for semantic interoperability, entity linking, biomedical knowledge bases and precision medicine, so there is a pressing need to generate multilingual biomedical ontologies for a range of clinical applications.
A sample set of reasonable size has been released for each data type. Participants may use it to test their existing systems or to try out new ones.
In addition to the test set, manually translated by professional medical translators, participants will also have access to a larger background collection for each of the three sub-tracks. These collections may serve as additional resources and support the assessment of the scalability and robustness of machine translation technology.
Schedule
- Test and Background Set Release: July 21st, 2022
- Participant Predictions Due: July 28th, 2022
- Paper Submission Deadline: September 7th, 2022
- Notification of Acceptance (peer-reviews): October 9th, 2022
- Camera-ready Version Due: October 16th, 2022
- WMT @ EMNLP: December 7th and 8th, 2022
[All deadlines are in AoE (Anywhere on Earth)]
Registration
For the time being, participants may register using the ClinSpEn registration form at: https://temu.bsc.es/clinspen/registration/.
This form will be used to support teams during their participation and keep them updated on the official WMT/EMNLP registration, as well as on all related deadlines and important news.
Publications and WMT workshop
Teams participating in the ClinSpEn sub-track of Biomedical WMT will be invited to contribute a system description paper to the WMT 2022 Working Notes proceedings. More information on the paper specifications, formatting guidelines and review process is available at: https://statmt.org/wmt22/index.html.
If you are interested in Machine Translation, the biomedical domain or other language combinations, remember to check out the Biomedical WMT site and the rest of this year’s sub-tracks and language pairs: https://statmt.org/wmt22/biomedical-translation-task.html
ClinSpEn Organizers
- Salvador Lima-López (Barcelona Supercomputing Center, Spain)
- Darryl Johan Estrada (Barcelona Supercomputing Center, Spain)
- Eulàlia Farré-Maduell (Barcelona Supercomputing Center, Spain)
- Martin Krallinger (Barcelona Supercomputing Center, Spain)
Biomedical WMT Organizers
- Rachel Bawden (University of Edinburgh, UK)
- Giorgio Maria Di Nunzio (University of Padua, Italy)
- Darryl Johan Estrada (Barcelona Supercomputing Center, Spain)
- Eulàlia Farré-Maduell (Barcelona Supercomputing Center, Spain)
- Cristian Grozea (Fraunhofer Institute, Germany)
- Antonio Jimeno Yepes (University of Melbourne, Australia)
- Salvador Lima-López (Barcelona Supercomputing Center, Spain)
- Martin Krallinger (Barcelona Supercomputing Center, Spain)
- Aurélie Névéol (Université Paris Saclay, CNRS, LISN, France)
- Mariana Neves (German Federal Institute for Risk Assessment, Germany)
- Roland Roller (DFKI, Germany)
- Amy Siu (Beuth University of Applied Sciences, Germany)
- Philippe Thomas (DFKI, Germany)
- Federica Vezzani (University of Padua, Italy)
- Maika Vicente Navarro (Maika Spanish Translator, Melbourne, Australia)
- Dina Wiemann (Novartis, Switzerland)
- Lana Yeganova (NCBI/NLM/NIH, USA)