BioCreative VIII Challenge and Workshop 2nd Call for Participation
Where, When: The BioCreative VIII workshop will be held in conjunction with AMIA 2023, November 11-15, 2023, in New Orleans, LA.
BioCreative VIII: The eighth BioCreative workshop seeks to attract researchers interested in automatic methods for extracting medically relevant information from clinical data and the biomedical literature, and aims to bring together the medical NLP community and the health professional community. The challenge tracks include:
* BioRED (Biomedical Relation Extraction Dataset) Track will continue to address information extraction from the biomedical literature.
* SYMPTEMIST (Symptom TExt Mining Shared Task) will focus on symptom extraction from clinical records in Spanish and from a multilingual corpus.
* Phenotype extraction (genetic conditions in pediatric patients) Track will address phenotype extraction from clinical records.
* Annotation Tool Track will focus on annotation tools that facilitate the work of domain experts by offering seamless integration with relevant ontologies and other features that improve efficiency (dataset provided).
Workshop Proceedings and Special Issue: The BioCreative VIII Proceedings will include all submissions from participating teams and will be freely available by the time of the workshop. In addition, we are happy to announce that the journal Database will host a BioCreative VIII special issue for work that passes its peer-review process. Invitations to submit will be sent after the workshop.
Participation: Teams can participate in one or more of these tracks. Team registration will remain open until the individual tracks request a final commitment. To register a team, use the Registration form: https://forms.gle/cwEPevGPjrjm687z5. If you cannot access Google Forms, please send an e-mail to BiocreativeChallenge@gmail.com.
BioCreative VIII Tracks:
Track 1: BioRED (Biomedical Relation Extraction Dataset) Track. (Rezarta Islamaj and Zhiyong Lu)
This track aims to foster the development of systems that automatically extract biomedical relations from journal articles. The final resource, freely available to the community, will consist of 1000 MEDLINE articles fully annotated with biologically and medically relevant entities, the biomedical relations between them, and the novelty of each relation (whether the relation is a key point of the article or background knowledge that can be found elsewhere). Participants will use the training data (600 articles) to design and develop NLP systems that extract asserted relationships from free text, and are encouraged to classify which relations are novel findings. For BioCreative VIII, we will enrich the BioRED training dataset with 400 recently published, fully annotated MEDLINE articles, bringing this resource to 1000 articles. This track continues previous BioCreative workshops that addressed the extraction of individual biomedical entities and/or specific relations, such as disease-gene, protein-protein, or chemical-chemical relations, in biomedical articles. In contrast to previous challenges, this track calls for the extraction of all semantic relations expressed in an article, together with their novelty.
Track 2: SYMPTEMIST (Symptom TExt Mining Shared Task) (Martin Krallinger)
Considerable effort has gone into automatically extracting relevant variables and concepts from clinical texts using advanced entity recognition approaches. Despite the importance of clinical signs and symptoms for diagnosis, prognosis, and healthcare data analytics, this kind of clinical entity has received far less attention than other entity classes such as medications or diseases. Understanding and characterizing the relationships between different symptoms, their onset, and their associations with diseases is a central question in medical research. Because annotating symptom mentions and normalizing (mapping) them to controlled vocabularies is complex, very few datasets or corpora have been generated to train and evaluate advanced clinical named entity recognition systems. To foster the development, research, and evaluation of semantic annotation strategies that can be used to systematically extract and harmonize symptoms from clinical documents, we propose the SYMPTEMIST track. We invite researchers, health-tech professionals, and NLP and ontology experts to develop tools that automatically detect mentions of clinical symptoms in Spanish clinical texts and normalize, or map, them to a widely used multilingual clinical vocabulary, namely SNOMED CT. For this task we will release a large collection of manually annotated symptom mentions, together with detailed annotation guidelines, a consistency analysis, and additional resources. We also plan to release a multilingual version of the corpus (English, Italian, Romanian, Catalan, Portuguese, French, Dutch, Swedish and Czech). This is a new challenge.
Track 3: Phenotype extraction (genetic conditions in pediatric patients) (Graciela Gonzalez, Ian Campbell, Davy Weissenbacher)
The dysmorphology physical examination is a critical component of the diagnostic evaluation in clinical genetics. This process catalogs morphological differences, often minor, in the patient's face or body, but it may also identify more general medical signs such as neurologic dysfunction. These findings make it possible to correlate the patient's presentation with known rare genetic diseases. Although these findings are key information, they are nearly always captured in the electronic health record (EHR) as unstructured free text, which makes them unavailable for downstream computational analysis. Advanced natural language processing (NLP) methods are therefore required to retrieve this information from the records. This is a new challenge.
Track 4: Annotation Tool track (Rezarta Islamaj, Cecilia Arighi, Lynette Hirschman, Martin Krallinger, Graciela Gonzalez)
Recognizing the need for freely available, time-saving tools that help build quality gold-standard resources, the goal of the BioCreative VIII (2023) Annotation Tool Track is to foster the development of such biocuration annotation systems. This track calls for text mining developers to submit systems that are: 1) publicly available, while also offering local installation options to accommodate data with privacy concerns, such as clinical records; 2) able to support team annotation and collaboration between annotators to ensure data annotation quality; 3) able to annotate documents for triage, entities, and/or relations; and 4) able to integrate the selected ontology and provide search and browsing capabilities, as well as suggestions to the curator from the selected ontology. A select number of systems will be showcased at the workshop.
Organizing Committee
* Dr. Rezarta Islamaj, National Library of Medicine
* Dr. Cecilia Arighi, University of Delaware
* Dr. Ian M. Campbell, Children's Hospital of Philadelphia
* Dr. Graciela Gonzalez-Hernandez, Cedars-Sinai Medical Center
* Dr. Lynette Hirschman, MITRE
* Dr. Martin Krallinger, Barcelona Supercomputing Center
* Dr. Davy Weissenbacher, Cedars-Sinai Medical Center
* Dr. Zhiyong Lu, National Library of Medicine