*Task:* We call for automated systems to extract and normalize the findings in dysmorphology physical examinations. The dataset consists of 3136 de-identified observations with dysmorphic findings manually annotated and normalized with their corresponding Human Phenotype Ontology (HPO) terms. Both extraction and normalization are challenging. Extraction is difficult because of the descriptive style of the examinations, which, for conciseness, report findings with disjoint and overlapping mentions. Normalization is difficult because of the large scale of the HPO: our training set does not provide examples of all HPO terms, so a normalizer must also handle terms for which it has seen no supervised examples. See https://biocreative.bioinformatics.udel.edu/tasks/biocreative-viii/track-3/ for details.
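To make "disjoint and overlapping mentions" concrete, here is a minimal Python sketch. The example sentence, the `Finding` class, and the `overlaps` helper are hypothetical illustrations and are not taken from the dataset or its official annotation format; the HPO identifiers are given for illustration and should be checked against the current HPO release.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Finding:
    """One annotated finding (hypothetical structure, not the official format)."""
    spans: Tuple[Tuple[int, int], ...]  # (start, end) character offsets, end exclusive
    hpo_id: str                         # normalized HPO term identifier

    def surface(self, text: str) -> str:
        # Join the text of each span to recover the mention as annotated.
        return " ".join(text[start:end] for start, end in self.spans)

    def is_disjoint(self) -> bool:
        # A mention is disjoint when it is made of more than one span.
        return len(self.spans) > 1


def overlaps(a: Finding, b: Finding) -> bool:
    # Two findings overlap when any of their spans share at least one character.
    return any(s1 < e2 and s2 < e1
               for s1, e1 in a.spans
               for s2, e2 in b.spans)


# Illustrative sentence written for this sketch, not a dataset record.
TEXT = "Ears are low-set and posteriorly rotated."

# Both findings reuse the span "Ears", so each is disjoint and they overlap.
LOW_SET = Finding(spans=((0, 4), (9, 16)), hpo_id="HP:0000369")   # Low-set ears
ROTATED = Finding(spans=((0, 4), (21, 40)), hpo_id="HP:0000358")  # Posteriorly rotated ears

if __name__ == "__main__":
    for finding in (LOW_SET, ROTATED):
        print(finding.hpo_id, "<-", finding.surface(TEXT))
```

In this sketch a single anatomical word is shared by two findings, which is why a system that only extracts contiguous, non-overlapping spans would miss at least one of them.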
*Motivation:* The dysmorphology physical examination catalogs minor morphological differences in patients’ bodies and may identify general medical signs such as neurologic dysfunction. Its findings enable correlations of patients with known rare genetic diseases and allow researchers to delineate undescribed genetic conditions. These medical findings are nearly always captured as unstructured free text within the electronic health record, which makes them unavailable for downstream computational analysis. Advanced Natural Language Processing methods are therefore required to retrieve the information from the records.
*In short:*
• 3136 de-identified observations with dysmorphic and normal findings manually annotated and normalized with their corresponding Human Phenotype Ontology terms
• Baseline systems available (e.g. doc2HPO, NeuralCR, PhenoTagger, PhenoBERT, and txt2HPO)
• Codalab opened at https://codalab.lisn.upsaclay.fr/competitions/11351
• Evaluation period: Sept. 15, 9:00 UTC - Sept. 18, 23:59 UTC
[Apologies for cross-posting]
Best regards, Davy