*Task:* We call for automated systems to extract and normalize the findings of dysmorphology physical examinations. The dataset consists of 3136 de-identified observations with dysmorphic findings manually annotated and normalized to their corresponding Human Phenotype Ontology (HPO, https://hpo.jax.org/app/) terms.
*Motivation:* Dysmorphology physical examinations catalog minor morphological differences of patients’ bodies and may also identify general medical signs such as neurologic dysfunction. These findings enable correlations of patients with known rare genetic diseases and allow researchers to delineate undescribed genetic conditions. These medical findings are nearly always captured as unstructured free text within the electronic health record, making them unavailable for downstream computational analysis. Advanced natural language processing methods are therefore required to retrieve the information from the records.
*Challenge:* Both extraction and normalization are challenging. Extraction is difficult because of the descriptive style of the examinations, which, for conciseness, report findings with disjoint and overlapping mentions. Normalization is difficult because of the large scale of the HPO: our training set does not provide examples of all HPO terms, so a normalizer must learn to handle many terms without supervision.
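To illustrate what disjoint and overlapping mentions look like, here is a minimal Python sketch. The observation text, the `Mention` class and the `span_of` helper are all invented for illustration; they are not the track's data format or annotation scheme, which is described on the track page.

```python
from dataclasses import dataclass
from typing import Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


@dataclass(frozen=True)
class Mention:
    """Hypothetical container: one finding made of one or more text spans."""
    spans: Tuple[Span, ...]

    def text(self, observation: str) -> str:
        # Join the pieces of a (possibly disjoint) mention with a space.
        return " ".join(observation[s:e] for s, e in self.spans)

    def overlaps(self, other: "Mention") -> bool:
        # Two mentions overlap if any pair of their spans shares characters.
        return any(s1 < e2 and s2 < e1
                   for s1, e1 in self.spans
                   for s2, e2 in other.spans)


def span_of(observation: str, phrase: str) -> Span:
    # Locate the first occurrence of phrase; raises ValueError if absent.
    start = observation.index(phrase)
    return (start, start + len(phrase))


# Invented example sentence in the concise style described above.
observation = "EYES: short and downslanting palpebral fissures"

# Disjoint mention: "short" ... "palpebral fissures"
short_fissures = Mention((span_of(observation, "short"),
                          span_of(observation, "palpebral fissures")))
# Contiguous mention that shares "palpebral fissures" with the first one
downslanting_fissures = Mention(
    (span_of(observation, "downslanting palpebral fissures"),))

print(short_fissures.text(observation))
print(downslanting_fissures.text(observation))
print(short_fissures.overlaps(downslanting_fissures))
```

In this sketch one coordinated phrase yields two findings, one of them split across the sentence, and each would then need to be normalized to its own HPO term.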
See https://biocreative.bioinformatics.udel.edu/tasks/biocreative-viii/track-3/ for details. In short:
- 3136 de-identified observations with dysmorphic and normal findings manually annotated and normalized with their corresponding Human Phenotype Ontology https://hpo.jax.org/app/ terms
- Baseline systems available (e.g. doc2HPO https://doi.org/10.1093/nar/gkz386 , NeuralCR https://doi.org/10.2196/12596 , PhenoTagger https://doi.org/10.1093/bioinformatics/btab019 , PhenoBERT https://doi.org/10.1109/TCBB.2022.3170301 , and txt2HPO https://github.com/GeneDx/txt2hpo )
- CodaLab competition open at https://codalab.lisn.upsaclay.fr/competitions/11351
- Evaluation period: Sept. 15, 9:00 UTC - Sept. 18, 23:59 UTC
[Apologies for cross-posting]
Best regards, Davy