We are delighted to announce the release of the consolidated version of the Chilean Waiting List corpus. The dataset comprises 9,000 clinical referrals in Spanish, annotated with ten entity types (almost half nested), relations, and attributes. For more details, refer to the papers published at ACM Healthcare (https://lnkd.in/dJskpprV) and the EMNLP conference (https://lnkd.in/dPt6RFsj). The corpus is available through the following resources:
1. Zenodo (https://lnkd.in/dWfF_Cj6): The corpus is provided here in its original version (the referrals as text files and the annotations in standoff format). In addition, we converted these files to the CoNLL format, which is the most suitable format for running NER experiments.
2. Papers with Code (https://lnkd.in/dsAw3Npt): This page contains the benchmark for the dataset, including references to the NER models tested to date. In particular, it includes the first results on our corpus for the Nested Named Entity Recognition task, which were published at COLING’s main conference: https://lnkd.in/dHnnA3aV.
3. Hugging Face (https://huggingface.co/plncmm): To make it easier to test transformer-based models, we have published seven NER datasets on Hugging Face, one for each of the following entity types: disease, medication, body part, finding, abbreviation, family member, and procedure. Here is a simple notebook showing how to load these datasets: https://lnkd.in/dVddWXux.
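For readers who want to work directly with the CoNLL files from item 1, here is a minimal sketch of how such a file could be read in Python. It assumes a flat layout with one token and one BIO tag per line, separated by whitespace, and a blank line between sentences; the actual column layout of the released files may differ, so check them first. The sample text and the label names (`Disease`, `Finding`) are made up for illustration and are not taken from the corpus.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # list of (token, tag) pairs


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group 'token tag' lines into sentences; blank lines separate sentences."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # Assumption: first column is the token, last column is the tag.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Collect (entity_type, text) pairs from flat BIO tags."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    label = None
    for token, tag in sentence:
        if tag.startswith("B-"):
            # A new entity starts; flush the previous one first.
            if tokens:
                entities.append((label, " ".join(tokens)))
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            tokens.append(token)
        else:
            # An 'O' tag or an inconsistent 'I-' tag ends the current entity.
            if tokens:
                entities.append((label, " ".join(tokens)))
            tokens, label = [], None
    if tokens:
        entities.append((label, " ".join(tokens)))
    return entities


# Invented sample, not real corpus data.
sample = """Paciente O
con O
diabetes B-Disease
mellitus I-Disease
. O

Dolor B-Finding
abdominal I-Finding
"""

sentences = read_conll(sample.splitlines())
entities = [extract_entities(s) for s in sentences]
print(entities)
```

Note that a single tag column like this can only represent flat entities; handling the nested annotations would need either several tag columns or the original standoff files.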