[Corpora-List] The Chilean Waiting List corpus (Dataset for NER in Spanish)

24 Jan 2023


We are delighted to announce that we have released the consolidated version
of the Chilean Waiting List corpus. This dataset comprises 9,000 clinical
referrals in Spanish, annotated with ten entity types (almost half nested),
relations, and attributes. For more details, refer to the papers published
at ACM Healthcare (https://lnkd.in/dJskpprV) and EMNLP
(https://lnkd.in/dPt6RFsj). The corpus is available through the following
resources:
1. Zenodo (https://lnkd.in/dWfF_Cj6): The corpus is available here in its
original version (the referrals as plain-text files and the annotations in
standoff format). In addition, we converted these files to the CoNLL
format, which is well suited to NER experiments.
2. Papers with Code (https://lnkd.in/dsAw3Npt): This page hosts the
benchmark for the dataset, including references to the NER models tested to
date. In particular, we published the first results on our corpus for the
Nested Named Entity Recognition task at COLING’s main conference. Please
refer to the following link: https://lnkd.in/dHnnA3aV.
3. Hugging Face (https://huggingface.co/plncmm): To facilitate the testing
of transformer-based models, we have made available 7 NER datasets on
Hugging Face, one for each of the following entity types: disease,
medication, body part, finding, abbreviation, family member, and procedure.
Here is a simple notebook showing how to load these datasets:
https://lnkd.in/dVddWXux.
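
For readers who want to inspect the CoNLL files before running any model,
here is a minimal Python sketch of reading CoNLL-style text and recovering
entity spans from BIO tags. It assumes a common layout (one token per line,
the tag in the last whitespace-separated column, a blank line between
sentences); the released files may differ, so check them first. The function
names and the sample text are hypothetical and are not taken from the corpus.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]  # (token, tag) pairs


def read_conll(text: str) -> List[Sentence]:
    """Split CoNLL-style text into sentences of (token, tag) pairs.

    Assumed layout: one token per line, tag in the last column,
    blank line between sentences.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Collect (entity text, label) pairs from BIO tags (flat, not nested)."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    label = None
    for token, tag in sentence:
        if tag.startswith("B-"):
            # A new entity starts; close any entity still open.
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            tokens.append(token)
        else:
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [], None
    if tokens:
        entities.append((" ".join(tokens), label))
    return entities


# Invented sample for illustration only (not taken from the corpus).
SAMPLE = (
    "Paciente O\ncon O\ndiabetes B-Disease\nmellitus I-Disease\n. O\n"
    "\n"
    "Dolor B-Finding\nlumbar I-Finding\n. O\n"
)

for sent in read_conll(SAMPLE):
    print(extract_entities(sent))
```

This sketch handles flat BIO tags only, which matches the one-dataset-per-
entity-type setup described in item 3; nested annotations would need a
different representation.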
