***Shared Task: Detecting Entities in the Astrophysics Literature (DEAL)***
***Website: https://ui.adsabs.harvard.edu/WIESP/2022/SharedTasks ***
***Twitter: https://twitter.com/wiesp_nlp ***
Much astrophysics research relies on data from missions and facilities such as ground observatories in remote locations or space telescopes, as well as digital archives that hold large amounts of observed and simulated data. These missions and facilities are frequently named after historical figures or use some ingenious acronym which, unfortunately, can be easily confused when searching for them in the literature via simple string matching. For instance, Planck can refer to the person, the mission, the constant, or several institutions. Automatically recognizing entities such as missions or facilities would help tackle this word sense disambiguation problem.
The shared task consists of Named Entity Recognition (NER) on samples of text extracted from astrophysics publications. The labels were created by domain experts and designed to identify entities of interest to the astrophysics community. They range from simple to detect (e.g., URLs) to highly unstructured (e.g., Formula), and from useful to researchers (e.g., Telescope) to more useful to archivists and administrators (e.g., Grant). Overall, 31 different labels are included, and their distribution is highly unbalanced (e.g., ~100x more Citations than Proposals). Submissions will be scored using both the entity-level F1-score as computed by seqeval (following the CoNLL-2000 shared task evaluation) and scikit-learn's Matthews correlation coefficient at the token level. We also encourage authors to propose their own evaluation metrics. A sample dataset and more instructions can be found at: https://ui.adsabs.harvard.edu/WIESP/2022/SharedTasks
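To make the two scoring levels concrete, here is a minimal, dependency-free sketch of an entity-level F1 and a token-level Matthews correlation coefficient. It is an illustration only, not the official scorer: it assumes BIO-style tags (`B-`, `I-`, `O`), which the text above does not specify, and the function names and toy labels are hypothetical. The scoring scripts distributed with the data remain the reference.

```python
from collections import Counter
from math import sqrt


def extract_entities(tags):
    """Return the set of (label, start, end) spans in one BIO-tagged sequence.

    Assumption: an I- tag with no open span of the same label starts a new
    span (a lenient reading of BIO, chosen for this sketch).
    """
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        prefix, _, kind = tag.partition("-")
        # Close the open span unless the current tag continues it.
        if label is not None and (prefix != "I" or kind != label):
            spans.add((label, start, i))
            start, label = None, None
        if prefix == "B" or (prefix == "I" and label is None):
            start, label = i, kind
    return spans


def entity_f1(true_seqs, pred_seqs):
    """Micro-averaged F1 over exact (label, start, end) span matches."""
    tp = n_true = n_pred = 0
    for true_tags, pred_tags in zip(true_seqs, pred_seqs):
        gold, guess = extract_entities(true_tags), extract_entities(pred_tags)
        tp += len(gold & guess)
        n_true += len(gold)
        n_pred += len(guess)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def token_mcc(true_seqs, pred_seqs):
    """Multiclass Matthews correlation coefficient over flattened tokens."""
    true = [t for seq in true_seqs for t in seq]
    pred = [p for seq in pred_seqs for p in seq]
    s = len(true)                                   # number of tokens
    c = sum(t == p for t, p in zip(true, pred))     # correctly tagged tokens
    t_k, p_k = Counter(true), Counter(pred)         # per-class counts
    cov_tp = c * s - sum(t_k[k] * p_k[k] for k in t_k)
    cov_tt = s * s - sum(v * v for v in t_k.values())
    cov_pp = s * s - sum(v * v for v in p_k.values())
    denom = sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0


# Toy example: the prediction truncates the two-token Telescope entity.
y_true = [["B-Grant", "O", "O", "B-Telescope", "I-Telescope"]]
y_pred = [["B-Grant", "O", "O", "B-Telescope", "O"]]

print(entity_f1(y_true, y_pred))  # 0.5: one of two entities matches exactly
print(token_mcc(y_true, y_pred))
```

The example shows why the two metrics are reported together: a single wrong token costs a whole entity at the entity level, while the token-level score still credits the four tokens that were tagged correctly.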
Participants (individuals or groups) will have the opportunity to present their findings during the workshop and write a short paper. The best-performing or most interesting approaches may be invited to collaborate further with the NASA Astrophysics Data System (https://ui.adsabs.harvard.edu/).
The DEAL shared task is part of the *1st Workshop on Information Extraction from Scientific Publications (WIESP) at AACL-IJCNLP 2022*:
https://ui.adsabs.harvard.edu/WIESP/2022/
***Please fill in this form to report your intention to participate in the shared task***
https://forms.office.com/r/KKpeKJBLy3
***Shared Task Submission***
Link to data and scoring scripts: https://huggingface.co/datasets/fgrezes/WIESP2022-NER
CodaLab link to the online competition: https://codalab.lisn.upsaclay.fr/competitions/5062
***Important Dates***
- Training+Validation Data Release: June 1, 2022
- Validation Phase: June 1 - July 31, 2022
- Test Data Release: August 1, 2022
- Final Scoring Period: August 1 - August 10, 2022
- System Report Submission: August 25, 2022
- Notification: September 25, 2022
- Camera-ready Submission Deadline: October 10, 2022
- Event Date: November 20, 2022 (online)
***All submission deadlines are 11:59 pm UTC-12 (“Anywhere on Earth”)***
***Organizers***
- Tirthankar Ghosal (https://elitr.eu/tirthankar-ghosal), Charles University, CZ
- Sergi Blanco-Cuaresma (https://www.blancocuaresma.com/s/), Center for Astrophysics | Harvard & Smithsonian, USA
- Alberto Accomazzi (https://ui.adsabs.harvard.edu/about/team/team/aaccomazzi.html), Center for Astrophysics | Harvard & Smithsonian, USA
- Robert M. Patton (https://www.ornl.gov/staff-profile/robert-m-patton), Oak Ridge National Laboratory, USA
- Felix Grezes (https://ui.adsabs.harvard.edu/about/team/team/fgrezes.html), Center for Astrophysics | Harvard & Smithsonian, USA
- Thomas Allen (https://ui.adsabs.harvard.edu/about/team/team/tallen.html), Center for Astrophysics | Harvard & Smithsonian, USA