*** First Workshop on Information Extraction from Scientific Publications (WIESP) at AACL-IJCNLP 2022 ***
*** Website: https://ui.adsabs.harvard.edu/WIESP/ *** Twitter: https://twitter.com/wiesp_nlp
The number of scientific papers published per year has exploded in recent years. Indexing the full text of articles in search engines helps researchers discover and retrieve vital scientific information, so that they can continue building on the shoulders of giants, inform policy, and make evidence-based decisions. Nevertheless, it is difficult to navigate this ocean of data. Simple string matching has substantial limitations: human language is inherently ambiguous, context matters, and we frequently use the same words and acronyms to represent a multitude of different meanings. Extracting structured and semantically relevant information from scientific publications (e.g., named-entity recognition, summarization, citation intention, linkage to knowledge graphs) allows for better selection and filtering of articles.
The First Workshop on Information Extraction from Scientific Publications (WIESP) will create the necessary forum to foster discussion and research using Natural Language Processing and Machine Learning. WIESP will focus specifically on topics related to information extraction from scientific publications, including (but not limited to):
- Scientific document parsing
- Scientific named-entity recognition
- Scientific article summarization
- Question-answering on scientific articles
- Citation context/span extraction
- Structured information extraction from full-text, tables, figures, bibliography
- Novel datasets curated from scientific publications
- Argument extraction and mining
- Challenges in information extraction from scientific articles
- Building knowledge graphs via mining scientific literature; querying scientific knowledge graphs
- Novel tools for IE on scientific literature and interaction with users
- Mathematical information extraction
- Scientific concepts, facts extraction
- Visualizing scientific knowledge
- Bibliometric and Altmetric studies via information extraction from scientific articles and metadata
- Information extraction from COVID-19 articles to inform public health policy
In addition to research paper presentations, WIESP will also feature keynote talks, a panel discussion, and a shared task. We will update the details on our website as they become available. We especially welcome participation from academic and research institutions, government and industry labs, publishers, and information service providers. Projects and organizations using NLP/ML techniques in their text mining and enrichment efforts are also welcome to participate.
***Call for Papers***
We invite papers of the following categories:
***Long papers*** must describe substantial, original, completed, and unpublished work. Wherever appropriate, concrete evaluation and analysis should be included. Papers must not exceed eight (8) pages of content, plus unlimited pages of references. The final versions of long papers will be given one additional page of content (up to 9 pages) so that reviewers' comments can be taken into account.
***Short papers*** must describe original and unpublished work. Please note that a short paper is not a shortened long paper. Instead, short papers should have a point that can be made in a few pages, such as a small, focused contribution, a negative result, or an interesting application nugget. Short papers must not exceed four (4) pages, plus unlimited pages of references. The final versions of short papers will be given one additional page of content (up to 5 pages) so that reviewers' comments can be taken into account.
***Position papers*** will give voice to authors who wish to take a position on a topic listed above or the field of scholarly information extraction. Submissions need not present original work and should be two to four pages in length, including title, text, figures and tables, and references.
***Demo papers*** should be no more than four (4) pages in length, including references, and should describe implemented systems that are of relevance to the theme of the workshop. Authors of demo papers should be willing to present a demo of their system during WIESP at AACL-IJCNLP 2022.
***Extended Abstracts*** We welcome submissions of extended abstracts (2 pages max) related to the research topics mentioned above. Submissions may include previously published results, late-breaking results, or a description of ongoing projects in the broad field of information extraction and mining from scientific publications. Extended abstracts can also summarize existing work, work in progress, or a collection of works under a unified theme (e.g., a series of closely related papers that build on each other or tackle a common problem).
***Shared Task: Detecting Entities in the Astrophysics Literature (DEAL)***
A good amount of astrophysics research makes use of data coming from missions and facilities such as ground observatories in remote locations or space telescopes, as well as digital archives that hold large amounts of observed and simulated data. These missions and facilities are frequently named after historical figures or use some ingenious acronym which, unfortunately, can be easily confused when searching for them in the literature via simple string matching. For instance, Planck can refer to the person, the mission, the constant, or several institutions. Automatically recognizing entities such as missions or facilities would help tackle this word sense disambiguation problem.
The shared task consists of Named Entity Recognition (NER) on samples of text extracted from astrophysics publications. The labels were created by domain experts and designed to identify entities of interest to the astrophysics community. They range from simple to detect (e.g., URLs) to highly unstructured (e.g., Formula), and from useful to researchers (e.g., Telescope) to more useful to archivists and administrators (e.g., Grant). Overall, 31 different labels are included, and their distribution is highly unbalanced (e.g., ~100x more Citations than Proposals). Submissions will be scored using both the CoNLL-2000 shared task seqeval F1-Score at the entity level and scikit-learn's Matthews correlation coefficient method at the token level. We also encourage authors to propose their own evaluation metrics. A sample dataset and more instructions can be found at:
https://ui.adsabs.harvard.edu/WIESP/2022/SharedTasks
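To illustrate the two scoring levels described above, here is a minimal pure-Python sketch. It is not the official scorer: the task relies on seqeval and scikit-learn, whereas this example re-implements an exact-match, micro-averaged entity F1 and the standard multiclass Matthews correlation coefficient so that it runs without dependencies. The IOB2 tag format, the tag strings (`B-Telescope`, `B-Grant`), and the function names are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def extract_entities(tags):
    """Return the set of (label, start, end) spans in one IOB2-tagged sequence.

    Assumption: tags look like "B-Label", "I-Label", or "O". A stray "I-" tag
    with no matching open span is leniently treated as the start of a span.
    """
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        prefix, _, name = tag.partition("-")
        continues = prefix == "I" and name == label
        if label is not None and not continues:
            spans.add((label, start, i))
            start, label = None, None
        if prefix == "B" or (prefix == "I" and label is None):
            start, label = i, name
    return spans


def entity_f1(true_seqs, pred_seqs):
    """Micro-averaged F1 over entity spans; a span counts only on exact match."""
    n_true = n_pred = n_correct = 0
    for true_tags, pred_tags in zip(true_seqs, pred_seqs):
        true_spans = extract_entities(true_tags)
        pred_spans = extract_entities(pred_tags)
        n_true += len(true_spans)
        n_pred += len(pred_spans)
        n_correct += len(true_spans & pred_spans)
    if n_correct == 0:
        return 0.0
    precision, recall = n_correct / n_pred, n_correct / n_true
    return 2 * precision * recall / (precision + recall)


def token_mcc(true_seqs, pred_seqs):
    """Multiclass Matthews correlation coefficient computed over all tokens."""
    true_flat = [t for seq in true_seqs for t in seq]
    pred_flat = [p for seq in pred_seqs for p in seq]
    n = len(true_flat)
    correct = sum(t == p for t, p in zip(true_flat, pred_flat))
    true_counts, pred_counts = Counter(true_flat), Counter(pred_flat)
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in true_counts)
    cov_tt = n * n - sum(v * v for v in true_counts.values())
    cov_pp = n * n - sum(v * v for v in pred_counts.values())
    denom = sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0


# Hypothetical example: the predicted Telescope span is one token too short.
y_true = [["B-Grant", "O", "B-Telescope", "I-Telescope"]]
y_pred = [["B-Grant", "O", "B-Telescope", "O"]]
print("entity-level F1:", entity_f1(y_true, y_pred))
print("token-level MCC:", token_mcc(y_true, y_pred))
```

In this toy case three of the four tokens are tagged correctly, yet only one of the two entities matches exactly, which shows why the two metrics can diverge on the same prediction.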
Participants (individuals or groups) will have the opportunity to present their findings during the workshop and write a short paper. The authors of the best-performing or most interesting approaches might be invited to collaborate further with the NASA Astrophysics Data System (https://ui.adsabs.harvard.edu/).
***Important Dates***
- Paper/Abstract Submission Deadline: September 2, 2022
- Notification of workshop paper/abstract acceptance: September 25, 2022
- Camera-ready Submission Deadline: October 10, 2022
- Workshop: November 20, 2022 (online)
***All submission deadlines are 11:59 pm UTC-12 ("Anywhere on Earth")***
***Submission Website and Format***
Submission Link: softconf.com/aacl2022/WIESP
Submissions will be made via Softconf. Submissions should follow the ACLPUB formatting guidelines (https://acl-org.github.io/ACLPUB/formatting.html) and template files (https://github.com/acl-org/acl-style-files/tree/master). Long and short papers will be subject to a double-blind peer-review process. Position papers, Demo papers, and Extended Abstracts need not be anonymized. Authors will present accepted papers at the workshop either as a talk or a poster. All accepted papers will be published in the workshop proceedings.
We follow the same policies as AACL-IJCNLP 2022 regarding preprints and double submissions. The anonymity period for WIESP 2022 is from July 15 to September 25.
***Organizers***
- Tirthankar Ghosal, Charles University, CZ
- Sergi Blanco-Cuaresma, Center for Astrophysics | Harvard & Smithsonian, USA
- Alberto Accomazzi, Center for Astrophysics | Harvard & Smithsonian, USA
- Robert M. Patton, Oak Ridge National Laboratory, USA
- Felix Grezes, Center for Astrophysics | Harvard & Smithsonian, USA
- Thomas Allen, Center for Astrophysics | Harvard & Smithsonian, USA