5th Workshop on Scholarly Document Processing (SDP 2025) @ ACL 2025
Call for Papers
Dear colleagues – you are invited to participate in the 5th Workshop on Scholarly Document Processing (SDP 2025) to be held at ACL 2025 in Vienna, Austria. SDP 2025 will consist of a research track and five shared tasks. The call for research papers is described below, and more details can be found on our website, https://sdproc.org/2025/.
Papers must follow the ACL format and conform to the ACL 2025 Submission Guidelines. Papers must be submitted through OpenReview: https://openreview.net/group?id=aclweb.org/ACL/2025/Workshop/SDProc
Website: https://sdproc.org/2025/
Submission site: https://openreview.net/group?id=aclweb.org/ACL/2025/Workshop/SDProc
X (Twitter): https://twitter.com/sdpworkshop
Shared tasks: https://sdproc.org/2025/sharedtasks.html
Paper submission deadline: Saturday, March 1, 2025

Call for Research Papers

Scholarly literature is the chief means by which scientists and academics document and communicate their results and is therefore critical to the advancement of knowledge and improvement of human well-being. At the same time, this literature poses challenges to NLP uncommon in other genres, such as specialized language and high background knowledge requirements, long documents and strong structural conventions, multimodal presentation, citation relationships among documents, an emphasis on rational argumentation, and the frequent availability of detailed metadata and experimental data. These challenges necessitate the development of NLP methods and resources optimized for this domain. The Scholarly Document Processing (SDP) workshop provides a venue for discussing these challenges, bringing together stakeholders from different communities including computational linguistics, machine learning, text mining, information retrieval, digital libraries, scientometrics and others, to develop methods, tasks, and resources in support of these goals.
This workshop builds on the success of prior workshops: SDP workshops held at EMNLP 2020, NAACL 2021, COLING 2022, and ACL 2024, and the 1st and 2nd SciNLP workshops held at AKBC 2020 and 2021. In addition to having broad appeal within the NLP community, we hope the SDP workshop will attract researchers from other relevant fields including meta-science, scientometrics, data mining, information retrieval, and digital libraries, bringing together these disparate communities within ACL.
Topics of Interest

We invite submissions from all communities demonstrating usage of and challenges associated with natural language processing, information retrieval, and data mining of scholarly and scientific documents. Relevant topics include (but are not limited to):
Large Language Models (LLMs) for science
Representation learning and language modeling
Information extraction and NER
Document understanding
Summarization and generation
Question answering
Discourse modeling/argumentation mining
Network analysis
Bibliometrics, scientometrics, and altmetrics
Reproducibility and research integrity, including new challenges posed by generative AI
Peer review tools, principles and technology
Metadata and indexing
Inclusion of datasets and computational resources
Research infrastructures and digital libraries
Increasing the representation in scholarly work of disadvantaged populations
LLM-based interfaces to consume/produce scholarly documents
Impact of scholarly communication on popular discourse

Submission Information

Authors are invited to submit full and short papers with unpublished, original work. Submissions will be subject to a double-blind peer-review process. Accepted papers will be presented by the authors at the workshop either as a talk or a poster. All accepted papers will be published in the workshop proceedings, which will appear in the ACL Anthology (proceedings from previous years can be found here: https://aclanthology.org/venues/sdp/).
The submissions must be in PDF format and anonymized for review. All submissions must be written in English and follow the ACL 2025 formatting requirements:
Long paper submissions: up to 8 pages of content, plus unlimited references.
Short paper submissions: up to 4 pages of content, plus unlimited references.
Submission Website: Papers must be submitted through OpenReview: https://openreview.net/group?id=aclweb.org/ACL/2025/Workshop/SDProc
Final versions of accepted papers will be allowed 1 additional page of content so that reviewer comments can be taken into account.

Important Dates (Main Research Track)

First call for workshop papers: December 19, 2024
Second call for workshop papers: January 24, 2025
Third call for workshop papers: February 24, 2025
Paper submission deadline: March 1, 2025
Pre-reviewed (ARR) submission deadline: March 25, 2025
Notification of acceptance: April 17, 2025
Camera-ready paper due: May 16, 2025
Workshop dates: July 31 – August 1, 2025

Note: Shared task submission deadlines and other important dates to be announced.
SDP 2025 Keynote Speakers

We are excited to have several keynote speakers at SDP 2025.
Tom Hope (https://tomhoper.github.io/), Assistant Professor at Hebrew University of Jerusalem and Research Scientist at Allen Institute for AI.
James A. Evans (https://sociology.uchicago.edu/directory/James-A-Evans), Professor and Director of the Knowledge Lab at University of Chicago and External Professor at the Santa Fe Institute.
TBA

SDP 2025 Shared Tasks

SDP 2025 will host five exciting shared tasks. More information about all shared tasks is provided on the workshop website: https://sdproc.org/2025/sharedtasks.html
Detecting automatically generated scientific papers (DAGPap 25)

A major problem with the ubiquity of generative AI is that it is now very easy to generate fake scientific papers. This can erode public trust in science and attack the foundations of science: are we standing on the shoulders of robots? The Detecting Automatically Generated Papers (DAGPap) competition aims to encourage the development of robust, reliable AI-generated scientific text detection systems, utilizing a diverse dataset and varied machine learning models in a number of scientific domains.
Organizers: Savvas Chamezopoulos, Dan Li, Anita de Waard (Elsevier).
Contextualizing Scientific Figures and Tables (Context 25)

Interpreting scientific claims in the context of empirical findings is a valuable practice, yet extremely time-consuming for researchers. Such interpretation requires identifying key results (often captured in tables and figures) that provide supporting evidence from research papers, and contextualizing these results with associated methodological details (e.g., measures, sample, etc.). During the previous version of this shared task in 2024, we released datasets to support the development of methods for automatic identification of key result figures or tables as well as additional grounding context to make claim interpretation more efficient. However, the released datasets contained tables and images already extracted from the scientific papers to allow participants to bypass PDF pre-processing issues. In Context 2025, given recent advances in multimodal LLMs, we plan to increase the difficulty of this task by requiring participants to identify key results from paper PDFs directly, and to add a new sub-task on multi-hop reasoning over scientific evidence.
Organizers: Joel Chan, Matthew Akamatsu, Aakanksha Naik
Scientific Visual Question Answering (SciVQA)

Scholarly articles convey valuable information not only through unstructured text but also via (semi-)structured figures such as charts and diagrams. Automatically interpreting the semantics of knowledge encoded in these figures can be beneficial for downstream tasks such as question answering (QA). In the SciVQA challenge, the participants will develop multimodal systems capable of efficiently processing both visual (i.e., addressing attributes such as colour, shape, size, etc.) and non-visual QA pairs based on images of scientific figures and their captions.
Organizers: Ekaterina Borisova, Georg Rehm
Scientific Fact-checking of Social Media Posts on Climate Change (ClimateCheck)

The ClimateCheck shared task focuses on fact-checking claims from social media about climate change against peer-reviewed scholarly articles. Participants will retrieve relevant publications from a corpus of 400,000 climate research articles and classify each abstract as supporting, refuting, or not having enough information about the claim. Training data will include human-annotated claim-publication pairs, and the evaluation will combine nDCG@K and Bpref for retrieval and F1 score for classification. The task aims to develop models that link social media claims to scientific evidence, promoting informed and evidence-based discussions on climate change.
Organizers: Raia Abu Ahmad, Georg Rehm
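
As a rough illustration of one of the retrieval metrics named in the ClimateCheck description, the following Python sketch computes nDCG@K for a single claim. It assumes binary relevance (an abstract is either relevant to the claim or not). The function names and abstract ids are hypothetical and are not taken from the task description, and the official evaluation script may differ.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k entries of a ranked gain list."""
    # Rank 0 is the top result; its discount is log2(2) = 1.
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """nDCG@K with binary relevance (assumption: gain 1 if relevant, else 0)."""
    gains = [1.0 if doc_id in relevant_ids else 0.0 for doc_id in ranked_ids]
    # Ideal ranking: every relevant abstract placed at the top.
    ideal_gains = [1.0] * min(len(relevant_ids), k)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: abstracts retrieved for one claim, best first.
ranked = ["a3", "a7", "a1", "a9"]
relevant = {"a3", "a1"}
score = ndcg_at_k(ranked, relevant, k=3)  # roughly 0.92
```

In this example the system places one relevant abstract first and the other third, so the score falls just short of the ideal value of 1.0.
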
Software Mention Detection in Scholarly Publications (SOMD 2)

Software plays an essential role in computational research methods and is considered one of the crucial entities in scholarly documents. However, software is not always formally cited in academic documents, resulting in various informal mentions of software across a paper. Automatic identification of such software mentions contributes to better understanding, accessibility, and reproducibility of the research work. Beyond the mention itself, understanding the research context requires understanding the purpose of a software mention and its attributes, making software mention detection a comprehensive task. We are extending the first iteration of the shared task, SOMD 2024 (https://nfdi4ds.github.io/nslp2024/docs/somd_shared_task.html), with new challenges. In addition to information extraction techniques, our extended focus will be on joint named entity and relation classification techniques.
Organizers: Sharmila Upadhyaya, Frank Krueger, Stefan Dietze
Organizing Committee

Tirthankar Ghosal, Oak Ridge National Laboratory, USA
Philipp Mayr, GESIS – Leibniz Institute for the Social Sciences, Germany
Aakanksha Naik, Allen Institute for AI, USA
Amanpreet Singh, Allen Institute for AI, USA
Anita de Waard, Elsevier, Netherlands
Dayne Freitag, SRI International, USA
Georg Rehm, German Research Center for Artificial Intelligence (DFKI), Germany
Sonja Schimmler, Fraunhofer FOKUS, Germany
Dan Li, Elsevier, Netherlands