Dear colleagues,
We are writing to invite you to collaborate on a community-driven initiative to develop annotation schemas for descriptions of scientific processes in research articles. The effort is inspired by the spirit of schema.org (https://schema.org/), but it focuses specifically on capturing experimental and simulation workflows across scientific domains. The resulting schemas will be openly published as templates in the Open Research Knowledge Graph (ORKG, https://orkg.org/) and will form the basis of a paper planned for Nature Scientific Data (https://www.nature.com/sdata/).
Motivation
Scientific papers describe complex processes (e.g., ALD and CVD in materials science, PCR and CRISPR in molecular biology, tensile and fatigue testing in engineering, leaching experiments in environmental science, RCTs and cognitive tasks in psychology) using highly variable narrative text. This variability makes it difficult to:
* design consistent, interoperable annotation guidelines,
* build cross-domain corpora of scientific methods,
* compare and align experimental setups across papers, and
* create FAIR, reusable metadata about how studies are actually carried out.
Our goal is to define annotation schemas for these processes (inputs, conditions, outputs, roles, and relations) and to populate them from full-text articles. The schemas and the resulting corpora are intended as shared resources for corpus linguistics, NLP, scientific text mining, and downstream applications.
Why Collaborate
We are seeking contributors who can:
* provide collections of full-text articles (~50+) describing a specific experimental or simulation process in their field,
* offer expert feedback on automatically mined process schemas, or
* run the schema-miner workflow themselves (with our support) and help refine the resulting schema.
Individual or small-team participation is welcome, and co-authorship opportunities are available depending on involvement.
A wide variety of processes can be included: thin-film deposition, synthetic chemistry reactions, gene editing workflows, fatigue testing, soil leaching experiments, drug dissolution assays, fMRI tasks, cognitive experiments, and many more. A broader (non-exhaustive) list is available here: https://docs.google.com/document/d/1iyL1l9vCXhnQ0To7j79vlr-pW4JvPlQC95svygqRDfg/edit
How to Participate
Please register your interest using this short form: https://forms.gle/9WEdouw4yMyNHcn19
We will notify selected contributors by January 31, 2026. Data collection and schema mining will conclude by April 30, 2026, followed by manuscript preparation.
We hope members of this community will consider contributing to this effort to develop shared annotation schemas and corpora of scientific process descriptions, a step toward more comparable, analyzable, and reusable scientific text resources. Please also help us spread the word!
Best regards,
Jennifer D'Souza
TIB - Leibniz Information Centre for Science and Technology
(on behalf of the schema-miner coordination team)