Second Call for Interest: DISRPT 2025 Shared Task on Discourse Relation Parsing and Treebanking - Corpora

19 May 2025

🚀 Second Call for Interest: DISRPT 2025 Shared Task on Discourse Relation Parsing and Treebanking
🛎️ Sample data has been released!
In conjunction with CODI-CRAC & EMNLP 2025 - Suzhou, China, Nov. 5-9.
This year, we are organizing the fourth edition of the DISRPT shared task on discourse processing across formalisms, for a variety of languages and genres, with three subtasks:

* Task 1: Discourse segmentation
* Task 2: Connective identification
* Task 3: Relation classification

We will provide training, development and test datasets for (almost) all languages available in RST / eRST, SDRT, PDTB, ISO 24617, and discourse dependencies, using a uniform format. Because different corpora, languages, and frameworks follow different guidelines, the shared task will promote the design of flexible methods that can handle this variation, and will help push forward the discussion of converging standards for discourse units. For datasets that have treebanks, we will evaluate segmentation in two scenarios: with and without gold syntax. An automatically parsed version is provided for all corpora without a gold parse.
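The paragraph above does not state how segmentation will be scored. One common choice is precision, recall and F1 over the token positions where a discourse unit begins; the sketch below illustrates that under this assumption. The function name `boundary_prf` and the toy positions are hypothetical and are not taken from the official DISRPT tools.

```python
def boundary_prf(gold: set[int], pred: set[int]) -> tuple[float, float, float]:
    """Score predicted segment-start positions against gold ones.

    Assumption: a prediction counts as correct only if the same token
    position is marked as a segment start in the gold data.
    """
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: token indices that open a new discourse unit.
gold_starts = {0, 4, 9}
pred_starts = {0, 4, 7, 9}  # one spurious boundary at token 7

precision, recall, f1 = boundary_prf(gold_starts, pred_starts)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

The same function can be run twice, once on predictions made with gold syntax and once on predictions made with automatic parses, to compare the two scenarios.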

This year, the shared task will feature: 
* The inclusion of more frameworks, with datasets from: RST / eRST, SDRT, PDTB, ISO 24617, and discourse dependencies
* The inclusion of new corpora and new languages, some of them kept as a surprise!
* A unified set of labels for the discourse relations, to make evaluation across datasets easier
* A new constraint: only one multilingual model should be submitted per task, and it should be small! This will make our replication work easier, but more importantly, it will make such a model simpler to use and will test the robustness of your solution.
Today, we’re excited to announce the release of the sample data for the DISRPT 2025 Shared Task! You can now access the data, format documentation, and tools on our GitHub 🔗 https://github.com/disrpt/sharedtask2025
The sample covers five discourse frameworks (RST / eRST, PDTB, SDRT, and Discourse Dependencies) across 12 languages: English, Basque, French, Dutch, Italian, Portuguese, Spanish, Farsi, Chinese, Russian, Turkish, and Thai.
We invite researchers and teams interested in participating to register now. Registered participants will be added to our mailing list and receive all future updates.
📅 The full training data will be released on June 16, 2025 — stay tuned!
To join the mailing list and stay informed, please email us at:
📧 disrpt_chairs@googlegroups.com 
Let us know you're interested — we’d love to have you on board!
**Important dates**

* May 16 2025 – Sample data release [NOW]
* June 16 2025 – Training data release
* July 14 2025 – Test data release
* August 1 2025 – System + paper submissions due
* September 12 2025 – Notification of acceptance
* September 19 2025 – Camera ready papers
* November 8-9 2025 – CODI at EMNLP
All deadlines are 11:59 pm UTC-12 (AoE, "Anywhere on Earth").
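For participants planning around these deadlines, here is a minimal sketch of converting an AoE deadline to UTC. It assumes only that AoE is the fixed offset UTC-12; the helper name `aoe_deadline_to_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" is a fixed offset of UTC-12 with no daylight saving.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_to_utc(year: int, month: int, day: int) -> datetime:
    """Return the UTC instant corresponding to 11:59 pm AoE on the given date."""
    local_deadline = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local_deadline.astimezone(timezone.utc)


# System + paper submission deadline: August 1, 2025.
submission_utc = aoe_deadline_to_utc(2025, 8, 1)
print(submission_utc.isoformat())  # 2025-08-02T11:59:00+00:00
```

In other words, each deadline falls at 11:59 UTC on the following calendar day.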

**Information:**

Contact the organizers: disrpt_chairs@googlegroups.com 
Official website: https://sites.google.com/view/disrpt2025/
Google group for participants (please join us): disrpt2025_participants@googlegroups.com

**Organization:**

Peter Bourgonje (Universität Potsdam, Germany)
Chloé Braud (CNRS - IRIT, University of Toulouse, France)
Chuyuan Li (University of British Columbia, Canada)
Janet Yang Liu (LMU Munich, Germany)
Philippe Muller (CNRS - University of Toulouse, France)
Amir Zeldes (Georgetown University, Washington DC, USA)