Third call for participation: DISRPT 2025 Shared Task on Discourse Relation Parsing and Treebanking - Corpora

4 Jul 2025

      🚀 Call for Participation: DISRPT 2025 Shared Task on Discourse Relation Parsing and Treebanking. 
🛎️ Training data has been released and submissions are now open!
https://softconf.com/emnlp2025/disrpt2025/
In conjunction with CODI-CRAC & EMNLP 2025 - Suzhou, China, Nov. 5-9.
This year, we are organizing the fourth edition of the DISRPT shared task on discourse processing across formalisms, for a variety of languages and genres, with three subtasks:

* Task 1: Discourse segmentation
* Task 2: Connective identification
* Task 3: Relation classification

We will provide training, development and test datasets from (almost) all available languages in RST / eRST, SDRT, PDTB, ISO 24617, and discourse dependencies, using a uniform format. Because different corpora, languages, and frameworks use different guidelines, the shared task will promote the design of flexible methods for dealing with various guidelines and will help advance the discussion of converging standards for discourse units. We will evaluate segmentation and connective detection in two scenarios: with and without gold syntax. An automatically parsed version is provided for all corpora without a gold parse.

This year, the shared task will feature: 
* The inclusion of more frameworks, with datasets from: RST / eRST, SDRT, PDTB, ISO 24617, and discourse dependencies
* The inclusion of new corpora and new languages, some of them kept a surprise!
* A unified set of labels for the discourse relations, to make evaluation across datasets easier
* A new constraint: only one multilingual model should be submitted per task, and it should be small (4B parameters max)! This will make our replication work easier, but more importantly, it will simplify using such a model and test the robustness of your solution.

We’re excited to announce the release of the training data for the DISRPT 2025 Shared Task! You can now access the data, format documentation, and tools on our GitHub 🔗 https://github.com/disrpt/sharedtask2025
The data covers five discourse frameworks (RST / eRST, SDRT, PDTB, ISO 24617, and discourse dependencies) across 14 languages: Basque, Chinese, Czech, Dutch, English, Farsi, French, German, Italian, Portuguese, Russian, Spanish, Thai, and Turkish.
We invite researchers and teams interested in participating to register now. Registered participants will be added to our mailing list and receive all future updates.
📅 The full test data will be released on July 14, 2025. Stay tuned!
To join the mailing list and stay informed, please email us at:
📧 disrpt_chairs@googlegroups.com 
Let us know you're interested — we’d love to have you on board!
**Important dates**

* May 16 2025 – Sample data release
* June 17 2025 – Training data release [NOW]
* July 14 2025 – Test data release
* August 1 2025 – System + paper submissions due
* September 12 2025 – Notification of acceptance
* September 19 2025 – Camera ready papers
* November 8-9 2025 – CODI at EMNLP

All deadlines are 11:59 pm UTC-12 (AoE, "Anywhere on Earth").

**Information:**

Contact the organizers: disrpt_chairs@googlegroups.com 
Official website: https://sites.google.com/view/disrpt2025/
Google group for participants: disrpt2025_participants@googlegroups.com

**Organization:**

Chloé Braud (CNRS - IRIT, University of Toulouse, France)
Chuyuan Li (University of British Columbia, Canada)
Janet Yang Liu (LMU Munich, Germany)
Philippe Muller (CNRS - University of Toulouse, France)
Amir Zeldes (Georgetown University, Washington DC, USA)