Launch of the Parsed Corpus of Southern Dutch Dialects (GCND) - Corpora

4 Feb 2025


We are excited to announce the release of the first parsed corpus of spoken Dutch dialects, the Gesproken Corpus van de zuidelijk-Nederlandse Dialecten (GCND). This resource offers extensive data for linguistic research and is now accessible online.
Corpus Highlights:
* Speakers: 1,206 individuals, with the eldest born in 1871.
* Geographical Coverage: 639 distinct locations.
* Audio Data: Over 430 hours of recordings across 650 sessions.
* Transcriptions: Over 600 time-aligned, highly detailed transcriptions.
* Total Tokens: Approximately 4.77 million.
* GrETEL Treebank: 50,111 verified sentences and 452,459 verified tokens.
These figures represent the corpus as of its initial release. Ongoing efforts, supported by additional funding (GCND+), aim to expand the corpus with more transcriptions, including northern dialects from the Meertens Institute collection, and to enhance grammatical annotations. The latest updates are available through the corpus application.
Access Information:
The GCND is available online:
* GCND corpus application (requires CLARIN login): https://gcnd.ivdnt.org/
* GCND project website: https://www.gcnd.ugent.be/
Acknowledgments:
This project was made possible through funding from the Research Foundation Flanders and the dedicated efforts of numerous student assistants, volunteers, and our project partners.
The GCND team (at Ghent University):
Anne Breitbarth (anne.breitbarth@ugent.be)
Anne-Sophie Ghyselen (annesophie.ghyselen@ugent.be)
Melissa Farasyn (melissa.farasyn@ugent.be)
Lien Hellebaut (lien.hellebaut@ugent.be)