DANTE resurrected: public release of the DANTE English lexical database - Corpora

9 Oct 2024


We proudly announce the public availability of the DANTE lexical database,
originally developed by Sue Atkins, Adam Kilgarriff and Michael Rundell in
2010 for Foras na Gaeilge, which recently decided to release the dataset
under the CC-BY licence.
"DANTE – the Database of ANalysed Texts of English – is a lexical database
which provides a corpus-based description of the core vocabulary of
English. It records the semantic, grammatical, combinatorial, and text-type
characteristics of over 42,000 single-word lemmas and 23,000 compounds and
phrasal verbs, and it also includes over 27,000 idioms and phrases."
(Rundell & Atkins, 2010)
The dataset is available through a Lexonomy instance running at
https://dantedictionary.com/ (which also provides an API) and as raw data at
https://github.com/lexicalcomputing/dante.
Regards,
Miloš Jakubíček
Lexical Computing
References:
* https://dantedictionary.com/
* https://github.com/lexicalcomputing/dante
* Convery, C., Ó Mianáin, P., Ó Raghallaigh, M., Atkins, S., Kilgarriff,
A., & Rundell, M. (2010). The DANTE Database (Database of ANalysed Texts of
English) [Conference paper]. Proceedings of the XIV EURALEX International
Conference.