We are proud to announce the public availability of the DANTE lexical database, originally developed by Sue Atkins, Adam Kilgarriff and Michael Rundell in 2010 for Foras na Gaeilge, which has recently decided to release the dataset under the CC-BY licence.
"DANTE – the Database of ANalysed Texts of English – is a lexical database which provides a corpus-based description of the core vocabulary of English. It records the semantic, grammatical, combinatorial, and text-type characteristics of over 42,000 single-word lemmas and 23,000 compounds and phrasal verbs, and it also includes over 27,000 idioms and phrases." (Rundell & Atkins, 2010)
The dataset is available through a Lexonomy instance running at https://dantedictionary.com/ (including an API) and as raw data at https://github.com/lexicalcomputing/dante.
Regards,
Miloš Jakubíček
Lexical Computing
References:
* https://dantedictionary.com/
* https://github.com/lexicalcomputing/dante
* Convery, C., Mianáin, P. O., Raghallaigh, M. O., Atkins, S., Kilgarriff, A., & Rundell, M. (2010). The DANTE Database (Database of ANalysed Texts of English) [Conference paper]. Proceedings of the XIV EURALEX International Conference.