We are proud to announce the release of a new version of BabelNet https://babelnet.org/ and its APIs, *both Java and the brand-new Python version*. BabelNet is developed jointly by the Sapienza NLP Group http://nlp.uniroma1.it at the *Sapienza University of Rome*, under the supervision of Prof. Roberto Navigli https://www.diag.uniroma1.it/navigli/, and by Babelscape http://babelscape.com/, *a successful deep-tech company* providing innovative solutions for multilingual NLP.
BabelNet -- winner of the *Prominent Paper Award 2017* from the Artificial Intelligence Journal and the META Prize 2015, and covered in media such as The Guardian https://www.theguardian.com/news/2018/feb/23/oxford-english-dictionary-can-worlds-biggest-dictionary-survive-internet and Time magazine http://wwwusers.di.uniroma1.it/~navigli/img/Redefining_the_modern_dictionary.png -- is today’s *most far-reaching multilingual resource*. Depending on need, it can be used as an *encyclopedic dictionary*, a *semantic network*, or a large *knowledge base/ontology*. It has been used by more than *1,000 universities and research institutions*, enabling multilinguality in several fields of AI and NLP, such as semantic search, Word Sense Disambiguation, Semantic Role Labeling and image tagging.
BabelNet was created by seamlessly integrating and interlinking the largest multilingual Web encyclopedia, i.e., Wikipedia, with the most popular computational lexicon of English, i.e., WordNet, and with other lexical resources such as Wiktionary, OmegaWiki, Wikidata, dozens of wordnets, Wikiquote, GeoNames, and ImageNet. BabelNet provides *multilingual synsets*, i.e., concepts and named entities lexicalized in many languages and connected by a large number of semantic relations.
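To make the notion of a multilingual synset concrete, here is a minimal Python sketch of the idea. It is a toy illustration only and does not use the BabelNet API: the class ToySynset, the helper related(), the "toy:" identifiers and the relation name are all hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToySynset:
    """Hypothetical stand-in for a multilingual synset (not a BabelNet class)."""
    synset_id: str
    # Language code -> lexicalizations of the same concept in that language.
    lemmas: Dict[str, List[str]] = field(default_factory=dict)
    # Outgoing semantic relations as (relation name, target synset id) pairs.
    relations: List[Tuple[str, str]] = field(default_factory=list)

    def lemmas_in(self, language: str) -> List[str]:
        """Return the lexicalizations for one language, or an empty list."""
        return self.lemmas.get(language, [])


def related(network: Dict[str, ToySynset], synset_id: str, relation: str) -> List[ToySynset]:
    """Follow one relation type from a synset to its targets in the network."""
    source = network[synset_id]
    return [network[target] for name, target in source.relations
            if name == relation and target in network]


# One concept, lexicalized in three languages, linked to a more general concept.
dog = ToySynset(
    "toy:dog",
    lemmas={"EN": ["dog"], "IT": ["cane"], "FR": ["chien"]},
    relations=[("hypernym", "toy:canine")],
)
canine = ToySynset("toy:canine", lemmas={"EN": ["canine", "canid"]})
network = {s.synset_id: s for s in (dog, canine)}

print(dog.lemmas_in("IT"))
print([s.synset_id for s in related(network, "toy:dog", "hypernym")])
```

In this sketch the concept, not the word, is the node of the network: the same synset carries its lexicalizations for every language, and relations connect synsets rather than language-specific words.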
*Version 5.1* comes with the following features:
- *500 languages* and *22 million synsets* covered;
- *53 resources* linked and integrated;
- *Wikipedia* and *Wikidata* updated thanks to *BabelNet live*;
- *Open English WordNet* has been updated to version 2021;
- Added *Q-codes* identifiers (e.g. https://www.hetop.eu/hetop/3CGP/?la=en&rr=CGP_QC_QD8);
- Added *string tags* from *Wikipedia labels*;
- *French wordnets cleaned up* by removing most potentially incorrect translations;
- *Italian wordnet definitions* cleaned up;
- *General data cleanup* (glosses, senses, Named Entity vs. concept labels);
- *Lemma casing corrected in 24 languages* (English, Italian, Spanish, German, French, Dutch, Polish, Portuguese, Russian, Bulgarian, Czech, Danish, Greek, Estonian, Finnish, Croatian, Hungarian, Lithuanian, Latvian, Maltese, Romanian, Slovak, Slovenian, Swedish).
More statistics are available at: babelnet.org/statistics.
Kind regards,
The BabelNet team