We are proud to announce the release of a new version of BabelNet and its APIs: the Java API and a brand-new Python API. The new version was developed jointly by the Sapienza NLP Group of the Sapienza University of Rome, under the supervision of Prof. Roberto Navigli, and Babelscape, a successful deep-tech company providing innovative solutions for multilingual NLP.

BabelNet -- winner of the 2017 Prominent Paper Award from the Artificial Intelligence Journal and of the META Prize 2015, and covered in media such as The Guardian and Time magazine -- is today’s most far-reaching multilingual resource. Depending on need, it can be used as an encyclopedic dictionary, a semantic network, or a huge knowledge base/ontology. It has been used by more than 1000 universities and research institutions, enabling multilinguality in several fields of AI and NLP, such as semantic search, Word Sense Disambiguation, Semantic Role Labeling and image tagging.

BabelNet was created by seamlessly integrating and interlinking the largest multilingual Web encyclopedia, i.e., Wikipedia, with the most popular computational lexicon of English, i.e., WordNet, and with other lexical resources such as Wiktionary, OmegaWiki, Wikidata, dozens of wordnets, Wikiquote, GeoNames, and ImageNet. BabelNet provides multilingual synsets, i.e., concepts and named entities lexicalized in many languages and connected by a large number of semantic relations.
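
To make the notion of a multilingual synset concrete, here is a minimal Python sketch. It is only an illustration of the idea and does not use the BabelNet Python API: the class name, its fields, the identifiers, and the relation label are all hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToySynset:
    """Illustrative stand-in for a multilingual synset (not the BabelNet API)."""

    # Hypothetical identifier; real BabelNet IDs follow their own scheme.
    synset_id: str
    # Language code -> lemmas that lexicalize the concept in that language.
    lexicalizations: Dict[str, List[str]] = field(default_factory=dict)
    # (relation name, target synset id) pairs linking this synset to others.
    relations: List[Tuple[str, str]] = field(default_factory=list)

    def lemmas(self, language: str) -> List[str]:
        """Return the lemmas for one language, or an empty list if none."""
        return self.lexicalizations.get(language, [])


# One concept, lexicalized in three languages and linked to a broader concept.
dog = ToySynset(
    synset_id="example:dog",
    lexicalizations={"EN": ["dog"], "IT": ["cane"], "FR": ["chien"]},
    relations=[("hypernym", "example:canine")],
)

print(dog.lemmas("IT"))
```

The point of the sketch is that a single synset groups the words for one concept across languages, while the relation pairs connect it to other synsets, which is what lets the resource serve as both a dictionary and a semantic network.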

Version 5.1 comes with the following features:

  • 500 languages and 22 million synsets covered;
  • 53 resources linked and integrated;
  • Wikipedia and Wikidata updated thanks to BabelNet live;
  • Open English WordNet updated to the 2021 version;
  • Q-code identifiers added (e.g. https://www.hetop.eu/hetop/3CGP/?la=en&rr=CGP_QC_QD8);
  • String tags added from Wikipedia labels;
  • French wordnets cleaned up by removing most potentially incorrect translations;
  • Italian wordnet definitions cleaned up;
  • General data cleanup (glosses, senses, Named Entity vs. concept labels);
  • Lemma casing corrected in 24 languages (English, Italian, Spanish, German, French, Dutch, Polish, Portuguese, Russian, Bulgarian, Czech, Danish, Greek, Estonian, Finnish, Croatian, Hungarian, Lithuanian, Latvian, Maltese, Romanian, Slovak, Slovenian, Swedish).

More statistics are available at: babelnet.org/statistics.

Kind regards,
The BabelNet team

--
==============================================
Roberto Navigli - Professor
Department of Computer, Control and Management Engineering
Sapienza University of Rome
Via Ariosto, 25
00185 Roma Italy
Phone: +39 06 77274109
Sapienza NLP Group: http://nlp.uniroma1.it
Co-founder of Babelscape
==============================================