[Apologies for multiple postings]
We are happy to announce that 66 new monolingual lexicons and 1 speech resource are now available in our catalogue. Moreover, 4 speech resources are now available at reduced fees.
*1) New Language Resources:*
*Bitext Lexical Datasets* http://catalog.elra.info/en-us/repository/search/?q=Bitext+Lexical+Dataset
The series of *Bitext Lexical Datasets* for the generic vocabulary includes Lemmas, POS tagging, Frequency, Named Entities and Offensive features. Depending on the dataset and language, other syntactic and morphological features are also provided. The datasets are available for 15 languages.
As a complement to these datasets, 11 datasets of *Language Variants* can also be obtained. Both are listed below, with each Language Variants dataset shown next to its language:
1. Arabic (MSA) http://catalog.elra.info/en-us/repository/browse/ELRA-L0136/dataset and Arabic Language Variants http://catalog.elra.info/en-us/repository/browse/ELRA-L0151/dataset consisting of Arabic Gulf, Arabic Najdi, Arabic Egypt and Arabic MSA variants,
2. Chinese (Simplified) http://catalog.elra.info/en-us/repository/browse/ELRA-L0137/dataset, Chinese (Traditional) http://catalog.elra.info/en-us/repository/browse/ELRA-L0138/dataset, and Chinese Language Variants http://catalog.elra.info/en-us/repository/browse/ELRA-L0152/dataset (Simplified + Traditional),
3. Dutch http://catalog.elra.info/en-us/repository/browse/ELRA-L0139/dataset and Dutch Language Variants http://catalog.elra.info/en-us/repository/browse/ELRA-L0153/dataset consisting of Netherlands and Belgium variants,
4. English http://catalog.elra.info/en-us/repository/browse/ELRA-L0140/dataset and English Language Variants http://catalog.elra.info/en-us/repository/browse/ELRA-L0154/dataset consisting of United States, United Kingdom and India variants,
5. Finnish http://catalog.elra.info/en-us/repository/browse/ELRA-L0141/dataset and Finnish Language Variants http://catalog.elra.info/en-us/repository/browse/ELRA-L0155/dataset consisting of Standard and Colloquial Finnish variants,
6. French http://catalog.elra.info/en-us/repository/browse/ELRA-L0142/dataset and French Language Variants http://catalog.elra.info/en-us/repository/browse/ELRA-L0156/dataset consisting of France, Canada and Switzerland variants,
7. German http://catalog.elra.info/en-us/repository/browse/ELRA-L0143/dataset and German Language Variants http://catalog.elra.info/en-us/repository/browse/ELRA-L0157/dataset consisting of Germany and Switzerland variants,
8. Indonesian http://catalog.elra.info/en-us/repository/browse/ELRA-L0144/dataset,
9. Italian http://catalog.elra.info/en-us/repository/browse/ELRA-L0145/dataset and Italian Language Variants http://catalog.elra.info/en-us/repository/browse/ELRA-L0158/dataset consisting of Italy and Switzerland variants,
10. Malay http://catalog.elra.info/en-us/repository/browse/ELRA-L0146/dataset,
11. Norwegian (Bokmal) http://catalog.elra.info/en-us/repository/browse/ELRA-L0147/dataset and Norwegian Language Variants http://catalog.elra.info/en-us/repository/browse/ELRA-L0159/dataset consisting of Bokmal and Nynorsk variants,
12. Portuguese http://catalog.elra.info/en-us/repository/browse/ELRA-L0148/dataset and Portuguese Language Variants http://catalog.elra.info/en-us/repository/browse/ELRA-L0160/dataset consisting of Portugal and Brazil variants,
13. Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0149/dataset and Spanish Language Variants http://catalog.elra.info/en-us/repository/browse/ELRA-L0161/dataset consisting of Spain, North America, Central America, Andes and Southern Cone variants,
*Bitext Synthetic Data* http://catalog.elra.info/en-us/repository/search/?q=Bitext+Synthetic+Data
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals in English and Spanish. They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for each utterance. The data are distributed as models or open text files.
For each language, the following verticals are available:
1. Automotive: 52 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0162/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0182/)
2. Retail banking: 26 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0163/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0183/)
3. Education: 37 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0164/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0184/)
4. Event and ticketing: 25 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0165/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0185/)
5. Field Service: 27 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0166/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0186/)
6. Healthcare: 40 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0167/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0187/)
7. Hospitality: 24 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0168/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0188/)
8. Insurance: 38 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0169/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0189/)
9. Legal: 29 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0170/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0190/)
10. Manufacturing: 34 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0171/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0191/)
11. Media Streaming: 24 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0172/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0192/)
12. Mortgage and loans: 39 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0173/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0193/)
13. Moving and storage: 29 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0174/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0194/)
14. Real estate and construction: 28 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0175/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0195/)
15. Restaurant/bar chains: 30 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0176/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0196/)
16. Retail Ecomm: 34 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0177/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0197/)
17. Telecommunication: 26 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0178/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0198/)
18. Travel: 33 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0179/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0199/)
19. Utilities: 21 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0180/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0200/)
20. Wealth management: 24 intents (English http://catalog.elra.info/en-us/repository/browse/ELRA-L0181/, Spanish http://catalog.elra.info/en-us/repository/browse/ELRA-L0201/)
*Persian Kids’ Speech Corpus* http://catalog.elra.info/en-us/repository/browse/ELRA-S0487/
The Persian Kids’ Speech Corpus consists of speech recorded from 286 children (141 girls, 145 boys) aged 6 to 9, using an Andreas Mic Anti-Noise microphone and a Premium Speechmike headphone. The recordings were manually checked and labeled, yielding a corpus of 162,395 samples with a total duration of 33 hours and 44 minutes. The samples are distributed as follows:
1. 29,057 Words (478 minutes), 2. 17,429 SubWords (260 minutes), 3. 43,838 Syllables (485 minutes), 4. 70,078 Phonemes (765 minutes), 5. 1,993 Extra Vocabulary (36 minutes).
The corpus covers all 29 Persian phonemes, as well as 118 syllables, 56 sub-words and 711 words, and is particularly suited to speech recognition and linguistics studies.
*2) Reduced fees for the following speech resources:*
* *Chinese Mandarin (South) database* http://catalog.elra.info/en-us/repository/browse/ELRA-S0397/
* *Chinese Mandarin (North) database* http://catalog.elra.info/en-us/repository/browse/ELRA-S0398/
* *Japanese Kids Speech database (Lower Grade)* http://catalog.elra.info/en-us/repository/browse/ELRA-S0411/
* *Japanese Kids Speech database (Upper Grade)* http://catalog.elra.info/en-us/repository/browse/ELRA-S0412/
For more information on the catalogue or if you would like to enquire about having your resources distributed by ELRA, please *contact us* mailto:contact@elda.org.
_________________________________________
Visit the *ELRA Catalogue of Language Resources* http://catalog.elra.info
Visit the *Universal Catalogue* http://universal.elra.info
*Archives* http://www.elra.info/en/catalogues/language-resources-announcements of ELRA Language Resources Catalogue Updates