[Apologies for multiple postings]
We are happy to announce that 1 new written corpus, 1 new monolingual lexicon and 2 new speech resources are now available in our catalogue.
Corpus for fine-grained analysis and automatic detection of irony on Twitter https://catalogue.elra.info/en-us/repository/browse/ELRA-W0337/ ISLRN: 478-366-550-085-8 http://www.islrn.org/resources/478-366-550-085-8
This corpus was annotated by trained annotators (Master’s students in Linguistics) using a detailed annotation scheme for irony categorization, which defines four labels: ‘ironic by means of a polarity contrast’, ‘situational irony’, ‘other verbal irony’ and ‘not ironic’. It consists of 4,791 instances, each with an irony label and a tweet ID.
Bitext Synonym Data - General Language https://catalogue.elra.info/en-us/repository/browse/ELRA-L0202/ ISLRN: 470-885-612-363-1 http://www.islrn.org/resources/470-885-612-363-1
The Bitext Synonym Data - General Language includes 31,723 entries and more than 100,000 synonyms for the English language. The dataset was developed to augment the English version of WordNet, a powerful open-source lexical database released in 2005. All synonyms can be linked to Bitext Lexical Data - English (see ELRA-L0140) for lemmatization, POS and morphological information.
Corpus of Spontaneous Japanese (CSJ) https://catalog.elra.info/en-us/repository/browse/ELRA-S0488/ ISLRN: 280-594-494-328-0 https://islrn.org/resources/280-594-494-328-0/
The "Corpus of Spontaneous Japanese" (or CSJ) contains about 650 hours of spontaneous speech, corresponding to about 7 million words. All speech materials were recorded with head-worn close-talking microphones onto DAT and down-sampled to 16 kHz with 16-bit resolution. The speech material is transcribed at both the orthographic and phonetic levels. In addition, segment labels, intonation labels and other miscellaneous annotations are provided for a subset of the CSJ, called the Core, which contains about 500k words or 45 hours of speech.
EWA-DB – Early Warning of Alzheimer speech database https://catalogue.elra.info/en-us/repository/browse/ELRA-S0489/ ISLRN: 730-022-142-264-9 http://www.islrn.org/resources/730-022-142-264-9
EWA-DB is a speech database that contains data from three clinical groups (Alzheimer's disease, Parkinson's disease and mild cognitive impairment) and a control group of healthy subjects. Speech samples from each group were obtained using the EWA smartphone application, which includes four different language tasks: sustained vowel phonation, diadochokinesis, object and action naming (30 objects and 30 actions), and picture description (two single pictures and three complex pictures). The database contains 1,649 speakers in total: 87 people with Alzheimer's disease, 175 people with Parkinson's disease, 62 people with mild cognitive impairment, 2 people with a mixed diagnosis of Alzheimer's and Parkinson's disease, and 1,323 healthy controls.
For more information on the catalogue, or if you would like to enquire about having your resources distributed by ELRA, please contact us at contact@elda.org.
_________________________________________
Visit the ELRA Catalogue of Language Resources http://catalog.elra.info Visit the Universal Catalogue http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates http://www.elra.info/en/catalogues/language-resources-announcements