ELRA Catalogue of Language Resources - Update - Corpora

13 Nov 2023


      [Apologies for multiple postings]
We are happy to announce that 1 new written corpus, 1 new monolingual 
lexicon and 2 new speech resources are now available in our catalogue.
Corpus for fine-grained analysis and automatic detection of irony on 
Twitter https://catalogue.elra.info/en-us/repository/browse/ELRA-W0337/
ISLRN: 478-366-550-085-8 http://www.islrn.org/resources/478-366-550-085-8
This corpus was annotated by trained annotators (Master’s students in 
Linguistics) using a detailed annotation scheme for irony 
categorization, which defines four labels: ‘ironic by means of a 
polarity contrast’, ‘situational irony’, ‘other verbal irony’ and ‘not 
ironic’. It consists of 4,791 instances, each with an irony label and a tweet ID.
Bitext Synonym Data - General Language 
https://catalogue.elra.info/en-us/repository/browse/ELRA-L0202/
ISLRN: 470-885-612-363-1 http://www.islrn.org/resources/470-885-612-363-1
The Bitext Synonym Data - General Language includes 31,723 entries and 
more than 100,000 synonyms for the English language. This set of 
synonyms was developed to augment the English version of WordNet, a 
powerful open-source lexical database, released in 2005. All synonyms 
can be linked to Bitext Lexical Data - English (see ELRA-L0140) for 
lemmatization, POS and morphological information.
Corpus of Spontaneous Japanese (CSJ) 
https://catalog.elra.info/en-us/repository/browse/ELRA-S0488/
ISLRN: 280-594-494-328-0 https://islrn.org/resources/280-594-494-328-0/
The "Corpus of Spontaneous Japanese" (or CSJ) contains about 650 hours 
of spontaneous speech, corresponding to about 7 million words. All 
speech materials were recorded using head-worn close-talking microphones 
and DAT, and down-sampled to 16 kHz with 16-bit resolution. The speech 
material is transcribed at both the orthographic and phonetic levels. In 
addition, segment labels, intonation labels and other miscellaneous 
annotations are provided for a subset of the CSJ, called the Core, which 
contains about 500,000 words or 45 hours of speech.
EWA-DB – Early Warning of Alzheimer speech database 
https://catalogue.elra.info/en-us/repository/browse/ELRA-S0489/
ISLRN: 730-022-142-264-9 http://www.islrn.org/resources/730-022-142-264-9
EWA-DB is a speech database that contains data from three clinical 
groups (Alzheimer's disease, Parkinson's disease and mild cognitive 
impairment) and a control group of healthy subjects. Speech samples 
from each group were obtained using the EWA smartphone application, 
which includes four language tasks: sustained vowel phonation, 
diadochokinesis, object and action naming (30 objects and 30 actions), 
and picture description (two single pictures and three complex 
pictures). The database contains 1,649 speakers in total: 87 people 
with Alzheimer's disease, 175 people with Parkinson's disease, 62 
people with mild cognitive impairment, 2 people with a mixed diagnosis 
of Alzheimer's and Parkinson's disease, and 1,323 healthy controls.
For more information on the catalogue, or if you would like to enquire 
about having your resources distributed by ELRA, please contact us at 
contact@elda.org.
_________________________________________
Visit the ELRA Catalogue of Language Resources http://catalog.elra.info
Visit the Universal Catalogue http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates 
http://www.elra.info/en/catalogues/language-resources-announcements
--