In this newsletter: LDC data and commercial technology development
New publications:
RATS Low Speech Density <https://catalog.ldc.upenn.edu/LDC2024S03>
BabyEars Affective Vocalizations <https://catalog.ldc.upenn.edu/LDC2024S04>
________________________________
LDC data and commercial technology development
For-profit organizations are reminded that an LDC membership is a prerequisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations on the use of certain corpora. Visit the Licensing <https://www.ldc.upenn.edu/data-management/using/licensing> page for further information.
________________________________
New publications:
RATS Low Speech Density <https://catalog.ldc.upenn.edu/LDC2024S03> was developed by LDC and comprises 87 hours of English, Levantine Arabic, Farsi, Pashto, and Urdu speech, and non-speech samples. The recordings were assembled by concatenating a randomized selection of speech, communications systems sounds, and silence. This corpus was created to measure false alarm performance in RATS speech activity detection systems.
The source audio was extracted from RATS development and progress sets and consists of conversational telephone speech recordings collected by LDC. Non-speech samples were selected from communications systems sounds, including telephone network special information tones, radio selective calling signals, HF/VHF/UHF digital mode radio traffic, radio network control channel signals, two-way radio traffic containing roger beeps, and short duration shift-key modulated handset data transmissions.
The goal of the RATS (Robust Automatic Transcription of Speech) program was to develop human language technology systems capable of performing speech detection, language identification, speaker identification, and keyword spotting on the severely degraded audio signals that are typical of various radio communication channels, especially those employing various types of handheld portable transceiver systems.
2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
BabyEars Affective Vocalizations <https://catalog.ldc.upenn.edu/LDC2024S04> contains 22 minutes of spontaneous English speech by 12 adults interacting with their infant children, for a total of 509 infant-directed utterances and 185 adult-directed or neutral utterances. Speech data was collected in a quiet room during a one-hour session where each parent was asked to play and otherwise interact normally with their infant (aged 10-18 months). A trained research assistant then extracted discrete utterances and classified them into three categories: approval, attention, and prohibition.
2024 members can access this corpus through their LDC accounts provided they have submitted a completed copy of the special license agreement. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account <https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium <ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc@ldc.upenn.edu
M: 3600 Market St. Suite 810, Philadelphia, PA 19104