- Corpora - ELRA lists

CFP Deadline August 1: AIAS '25 - Symposium for AI Accelerated Science
by Sol Rosenberg 17 Jul '25

**** CFP for AIAS '25 ****

*AI for Accelerated Research Symposium*
*October 27–28, 2025, San Francisco, CA*

With the submission deadline coming up on August 1, we invite submissions of original research, position papers, and visionary ideas that explore how AI is reshaping the research lifecycle and accelerating scientific discovery. Areas of interest include, but are not limited to: machine reading and knowledge extraction, intelligent data collection, automated hypothesis generation, and AI-driven exploration across scientific domains.

This premier annual event brings together leading thinkers from academia, industry, and government to examine the transformative impact of AI on science and to foster cross-disciplinary collaboration at the frontier of innovation.

We welcome contributions addressing both foundational advances and real-world applications in the following areas:

- *Scientific Knowledge Extraction from Literature and Representation:* Extraction of ontologies, automated reasoning systems, and AI-enhanced platforms for organizing, linking, and accelerating research findings across domains.
- *Data collection and synthesis using AI:* Applications of generative architectures for molecule generation, experiment simulation, or synthetic data generation.
- *Physics-Informed AI and Scientific Machine Learning:* AI models that embed physical laws or constraints to enhance interpretability, generalization, and scientific fidelity.
- *Neuro-Symbolic AI:* Hybrid models combining neural networks with symbolic reasoning to advance scientific inference, automation, and logic-based discovery.
- *Large Language Models (LLMs) and Conversational AI for Science:* Use of LLMs and agent-based systems to support literature mining, hypothesis generation, scientific coding, and collaborative research workflows.
- *AI for Multidisciplinary Research:* Bridging disciplinary boundaries with LLMs. Enabling clearer communication and collaboration across research fields. Applied AI techniques reshaping discovery pipelines in foundational sciences and engineering systems.

*Keynote Speakers:*
• Jennifer Doudna (Berkeley)
• Anima Anandkumar (Caltech)
• David Baker (University of Washington)

*Interactive Sessions:* Plenary lectures, panel discussions, breakout groups, and hands-on demos.

*AI Accelerated Research Award Ceremony:* Honoring breakthrough contributions from emerging scientists.

*Important Dates*
• Submission Deadline: August 1, 2025
• Notification of Acceptance: August 31, 2025
• Symposium Dates: October 26–28, 2025

*Submission Details*
We welcome:
• Full research papers (6–8 pages)
• Short papers (2–4 pages)
• Extended abstracts (up to 2 pages)
• Vision or position papers (up to 4 pages)

Submissions must follow the AIAS formatting guidelines and be submitted via the symposium website.

See more at https://aias2025.org/
CFP details and submissions: https://aias2025.org/call-for-papers/

Organizing Committee:
- Jennifer Chayes, Dean of the College of Computing, Data Science, and Society at UC Berkeley
- Yan Li, Executive Director of Scientific Programs, Chen Institute
- Pietro Perona, Allan E. Puckett Professor of Electrical Engineering and Computation and Neural Systems, Caltech
- Mengdi Wang, Associate Professor of Electrical and Computer Engineering and the Center for Statistics and Machine Learning, Princeton
- Parisa Kordjamshidi, Associate Professor of Computer Science and Engineering, Michigan State University
- Hamid Karimian, Research Assistant Professor of Computer Science and Engineering, Michigan State University

CORRECTED DATES: Call for Papers: Slav-NLP: 10th Workshop on NLP for Slavic Languages
by Roman Yangarber 17 Jul '25

Call for Papers:

Slav-NLP: 10th Workshop on NLP for Slavic languages
At ACL-2025, Vienna, Austria
31 July 2025
http://bsnlp.cs.helsinki.fi
Submission Deadline: 3 May

WORKSHOP DESCRIPTION

The 10th edition of the Slav-NLP Workshop, at ACL 2025. Sponsored by SIGSLAV: ACL Special Interest Group on Slavic NLP.

Slavic languages play a crucial role due to their diverse cultural heritage and wide use, with over 400M speakers worldwide. Current political and economic developments in Central/Eastern Europe thrust the Slavic languages into sharp focus, especially in light of rapid technological advancements and evolving consumer markets. Research on applied and theoretical NLP in the context of Slavic languages is still lagging. Linguistic phenomena that are common to the Slavic languages (rich morphology, free word order, etc.) make NLP for these languages challenging.

Slav-NLP Workshops gather researchers from academia and industry, aiming to stimulate research in Slavic NLP, and foster the creation of tools and resources. The Workshops welcome the exchange of ideas and experience, discussing current challenges, and promoting the available resources. The structural similarity, as well as the easily recognizable core vocabulary and inflectional inventory spanning this large language group, creates a special environment where researchers can appreciate the shared problems and communicate naturally. We are happy again to organize Slav-NLP in Central Europe.

This Workshop addresses Natural Language Processing (NLP) for the Slavic languages.
NLP tasks in urgent need of attention include:
* language modeling,
* morphological, syntactic and semantic analysis,
* lexical semantics,
* named-entity recognition,
* text normalization and processing non-standard language,
* co-reference resolution,
* information extraction,
* question answering,
* text summarization,
* machine translation,
* development of linguistic resources,
* development and assessment of large language models,
* text classification,
* text generation,
* disinformation detection,
* fact verification,
* sentiment analysis.

The Workshop continues the proud tradition established by the 9 previous (B)SNLP Workshops.

IMPORTANT DATES
* Submission deadline: 3 May 2025
* Pre-reviewed ARR commitment: 20 May 2025
* Notification of acceptance: 1 June 2025
* Camera-ready papers due: 15 June 2025
* Workshop: 31 July 2025

SHARED TASK

This year the Slav-NLP Workshop features a Shared Task on Detection and Classification of Persuasion Techniques in two types of texts: (a) parliamentary debates on highly-contested topics, and (b) social media posts related to the spread of propaganda and disinformation. Read about the Shared Task on the Workshop’s Web page.

SUBMISSION

At the Workshop’s Web page: bsnlp.cs.helsinki.fi <http://bsnlp.cs.helsinki.fi/call-for-papers.html>

Workshop Contact: bsnlp(a)cs.helsinki.fi

--
Roman Yangarber
Professor, University of Helsinki, Finland
Digital Humanities
INEQ: Helsinki Inequality Initiative <https://helsinki.fi/en/ineq-helsinki-inequality-initiative>, Linguistic Inequalities and Translation Technologies
e-Learning & language learning
Language Learning Lab
Unioninkatu 40, Metsätalo A214
helsinki.fi/revita <https://www.helsinki.fi/revita>
helsinki.fi/language-learning-lab <https://www.helsinki.fi/language-learning-lab>
mobile: +358 50 41 51 71 3

Gaze4NLP - The First International Workshop on Gaze Data and Natural Language Processing
by Cengiz Acarturk 17 Jul '25

**First Call for Papers**

Gaze4NLP - The First International Workshop on Gaze Data and Natural Language Processing
September 11-13, 2025, Varna, Bulgaria (co-located with RANLP 2025)

The First Workshop on Gaze Data and Natural Language Processing (Gaze4NLP), co-located with RANLP 2025 in Varna, Bulgaria, invites papers of a theoretical or experimental nature that describe research methodologies drawing on interdisciplinary perspectives, including computer science, engineering, and the cognitive sciences, and that identify challenges to resolve at the intersection of the two domains: eye tracking and NLP. Gaze4NLP aims to bring together researchers working on eyes on text and on NLP, and to establish bridges between them for identifying future avenues of research.

Workshop webpage: https://gaze4nlp.github.io/Gaze4NLP2025/about.html

Important Dates
Workshop paper submission deadline: 6 July 2025
Workshop paper acceptance notification: 31 July 2025
Workshop paper camera-ready versions: 30 August 2025
Workshop camera-ready proceedings ready: 8 September 2025
Workshops: 11-13 September 2025
All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”)

Topics for the workshop will include, but are not limited to:
- Investigating the pillars for bridging the gap between the research on eyes on text and NLP. Studying how to expand research methodologies by employing interdisciplinary perspectives, including computer science, engineering, and the cognitive sciences, and identifying challenges and issues to resolve.
- Exploring new areas so that both fields benefit from each other better than in the past, and identifying novel domains of exploration for further research.
- Discussing how to develop cognitively inspired models that align human reading data with LLMs.

Submissions
We solicit regular workshop papers, which will be included in the proceedings as archival publications.
All categories of papers may be long (maximum 8 pages of content + up to one page for limitations (required) + unlimited references) or short (maximum 4 pages of content + up to one page for limitations (required) + unlimited references). Accepted papers will be presented in the form of either oral or poster presentations. Please note that camera-ready papers are allowed an additional page. The workshop proceedings will be part of the ACL anthology. Authors of accepted papers will also be given the opportunity to publish an extended version as part of an edited book.

Submission link: https://softconf.com/ranlp25/Gaze4NLP2025/

Organization Committee:
Dr. Cengiz Acarturk, Jagiellonian University, Poland
Dr. Jamal Nasir, University of Galway, Ireland
Dr. Burcu Can, University of Stirling, Scotland, UK
Dr. Cagri Coltekin, University of Tubingen, Germany

--
Dr. Cengiz Acarturk, Prof.UJ
Centre for Cognitive Science, Jagiellonian University, Krakow
On behalf of the Organization Committee

Senior lecturer position at the University of Gothenburg / Chalmers University of Technology
by Richard Johansson 17 Jul '25

We are hiring a Senior Lecturer (comparable to an associate professor) at the Department of Computer Science and Engineering, a joint department at the University of Gothenburg and Chalmers University of Technology in Gothenburg, Sweden. This is a broad call open to anyone with a background in data science and AI, but we are particularly interested in candidates with an NLP background. This is a senior faculty position, but we will also consider strong junior candidates.

You can find more details on the application page: https://web103.reachmee.com/ext/I005/1035/job?site=7&lang=UK&validator=9b89…

If you are thinking of applying and would like to discuss the position, please contact me (richard.johansson(a)cse.gu.se) or Gerardo Schneider, head of the division (gerardo(a)chalmers.se). The deadline for applying is August 15.

Best regards,
Richard Johansson

ACL 2025 Conference - Birds of a Feather (BoF) and Affinity Group events
by Horacio Saggion 17 Jul '25

Dear ACL 2025 Attendees:

ACL will feature a lineup of 18 Birds of a Feather (BoF) and Affinity Group events to bring together participants around shared research topics, professional experiences, and community affiliations. The hosts of these events are looking forward to welcoming you to the conference!

The full schedule with session descriptions has been released on the conference website <https://2025.aclweb.org/program/bof/>. Session titles and times are listed below:

Mon, Jul 28

SomosNLP: The Iberoamerican NLP Community
11:00 - 12:30, ballroom 1.31-1.32
Hosts: María Grandury, Selene Báez, Diana Galván, Helena Gómez, Danae Sánchez

Queer in AI Meet-Up
12:30 - 14:00, ballroom 1.33
Hosts: Sabine Weber

Mentorship on NLP Research
14:00 - 15:30, ballroom 1.31-1.32
Hosts: Oana Ignat, Weijia Shi, Ziqiao Ma

Tue, Jul 29

Navigating Challenges in Building Industrial LLM Applications
10:30 - 12:00, ballroom 1.14
Hosts: Gauri Kholkar, Aakash Bist, Ratinder Ahuja

Humanists in NLP
10:30 - 12:00, ballroom 1.31-1.32
Hosts: Patrick Sui

Teaching NLP
12:00 - 13:30, ballroom 1.33
Hosts: Margot Mieskes, Laura Biester, György Kovacs

NLP x Graphs: Where Structure Meets Language
14:00 - 15:30, ballroom 1.14
Hosts: Yuqicheng Zhu, Moritz Plenz

Southeast Asian NLP Community, Projects, and Beyond
14:00 - 15:30, ballroom 1.31-1.32
Hosts: Fajri Koto, Jan Christian Blaise Cruz, Holy Lovenia, Samuel Cahyawijaya, Alham Fikri Aji, Peerat Limkonchotiwat, M. Reza Qorib

EquiCL Welcome Session
14:00 - 15:30, ballroom 1.33
Hosts: Zeerak Talat, Christine de Kock, Fatima Elsafoury, Jackie Lo

Learning and Reasoning for Structured Data
16:00 - 17:30, ballroom 1.14
Hosts: Vivek Gupta, Dan Roth

Multilingualism: from data crawling to evaluation
16:00 - 17:30, ballroom 1.31-1.32
Hosts: Pinzhen Chen, Andrey Kutuzov, Letiția Pârcălăbescu

Participatory Design for NLP
16:00 - 17:30, ballroom 1.33
Hosts: Gavin Abercrombie, Tommaso Caselli

Bridging Human Study and LLM Agents for Social Simulation
16:00 - 17:30, online only (Underline)
Hosts: Xuan Wang

Wed, Jul 30

Activations & Embeddings: Cognitive-Neuroscience Methods for LLMs
9:00 - 10:30, ballroom 1.14
Hosts: Giovanni Franco Gabriel Marraffini

Mothering the Future - In Life and in AI: Challenges, Support, and the Path Forward for Mothers in Computing
11:00 - 12:30, ballroom 1.31-1.32
Hosts: Narjis Asad

Language Technology for Crisis Preparedness and Response (LT4CPR)
11:00 - 12:30, ballroom 1.33
Hosts: Belu Ticona, Antonios Anastasopoulos, Will Lewis, Fei Xia

Ethical Considerations for NLP and CL
12:30 - 14:00, ballroom 1.14
Hosts: Margot Mieskes, Karën Fort, Fanny Ducel, Clémentine Bleuze, Aurélie Névéol

Muslims in Machine Learning (MusIML)
12:45 - 14:15, ballroom 1.31-1.32
Hosts: Ehsaneddin Asgari, Suleiman Ali Khan, Ahmed Youssef

Public webinar: Using mixed methods to analyze stance: A variationist approach
by Robert Fuchs 16 Jul '25

Bonn Talks on Recent Trends in Applied Linguistics

*Using mixed methods to analyze stance: A variationist approach*
Dr. Katharina Pabst, Radboud University Nijmegen
Friday, July 18, 2.15 pm - 3.45 pm CEST
Sign up here: https://uni-bonn.zoom-x.de/meeting/register/7zWSRP69R8SZWZptF6wfMA

In this talk, I will introduce a framework for coding speaker stance (i.e., the way individuals position themselves towards an interaction) that I developed with colleagues from the University of Toronto. Our framework, which combines insights from variationist sociolinguistics and pragmatics, is based on pragmatic tests that offer a replicable way of capturing an interactional phenomenon such as stance quantitatively. Drawing on two case studies of complementizer (that) – i.e., the variation between overt that and zero in sentences such as I think (that) linguistic variation is fun – I discuss challenges and opportunities of using this framework, as well as its implications for the study of language and social meaning.

Prof. Dr. Robert Fuchs | Head of Department and Professor of English Linguistics | Department of English, American and Celtic Studies | University of Bonn | Rabinstr. 8 53113 Bonn, Germany | https://uni-bonn.academia.edu/RFuchs | https://www.iaak.uni-bonn.de/bael/en/people/chair/prof-dr-robert-fuchs | https://sites.google.com/view/rflinguistics/

*Recent publications:*

Coats, S., Basile, A., Morin, C. & Fuchs, R. (to appear). *The YouTube Corpus of Singapore English Podcasts*. /English World-Wide/.

Fuchs, R. et al. (to appear). *Non-standard morphosyntactic variation in L2 English varieties world-wide: A corpus-based study* <https://www.sciencedirect.com/science/article/pii/S0024384125000737>. /Lingua/.

Fuchs, R., Wiltshire, C. & Sarmah, P. (to appear). *The role of English in the linguistic ecology of Northeast India* <https://www.academia.edu/125365118/The_role_of_English_in_the_linguistic_ec…>. In P. Siemund, et al. (Eds.), /World Englishes in their Local Multilingual Ecologies/. Amsterdam: Benjamins.

Lange, C., & Fuchs, R. (to appear). *English in India*. In R. Hickey & K. Burridge (Eds.), /New Cambridge History of the English Language/. Cambridge: CUP.

Fuchs, R. (2025). *Influencing people around the globe - The linguistic expression of persuasion across varieties of English worldwide* <https://www.academia.edu/107491904/Influencing_people_around_the_globe_The_…>. In D. Dayter, & S. Rüdiger (Eds.), /Manipulation, Influence, and Deception: The Changing Landscape of Persuasive Language/, 135-156. Cambridge: CUP.

Job: PhD in NLP in Zurich, Application Deadline on August 10, 2025
by reto.gubelmann＠uzh.ch 16 Jul '25

Dear Colleagues, I would like to draw your attention to a fully funded PhD position in NLP. The position is for three years, starting on October 1, 2025, or per agreement. Details on the position and the application procedure can be found in the job ad here: https://jobs.uzh.ch/job-vacancies/phd-position-empowering-ai-to-explore-the…. The position is part of Project AI-R that aims to bring together cutting-edge methods in NLP with recent developments in the philosophy of language and logic. To apply, please follow the guidelines in the ad. All best, Reto

Free webinar: 22 July 2-3pm UK time
by Brezina, Vaclav 16 Jul '25

Dear all,

We would like to invite you to a free webinar, Corpus Linguistics: Skills for the Future, from our Lancaster webinar series. In this webinar, we will focus on two domains that have used corpus methods to develop and improve their practice. Prof Elena Semino will talk about the use of corpus methods in healthcare communication and Dr Dana Gablasova will look at the role played by corpus methods in the development and evaluation of GenAI tools for language learning and teaching.

⏲️ Time: 22 July 2025, 2-3pm UK time
🔗 Link for free registration: https://forms.office.com/e/uppRBrE5AF

Best,
Vaclav

Professor Vaclav Brezina
Professor in Corpus Linguistics
Co-Director of ESRC Centre for Corpus Approaches to Social Science
Lancaster University
Lancaster, LA1 4YD
Office: County South, room C05
T: +44 (0)1524 510828
@vaclavbrezina
<http://www.lancaster.ac.uk/arts-and-social-sciences/about-us/people/vaclav-…>

New book - Automatic Question Generation
by Flor, Michael 15 Jul '25

Dear colleagues,

I am happy to announce the availability of the new book, Automatic Question Generation
https://link.springer.com/book/10.1007/978-3-031-92072-1

Published by Springer, in the series Synthesis Lectures on Human Language Technologies. Many thanks to Graeme Hirst, the series editor!

The book describes a variety of approaches, including generating questions from syntactic analyses, semantic resources, neural architectures, ontologies and knowledge graphs, and large language models. It also covers evaluation and some fundamentals of questions. I hope the book will be useful for NLP/AI researchers, students, educators, test-developers, and anyone interested in this topic.

Michael Flor
Senior Research Scientist
ETS Research Institute
Educational Testing Service
Princeton, NJ, USA
mflor(a)ets.org

July 2025 Newsletter - LDC
by Penn LDC 15 Jul '25

In this newsletter:

Fall 2025 LDC data scholarship program

New publications:
AnnoDIFP Session Audio and Transcripts <https://catalog.ldc.upenn.edu/LDC2025S06>
Penn Parsed Corpora of Historical English Second Release <https://catalog.ldc.upenn.edu/LDC2025T09>
LoReHLT Uzbek Representative Language Pack <https://catalog.ldc.upenn.edu/LDC2025T08>
________________________________

Fall 2025 LDC data scholarship program

Student applications for the Fall 2025 LDC data scholarship program are being accepted now through September 15, 2025. This program provides eligible students with no-cost access to LDC data. Students must complete an application consisting of a data use proposal and letter of support from their advisor. For application requirements and program rules, visit the LDC Data Scholarships page <https://www.ldc.upenn.edu/language-resources/data/data-scholarships>.
________________________________

New publications:

* AnnoDIFP (Annotated Data for the Investigation of Facets of Personality) Session Audio and Transcripts <https://catalog.ldc.upenn.edu/LDC2025S06> was developed by LDC, the Florida Institute of Technology <https://www.fit.edu/> (FIT), and the University of New Haven <https://www.newhaven.edu/index.php> (UNH) to support algorithm development for predicting personality traits. It contains 438.34 hours of English audio and transcripts from in-person interviews of 366 participants paired with scores from two self-reported personality assessments, HEXACO Personality Inventory (Revised) (HEXACO-PI-R) and Short Dark Triad (SD3).

In-person interviews were recorded at LDC, FIT, and UNH. In each session, the participant and interviewer were in separate sound-isolated rooms with communication between them supplied by audio/video hardware. Sessions consisted of the following tasks: rapport building, a YouTube task, a map task, and a business task. Further details on collection methodology and session tasks are contained in the documentation accompanying this release.
2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.

* Penn Parsed Corpora of Historical English Second Release <https://catalog.ldc.upenn.edu/LDC2025T09> was developed at the University of Pennsylvania and consists of running texts and text samples of British English prose from the earliest Middle English documents (1100 CE) up to the period of the First World War (1914 CE). This second release corrects errors and inconsistencies in Penn Parsed Corpora of Historical English (LDC2020T16 <https://catalog.ldc.upenn.edu/LDC2020T16>), further streamlines annotation, simplifies the directory structure, and includes updated documentation.

This data set contains three corpora covering traditionally recognized periods of English:
  * The Penn-Helsinki Parsed Corpus of Middle English, second edition
  * The Penn-Helsinki Parsed Corpus of Early Modern English
  * The Penn Parsed Corpus of Modern British English, second edition

The texts are in two forms: part-of-speech tagged text and syntactically annotated text. Annotations were manually reviewed for accuracy and consistency. Included in this release are updated annotation guidelines, philological information for each corpus, and the CorpusSearch 2 program, which allows users to search the data for words, word sequences, and syntactic structure.

2025 members can access this corpus through their LDC accounts provided they have submitted a completed copy of the special license agreement. Non-members may license this data for a fee.

* LoReHLT Uzbek Representative Language Pack <https://catalog.ldc.upenn.edu/LDC2025T08> was developed by LDC and is comprised of approximately 47 million words of Uzbek monolingual text, 563,000 words of found Uzbek-English parallel text, 100,000 Uzbek words translated from English data, and 6.4 hours of Uzbek broadcast news and amateur web audio recordings.
Approximately 151,000 words were annotated for named entities and over 28,000 words were annotated for full entity including nominals and pronouns. Noun-phrase chunking was applied to more than 13,000 words. Over 20,890 words were labeled with simple semantic annotation. Topic annotation was applied to the audio recordings. Data was collected from discussion forum, news, reference, social network, broadcast news, web audio recordings, and weblogs.

LoReHLT was a companion project of the DARPA LORELEI program. The LORELEI (Low Resource Languages for Emergent Incidents) program was concerned with building human language technology for low resource languages in the context of emergent situations. Representative languages were selected to provide broad typological coverage.

2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.

To unsubscribe from this newsletter, log in to your LDC account <https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.

Membership Coordinator
Linguistic Data Consortium <ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu <mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810 Philadelphia, PA 19104
