*CFP for AIAS '25*
*AI for Accelerated Research Symposium*
*October 27–28, 2025 — San Francisco, CA*
With the submission deadline approaching on August 1, we invite submissions
of original research, position papers, and visionary ideas that explore how
AI is reshaping the research lifecycle and accelerating scientific
discovery.
Areas of interest include, but are not limited to: machine reading and
knowledge extraction, intelligent data collection, automated hypothesis
generation, and AI-driven exploration across scientific domains. This
premier annual event brings together leading thinkers from academia,
industry, and government to examine the transformative impact of AI on
science and to foster cross-disciplinary collaboration at the frontier of
innovation.
We welcome contributions addressing both foundational advances and
real-world applications in the following areas:
- *Scientific Knowledge Extraction from Literature and
Representation:* Extraction
of ontologies, automated reasoning systems, and AI-enhanced platforms for
organizing, linking, and accelerating research findings across domains.
- *Data Collection and Synthesis Using AI:* Applications of generative
architectures for molecule generation, experiment simulation, or synthetic
data generation.
- *Physics-Informed AI and Scientific Machine Learning:* AI models that
embed physical laws or constraints to enhance interpretability,
generalization, and scientific fidelity.
- *Neuro-Symbolic AI:* Hybrid models combining neural networks with
symbolic reasoning to advance scientific inference, automation, and
logic-based discovery.
- *Large Language Models (LLMs) and Conversational AI for Science:* Use
of LLMs and agent-based systems to support literature mining, hypothesis
generation, scientific coding, and collaborative research workflows.
- *AI for Multidisciplinary Research:* Bridging disciplinary boundaries
with LLMs, enabling clearer communication and collaboration across
research fields, and applying AI techniques that reshape discovery pipelines
in foundational sciences and engineering systems.
*Keynote Speakers:*
• Jennifer Doudna (Berkeley)
• Anima Anandkumar (Caltech)
• David Baker (University of Washington)
*Interactive Sessions:* Plenary lectures, panel discussions, breakout
groups, and hands-on demos. *AI Accelerated Research Award Ceremony:*
Honoring breakthrough contributions from emerging scientists.
*Important Dates*
• Submission Deadline: August 1, 2025
• Notification of Acceptance: August 31, 2025
• Symposium Dates: October 26–28, 2025
*Submission Details*
We welcome:
• Full research papers (6–8 pages)
• Short papers (2–4 pages)
• Extended abstracts (up to 2 pages)
• Vision or position papers (up to 4 pages)
Submissions must follow the AIAS formatting guidelines and be submitted via
the symposium website. See more at https://aias2025.org/
CFP details and submissions: https://aias2025.org/call-for-papers/
Organizing Committee:
- Jennifer Chayes, Dean of the College of Computing, Data Science, and
Society at UC Berkeley
- Yan Li, Executive Director of Scientific Programs, Chen Institute
- Pietro Perona, Allan E. Puckett Professor of Electrical Engineering and
Computation and Neural Systems, Caltech
- Mengdi Wang, Associate Professor of Electrical and Computer
Engineering and the Center for Statistics and Machine Learning, Princeton
- Parisa Kordjamshidi, Associate Professor of Computer Science and
Engineering, Michigan State University
- Hamid Karimian, Research Assistant Professor of Computer Science and
Engineering, Michigan State University
*Call for Papers*
Slav-NLP: 10th Workshop on NLP for Slavic Languages
At ACL 2025, Vienna, Austria
31 July 2025
http://bsnlp.cs.helsinki.fi
Submission Deadline: 3 May 2025
WORKSHOP DESCRIPTION
This is the 10th edition of the Slav-NLP Workshop, held at ACL 2025 and
sponsored by SIGSLAV, the ACL Special Interest Group on Slavic NLP.
Slavic languages play a crucial role due to their diverse cultural
heritage and wide use, with over 400M speakers worldwide. Current political
and economic developments in Central and Eastern Europe have brought the
Slavic languages into sharp focus, especially in light of rapid
technological advancements and evolving consumer markets.
Research on applied and theoretical NLP in the context of Slavic
languages is still lagging. Linguistic phenomena common to the
Slavic languages (rich morphology, free word order, etc.) make NLP for
these languages challenging. Slav-NLP Workshops gather researchers from
academia and industry, aiming to stimulate research in Slavic NLP and to
foster the creation of tools and resources. The Workshops welcome the
exchange of ideas and experience, discussion of current challenges, and
promotion of available resources. The structural similarity, as well as
the easily recognizable core vocabulary and inflectional inventory
spanning this large language group, creates a special environment where
researchers can appreciate the shared problems and communicate naturally.
We are happy to organize Slav-NLP in Central Europe *again*.
This Workshop addresses Natural Language Processing (NLP) for the Slavic
languages. NLP tasks in urgent need of attention include:
* language modeling,
* morphological, syntactic and semantic analysis,
* lexical semantics,
* named-entity recognition,
* text normalization and processing non-standard language,
* co-reference resolution,
* information extraction,
* question answering,
* text summarization,
* machine translation,
* development of linguistic resources,
* development and assessment of large language models,
* text classification,
* text generation,
* disinformation detection,
* fact verification,
* sentiment analysis.
The Workshop continues the proud tradition established by the 9 previous
(B)SNLP Workshops.
IMPORTANT DATES
* Submission deadline: 3 May 2025
* Pre-reviewed ARR commitment: 20 May 2025
* Notification of acceptance: 1 June 2025
* Camera-ready papers due: 15 June 2025
* Workshop: 31 July 2025
SHARED TASK
This year the Slav-NLP Workshop features a Shared Task on Detection and
Classification of Persuasion Techniques in two types of texts: (a)
parliamentary debates on highly contested topics, and (b) social media
posts related to the spread of propaganda and disinformation.
Details of the Shared Task are available on the Workshop’s Web page.
SUBMISSION
Papers are submitted via the Workshop’s Web page:
http://bsnlp.cs.helsinki.fi/call-for-papers.html
Workshop contact: bsnlp(a)cs.helsinki.fi
--
Roman Yangarber
Professor, University of Helsinki, Finland
Digital Humanities
INEQ: Helsinki Inequality Initiative
<https://helsinki.fi/en/ineq-helsinki-inequality-initiative> —
Linguistic Inequalities and Translation Technologies
------------------------------------------------------------------------
e-Learning & language learning
Language Learning Lab
Unioninkatu 40, Metsätalo A214
helsinki.fi/revita <https://www.helsinki.fi/revita>
helsinki.fi/language-learning-lab
<https://www.helsinki.fi/language-learning-lab>
mobile: +358 50 41 51 71 3
------------------------------------------------------------------------
RЯ
**First Call for Papers**
Gaze4NLP - The First International Workshop on Gaze Data and Natural
Language Processing
September 11-13, 2025, Varna, Bulgaria (co-located with RANLP 2025)
The First Workshop on Gaze Data and Natural Language Processing
(Gaze4NLP), co-located with RANLP 2025 in Varna, Bulgaria, invites
theoretical or experimental papers that describe research methodologies
drawing on interdisciplinary perspectives, including computer science,
engineering, and the cognitive sciences, and that identify challenges to
resolve at the intersection of two domains: eye tracking and NLP.
Gaze4NLP aims to bring together researchers working on eyes on text and
on NLP, and to build bridges between them that identify future avenues of
research.
Workshop webpage:
https://gaze4nlp.github.io/Gaze4NLP2025/about.html
Important Dates
Workshop paper submission deadline: 6 July 2025
Workshop paper acceptance notification: 31 July 2025
Workshop paper camera-ready versions: 30 August 2025
Workshop camera-ready proceedings ready: 8 September 2025
Workshops: 11-13 September 2025
All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”)
Topics for the workshop will include, but are not limited to:
- Investigating the pillars for bridging the gap between research on
eyes on text and NLP: studying how to expand research methodologies by
employing interdisciplinary perspectives, including computer science,
engineering, and the cognitive sciences, and identifying open challenges
and issues to resolve.
- Exploring new areas so that both fields benefit from each other more
than in the past, and identifying novel domains for further research.
- Discussing how to develop cognitively inspired models that align human
reading data with LLMs.
Submissions
We solicit regular workshop papers, which will be included in the
proceedings as archival publications. All categories of papers may be
long (maximum 8 pages of content + up to one page for limitations
(required) + unlimited references) or short (maximum 4 pages of content
+ up to one page for limitations (required) + unlimited references).
Accepted papers will be presented in the form of either oral or poster
presentations.
Please note that camera-ready papers are allowed an additional page.
The workshop proceedings will be part of the ACL Anthology. Authors of
accepted papers will also have the opportunity to publish an extended
version as part of an edited book.
Submission link:
https://softconf.com/ranlp25/Gaze4NLP2025/
Organization Committee:
Dr. Cengiz Acarturk, Jagiellonian University, Poland
Dr. Jamal Nasir, University of Galway, Ireland
Dr. Burcu Can, University of Stirling, Scotland, UK
Dr. Cagri Coltekin, University of Tubingen, Germany
--
Dr. Cengiz Acarturk, Prof.UJ
Centre for Cognitive Science, Jagiellonian University, Krakow
On behalf of the Organization Committee
We are hiring a Senior Lecturer (comparable to an associate professor) at the Department of Computer Science and Engineering, a joint department at the University of Gothenburg and Chalmers University of Technology in Gothenburg, Sweden.
This is a broad call open to anyone with a background in data science and AI, but we are particularly interested in candidates with an NLP background. This is a senior faculty position but we will also consider strong junior candidates.
You can find more details on the application page:
https://web103.reachmee.com/ext/I005/1035/job?site=7&lang=UK&validator=9b89…
If you are thinking of applying and would like to discuss the position, please contact me (richard.johansson(a)cse.gu.se) or Gerardo Schneider, head of the division (gerardo(a)chalmers.se).
The deadline for applying is August 15.
Best regards,
Richard Johansson
Dear ACL 2025 Attendees:
ACL will feature a lineup of 18 Birds of a Feather (BoF) and Affinity Group
events to bring together participants around shared research topics,
professional experiences, and community affiliations. The hosts of these
events are looking forward to welcoming you to the conference!
The full schedule with session descriptions has been released on the conference
website <https://2025.aclweb.org/program/bof/>. Session titles and times
are listed below:
Mon, Jul 28
SomosNLP: The Iberoamerican NLP Community
11:00 - 12:30, ballroom 1.31-1.32
Hosts: María Grandury, Selene Báez, Diana Galván, Helena Gómez, Danae
Sánchez
Queer in AI Meet-Up
12:30 - 14:00, ballroom 1.33
Hosts: Sabine Weber
Mentorship on NLP Research
14:00 - 15:30, ballroom 1.31-1.32
Hosts: Oana Ignat, Weijia Shi, Ziqiao Ma
Tue, Jul 29
Navigating Challenges in Building Industrial LLM Applications
10:30 - 12:00, ballroom 1.14
Hosts: Gauri Kholkar, Aakash Bist, Ratinder Ahuja
Humanists in NLP
10:30 - 12:00, ballroom 1.31-1.32
Hosts: Patrick Sui
Teaching NLP
12:00 - 13:30, ballroom 1.33
Hosts: Margot Mieskes, Laura Biester, György Kovacs
NLP x Graphs: Where Structure Meets Language
14:00 - 15:30, ballroom 1.14
Hosts: Yuqicheng Zhu, Moritz Plenz
Southeast Asian NLP Community, Projects, and Beyond
14:00 - 15:30, ballroom 1.31-1.32
Hosts: Fajri Koto, Jan Christian Blaise Cruz, Holy Lovenia, Samuel
Cahyawijaya, Alham Fikri Aji, Peerat Limkonchotiwat, M. Reza Qorib
EquiCL Welcome Session
14:00 - 15:30, ballroom 1.33
Hosts: Zeerak Talat, Christine de Kock, Fatima Elsafoury, Jackie Lo
Learning and Reasoning for Structured Data
16:00 - 17:30, ballroom 1.14
Hosts: Vivek Gupta, Dan Roth
Multilingualism: from data crawling to evaluation
16:00 - 17:30, ballroom 1.31-1.32
Hosts: Pinzhen Chen, Andrey Kutuzov, Letiția Pârcălăbescu
Participatory Design for NLP
16:00 - 17:30, ballroom 1.33
Hosts: Gavin Abercrombie, Tommaso Caselli
Bridging Human Study and LLM Agents for Social Simulation
16:00 - 17:30, online only (Underline)
Hosts: Xuan Wang
Wed, Jul 30
Activations & Embeddings: Cognitive-Neuroscience Methods for LLMs
9:00 - 10:30, ballroom 1.14
Hosts: Giovanni Franco Gabriel Marraffini
Mothering the Future — In Life and in AI: Challenges, Support, and the Path
Forward for Mothers in Computing
11:00 - 12:30, ballroom 1.31-1.32
Hosts: Narjis Asad
Language Technology for Crisis Preparedness and Response (LT4CPR)
11:00 - 12:30, ballroom 1.33
Hosts: Belu Ticona, Antonios Anastasopoulos, Will Lewis, Fei Xia
Ethical Considerations for NLP and CL
12:30 - 14:00, ballroom 1.14
Hosts: Margot Mieskes, Karën Fort, Fanny Ducel, Clémentine Bleuze, Aurélie
Névéol
Muslims in Machine Learning (MusIML)
12:45 - 14:15, ballroom 1.31-1.32
Hosts: Ehsaneddin Asgari, Suleiman Ali Khan, Ahmed Youssef
Bonn Talks on Recent Trends in Applied Linguistics
*Using mixed methods to analyze stance: A variationist approach*
Dr. Katharina Pabst, Radboud University Nijmegen
Friday, July 18, 2.15 pm - 3.45 pm CEST
Sign up here:
https://uni-bonn.zoom-x.de/meeting/register/7zWSRP69R8SZWZptF6wfMA
In this talk, I will introduce a framework for coding speaker stance
(i.e., the way individuals position themselves towards an interaction)
that I developed with colleagues from the University of Toronto. Our
framework, which combines insights from variationist sociolinguistics
and pragmatics, is based on pragmatic tests that offer a replicable way
of capturing an interactional phenomenon such as stance quantitatively.
Drawing on two case studies of the complementizer *that*, i.e., the
variation between overt *that* and zero in sentences such as *I think
(that) linguistic variation is fun*, I discuss the challenges and
opportunities of using this framework, as well as its implications for
the study of language and social meaning.
Prof. Dr. Robert Fuchs | Head of Department and Professor of English
Linguistics | Department of English, American and Celtic Studies |
University of Bonn | Rabinstr. 8 53113 Bonn, Germany |
https://uni-bonn.academia.edu/RFuchs |
https://www.iaak.uni-bonn.de/bael/en/people/chair/prof-dr-robert-fuchs |
https://sites.google.com/view/rflinguistics/
Dear Colleagues,
I would like to draw your attention to a fully funded PhD position in NLP. The position is for three years, starting on October 1, 2025, or as otherwise agreed.
Details on the position and the application procedure can be found in the job ad here: https://jobs.uzh.ch/job-vacancies/phd-position-empowering-ai-to-explore-the….
The position is part of Project AI-R, which aims to bring together cutting-edge methods in NLP with recent developments in the philosophy of language and logic.
To apply, please follow the guidelines in the ad.
All best,
Reto
Dear all,
We would like to invite you to a free webinar, *Corpus Linguistics: Skills for the Future*, from our Lancaster webinar series.
In this webinar, we will focus on two domains that have used corpus methods to develop and improve their practice. Prof Elena Semino will talk about the use of corpus methods in healthcare communication, and Dr Dana Gablasova will look at the role played by corpus methods in the development and evaluation of GenAI tools for language learning and teaching.
⏲️ Time: 22 July 2025, 2-3pm UK time
🔗 Link for free registration: https://forms.office.com/e/uppRBrE5AF
Best,
Vaclav
Professor Vaclav Brezina
Professor in Corpus Linguistics
Co-Director of ESRC Centre for Corpus Approaches to Social Science
Lancaster University
Lancaster, LA1 4YD
Office: County South, room C05
T: +44 (0)1524 510828
@vaclavbrezina
Dear colleagues,
I am happy to announce the availability of the new book,
Automatic Question Generation
https://link.springer.com/book/10.1007/978-3-031-92072-1
Published by Springer,
in the series Synthesis Lectures on Human Language Technologies.
Many thanks to Graeme Hirst, the series editor!
The book describes a variety of approaches,
including generating questions from syntactic analyses, semantic resources, neural architectures, ontologies and knowledge graphs, and large language models.
It also covers evaluation and some fundamentals of questions.
I hope the book will be useful for NLP/AI researchers, students, educators, test developers, and anyone interested in this topic.
Michael Flor
Senior Research Scientist
ETS Research Institute
Educational Testing Service
Princeton, NJ, USA
mflor(a)ets.org
In this newsletter:
Fall 2025 LDC data scholarship program
New publications:
AnnoDIFP Session Audio and Transcripts<https://catalog.ldc.upenn.edu/LDC2025S06>
Penn Parsed Corpora of Historical English Second Release<https://catalog.ldc.upenn.edu/LDC2025T09>
LoReHLT Uzbek Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2025T08>
________________________________
Fall 2025 LDC data scholarship program
Student applications for the Fall 2025 LDC data scholarship program are being accepted now through September 15, 2025. This program provides eligible students with no-cost access to LDC data. Students must complete an application consisting of a data use proposal and letter of support from their advisor. For application requirements and program rules, visit the LDC Data Scholarships page<https://www.ldc.upenn.edu/language-resources/data/data-scholarships>.
________________________________
New publications:
AnnoDIFP (Annotated Data for the Investigation of Facets of Personality) Session Audio and Transcripts<https://catalog.ldc.upenn.edu/LDC2025S06> was developed by LDC, the Florida Institute of Technology <https://www.fit.edu/> (FIT), and the University of New Haven<https://www.newhaven.edu/index.php> (UNH) to support algorithm development for predicting personality traits. It contains 438.34 hours of English audio and transcripts from in-person interviews of 366 participants paired with scores from two self-reported personality assessments, HEXACO Personality Inventory (Revised) (HEXACO-PI-R) and Short Dark Triad (SD3).
In-person interviews were recorded at LDC, FIT, and UNH. In each session, the participant and interviewer were in separate sound-isolated rooms with communication between them supplied by audio/video hardware. Sessions consisted of the following tasks: rapport building, a YouTube task, a map task, and a business task. Further details on collection methodology and session tasks are contained in the documentation accompanying this release.
2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
Penn Parsed Corpora of Historical English Second Release<https://catalog.ldc.upenn.edu/LDC2025T09> was developed at the University of Pennsylvania and consists of running texts and text samples of British English prose from the earliest Middle English documents (1100 CE) up to the period of the First World War (1914 CE). This second release corrects errors and inconsistencies in Penn Parsed Corpora of Historical English (LDC2020T16<https://catalog.ldc.upenn.edu/LDC2020T16>), further streamlines annotation, simplifies the directory structure, and includes updated documentation.
This data set contains three corpora covering traditionally recognized periods of English:
* The Penn-Helsinki Parsed Corpus of Middle English, second edition
* The Penn-Helsinki Parsed Corpus of Early Modern English
* The Penn Parsed Corpus of Modern British English, second edition
The texts are in two forms: part-of-speech tagged text and syntactically annotated text. Annotations were manually reviewed for accuracy and consistency. Included in this release are updated annotation guidelines, philological information for each corpus, and the CorpusSearch 2 program, which allows users to search the data for words, word sequences, and syntactic structure.
2025 members can access this corpus through their LDC accounts provided they have submitted a completed copy of the special license agreement. Non-members may license this data for a fee.
LoReHLT Uzbek Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2025T08> was developed by LDC and comprises approximately 47 million words of Uzbek monolingual text, 563,000 words of found Uzbek-English parallel text, 100,000 Uzbek words translated from English data, and 6.4 hours of Uzbek broadcast news and amateur web audio recordings. Approximately 151,000 words were annotated for named entities, and over 28,000 words were annotated for full entity, including nominals and pronouns. Noun-phrase chunking was applied to more than 13,000 words. Over 20,890 words were labeled with simple semantic annotation. Topic annotation was applied to the audio recordings. Data were collected from discussion forums, news, reference, social networks, broadcast news, web audio recordings, and weblogs.
LoReHLT was a companion project of the DARPA LORELEI program. The LORELEI (Low Resource Languages for Emergent Incidents) program was concerned with building human language technology for low resource languages in the context of emergent situations. Representative languages were selected to provide broad typological coverage.
2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104