[Corpora-List] Re: Church Slavonic resources

29 Mar 2023


Dear Alexander,
within the Czech AHISTO project we have OCRed about 300,000
pages from Czech medieval sources FONTES related to the Hussite
era.
The current corpus data contain more than 3 million sentences
(84 million tokens) mostly in Old Czech (36 million tokens),
German and Latin. The corpus is available for download at
https://nlp.fi.muni.cz/trac/ahisto/wiki/NerDataset#Corpus
Kind regards,
-- 
Ales Horak
Natural Language Processing Centre (NLP Centre)
Faculty of Informatics
Masaryk University
Brno, Czech Republic


Alexander Osherenko via Corpora wrote on Mar 29, 2023:
> Hi,
> 
> I'm looking for digital Old Church Slavonic resources such as corpora,
> treebanks, wordnets or raw texts. I am aware of the GORAZD: The Old Church
> Slavonic Digital Hub http://www.gorazd.org/?q=en/node/21 or the TOROT
> treebank at https://universaldependencies.org, but maybe I am missing something.
> Thanks, Alexander
> --
> Alexander Osherenko, Dr. rer. nat.
> Research Associate
> Bavarian Academy of Sciences and Humanities http://badw.de/
> Profile: Socioware Development http://www.socioware.de/osherenko_page.html
> Profile: Humboldt-Universität zu Berlin
> https://wirsindhumboldt.de/de/VKkZNyFaeu
> Profile: ResearchGate
> https://www.researchgate.net/profile/Alexander_Osherenko
> Channel: Youtube https://www.youtube.com/user/MrOsherenko

> _______________________________________________
> Corpora mailing list -- corpora@list.elra.info
> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
> To unsubscribe send an email to corpora-leave@list.elra.info
