Within the Czech AHISTO project, we have OCRed about 300,000
pages from Czech medieval sources FONTES related to the Hussite
era.
The current corpus data contain more than 3 million sentences
(84 million tokens) mostly in Old Czech (36 million tokens),
German and Latin. The corpus is available for download at
https://nlp.fi.muni.cz/trac/ahisto/wiki/NerDataset#Corpus
--
Ales Horak
Natural Language Processing Centre (NLP Centre)
Faculty of Informatics
Masaryk University
Brno, Czech Republic
Alexander Osherenko via Corpora wrote on Mar 29, 2023:
> Hi,
>
> I'm looking for digital Old Church Slavonic resources such as corpora,
> treebanks, wordnets or raw texts. I am aware of GORAZD: The Old Church
> Slavonic Digital Hub (http://www.gorazd.org/?q=en/node/21) and the TOROT
> treebank at https://universaldependencies.org, but maybe I am missing
> something.
> Thanks, Alexander
> --
> Alexander Osherenko, Dr. rer. nat.
> Research Associate
> Bavarian Academy of Sciences and Humanities http://badw.de/
> Profile: Socioware Development http://www.socioware.de/osherenko_page.html
> Profile: Humboldt-Universität zu Berlin https://wirsindhumboldt.de/de/VKkZNyFaeu
> Profile: ResearchGate https://www.researchgate.net/profile/Alexander_Osherenko
> Channel: YouTube https://www.youtube.com/user/MrOsherenko