We are pleased to announce Version 2 of the Reference Corpus of Middle High German (ReM), which is now available for download from the project website:
https://linguistics.rub.de/rem
The Reference Corpus of Middle High German (1050–1350) consists of more than two million tokens, providing a mostly complete collection of written records from Early Middle High German (1050–1200) as well as a careful selection of Middle High German texts from 1200 to 1350. The corpus was compiled in the context of a series of projects at the Universities of Cologne, Bonn, and Bochum, beginning in the mid-1980s.
This new version contains numerous corrections and improvements to both the tokenization and the linguistic annotations, and it adds several new documents to the corpus.
In addition to CorA-XML, several new formats are available for download, including TEI XML and GraphML. The GraphML files can, among other things, be used with a local ANNIS 4 instance. There is also a JSON-based format that contains all available annotations and gives data analysis scripts easy access to them.
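As a rough illustration of working with one of the downloads in a script, the following Python sketch counts the nodes and edges in a GraphML file using only the standard library. It assumes an uncompressed file that uses the standard GraphML namespace; the function name and the file name in the usage note are hypothetical and are not part of the ReM distribution.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the GraphML standard.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def count_graphml_elements(source):
    """Count <node> and <edge> elements in a GraphML document.

    `source` may be a file path or a binary file-like object.
    The function name is a placeholder chosen for this sketch.
    """
    counts = {"nodes": 0, "edges": 0}
    # iterparse processes the file incrementally; clearing each element
    # after it has been handled keeps memory use modest.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == GRAPHML_NS + "node":
            counts["nodes"] += 1
        elif elem.tag == GRAPHML_NS + "edge":
            counts["edges"] += 1
        elem.clear()
    return counts
```

For example, `count_graphml_elements("rem_document.graphml")` would return a dictionary with the two counts for a (hypothetical) local file of that name.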
The new version of the corpus can be accessed via ANNIS 4 at the following URL:
https://newannis.linguistics.rub.de/rem
The Reference Corpus of Middle High German is licensed under the Creative Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0).
--
Prof. Dr. Stefanie Dipper (she/her) - Professur fuer Computerlinguistik
Sprachwiss. Institut - Ruhr-Universitaet Bochum - 44780 Bochum - Germany
Email: stefanie.dipper@rub.de - https://www.linguistics.rub.de/~dipper/