[Apologies for cross-posting]
Dear colleagues,
We are delighted to announce that the BFM2022 corpus of Old and Middle French (9th to 15th centuries) is now available from the web portal of the Base de Français Médiéval at https://txm-bfm.huma-num.fr/txm/?command=documentation&path=/BFM2022.

The Base de Français Médiéval provides free access to several corpora (source texts and digital annotation) under a French public open data license (https://www.etalab.gouv.fr/licence-ouverte-open-licence). Three modes of access are supported:
• search, analysis and reading tools provided by the TXM-BFM web portal;
• a downloadable binary corpus file for use with the local TXM application;
• TEI XML source files, downloadable from the NAKALA repository: https://nakala.fr/collection/10.34847/nkl.93ee3ts1.

The BFM portal is now hosted by the Huma-Num infrastructure, which provides a secure connection for user data.

The BFM2022 corpus includes some fifty new texts, amounting to approximately 6,450,000 words. All the texts are formatted according to the TEI guidelines (including the instances of direct speech), automatically POS-tagged and lemmatized. The POS tags have been manually verified in 8 new texts (46 total, approximately 1,000,000 words), and the lemmatization has been verified and disambiguated in 27 texts (approximately 620,000 words). An original digital edition of the Psautier d’Arundel by C. Pignatelli is one of the new texts included in the corpus.

In addition to BFM2022, a syntactically annotated corpus, PROFITEROLE-V1-0, is now available from the BFM web portal. Produced by the ANR-funded PROFITEROLE Project (https://www.lattice.cnrs.fr/projets/projets-passes/projet-anr-profiterole), it supports querying syntactic relations encoded according to the Universal Dependencies guidelines (https://universaldependencies.org).

We would appreciate any feedback on technical issues or errors in the texts that you may encounter while using the BFM.

Best regards,
The BFM Team
bfm [at] ens-lyon [dot] fr