Dear all,
This is the CfP for the First Workshop on Creating Interoperable Corpora of Historical Newspapers (PressMint-LREC2026) on 16 May 2026. See details below. Apologies for cross-posting!
------------
First Workshop on Creating Interoperable Corpora of Historical Newspapers (PressMint-LREC2026)
Call for Papers
Date: 16 May 2026, half-day workshop
Location: Palma de Mallorca, Spain
Submission Deadline: 1 March 2026
Submission link: https://softconf.com/lrec2026/PressMint/
Workshop Website: https://www.clarin.eu/PressMint-LREC2026
________________________________
Workshop description
Historical newspapers are of interest to historians and historical linguists, as well as to social and political scientists, ethnologists, anthropologists, media and communication scholars, and researchers in cultural studies. In all of these fields, contemporary digital resources, tools and methods (e.g. “distant reading”) are still underutilised. At the same time, corpora of historical newspapers already exist for a number of languages and countries, largely because the material is out of copyright, and the images, often together with OCR, are available through the national libraries. In recent years these data have attracted growing interest from researchers, since they preserve the historical, cultural, political and societal past. However, these corpora are not interoperable, which precludes methods for comparing them as well as any translingual and transnational research. This is an especially important consideration, as statehood and nationhood are highly dynamic in Europe in the period to be covered by the project corpora. An initial joint attempt at creating a corpus of historical newspapers from the beginning of the 20th century onwards is the CLARIN flagship project PressMint (https://www.clarin.eu/pressmint). The project currently features data from 20 partners and aims to develop a standard for interoperable newspaper resources across diachronic timespans. The final goal is to provide structured, high-quality multilingual data in a common format, with the same type of linguistic annotation, covering (at least partially) the same time period.
The workshop is supported by CLARIN and the PressMint project.
Objective
The PressMint workshop aims to gather experts interested in creating, processing and analyzing interoperable corpora of historical data in general, with a particular focus on newspapers. Another important objective is to take into account the perspective of the communities who use historical data: their purposes, requirements and feedback.
We encourage interested colleagues to present work at both the national and the pan-European level, whether monolingual or multilingual, task-specific or multidisciplinary. We view this workshop as a venue for exchanging research ideas and starting collaborations on this topic.
The workshop will feature one invited speaker: Maud Ehrmann, EPFL, CH
We invite unpublished original work focusing on (but not limited to) the following topics:
* compilation, annotation, visualisation and utilisation of historical newspaper corpora of the period relevant to PressMint (ideally around the start of the 20th century, but not constrained to this period)
* harmonisation of existing multilingual historical newspaper corpora that contain synchronic data, diachronic data, or both
* linking or comparing historical newspaper corpora with other datasets, including sources of structured knowledge such as formal ontologies and LOD datasets
* enrichment of historical newspaper corpora (e.g. with sentiment annotation)
* machine translation of historical newspaper corpora
* employment of LLMs as standalone tools or as parts of architectures for historical data processing, maintenance and knowledge deployment
* various usage scenarios for historical data
________________________________
Submission & Publication
We accept submissions of long papers (6 to 8 pages), short papers (4 pages) and demo papers (4 pages), to be presented at the workshop as long or short oral presentations or as posters. To support double-blind reviewing, all submissions must be fully anonymized and should be formatted according to the stylesheet available on the LREC 2026 website (https://lrec2026.info/authors-kit/). The workshop papers will be published in online proceedings.
At the time of submission, authors are also offered the opportunity to share related language resources with the community. All repository entries are linked to the LRE Map [https://lremap.elra.info/], which provides metadata for the resources.
Please note that the LREC style guide should be followed. The formatting guidelines can be found here: https://lrec2026.info/authors-kit/.
Important Dates
* Paper submission deadline: 1 March 2026
* Notification of acceptance: 15 March 2026
* Camera-ready paper: 30 March 2026
* Workshop date: 16 May 2026
________________________________
Organizing Committee
* Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences, PL
* Tanja Wissik, Austrian Academy of Sciences, AT
* Petya Osenova, Sofia University "St. Kl. Ohridski" & Bulgarian Academy of Sciences, BG
To contact the organisers, please email maciej.ogrodniczuk@gmail.com.
Programme Committee (in alphabetical order)
* Tomaž Erjavec, Jožef Stefan Institute, SI
* Maria Gavriilidou, Institute for Language and Speech Processing, Athena Research Center, GR
* Normunds Grūzītis, University of Latvia, LV
* Matyáš Kopp, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Charles University, CZ
* Taja Kuzman, Jožef Stefan Institute, SI
* Nikola Ljubešić, Jožef Stefan Institute, SI
* Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences, PL
* Petya Osenova, Sofia University "St. Kl. Ohridski" and IICT-BAS, BG
* Adam Pawłowski, University of Wrocław, PL
* Stelios Piperidis, Athena Research Centre, GR
* Claudia Resch, Austrian Academy of Sciences, AT
* German Rigau, HiTZ Basque Research Center for Language Technology, EHU, ES
* Inguna Skadiņa, Institute of Mathematics and Computer Science, University of Latvia, LV
* Steinþór Steingrímsson, The Árni Magnússon Institute for Icelandic Studies, IS
* Tanja Wissik, Austrian Academy of Sciences, AT
Dr. Tanja Wissik
ACDH - Austrian Centre for Digital Humanities
Austrian Academy of Sciences
Bäckerstraße 13, A-1010 Vienna
E-mail: Tanja.Wissik@oeaw.ac.at
Tel: +43 1 51581 - 2206
http://www.oeaw.ac.at/acdh/
CLARIN National Coordinator for Austria https://www.clarin.eu/governance/national-coordinators-forum
Editor of the Journal of the Text Encoding Initiative
https://journals.openedition.org/jtei/