Dear colleagues,
Interpreting corpora are a type of language resource that interweaves multilingualism with multimodality, spoken with signed languages, and split-second processing with contextualised interactions. Compiling an interpreting corpus requires significant effort: annotating 1 hour of signs can take 320 hours (Wehrmeyer 2019), and transcribing oral features often defies automatic recognition.
As a first step towards reusing such valuable datasets, we have created a core metadata schema for describing an interpreting corpus consistently and informatively. The schema is based on a review of 114 corpora (see https://unic.dipintra.it/Metadata.aspx), FAIR principles (Wilkinson et al. 2016), international standards (International Organization for Standardization 2015, 2019), similar initiatives (e.g. Paquot et al. 2023), and ontologies of the interpreting community (e.g. Pöchhacker 2022). It is available at https://tinyurl.com/intpmetadata, and example implementations using four community, conference and sign language interpreting corpora can be found at https://tinyurl.com/intpmetadata-example.
We’d like to encourage more colleagues to provide feedback on the schema by the end of July. The response at the CIUTI conference two weeks ago was heartening, and we invite you to co-create a metadata standard that fits the past, current and future needs of the interpreting community.
Thank you for your cooperation.
With best wishes,
Nannan Liu and Mariachiara Russo
References
International Organization for Standardization (2015). ISO 24622-1 Language resource management – Component Metadata Infrastructure (CMDI) – Part 1: The Component Metadata Model. International Standardization Organization.
International Organization for Standardization (2019). ISO 24622-2:2019 Language resource management – Component Metadata Infrastructure (CMDI) – Part 2: Component metadata specification language. International Standardization Organization.
Paquot, M., König, A., Stemle, E. & Frey, J.-C. (2023, January 27). Core metadata schema for learner corpora. Open Data @ UCLouvain, https://tinyurl.com/L2metadataV2.
Pöchhacker, F. (2022). Introducing interpreting studies (3rd ed.). London and New York: Routledge.
Wehrmeyer, E. (2019). A corpus for signed language interpreting research. Interpreting 21 (1), 62–90.
Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., and others (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3 (1), 1–9.
Dr Nannan Liu
Marie Curie Fellow
Project FAITH https://cordis.europa.eu/project/id/101108651
Department of Interpreting and Translation
University of Bologna