New release: the FR-R-MIGR-TWIT and the UK-R-MIGR-RA-TWIT Corpora

13 Dec 2022


The *MIGR-TWIT Corpus* is a bilingual diachronic corpus of tweets, built to
study the evolution of public discourse on migration in Europe over the
past 10 years.
We are pleased to announce the release of the *first two components* of the
corpus: the *FR-R-MIGR-TWIT-2011-2022* and the
*UK-R-MIGR-RA-TWIT-2012-2022* Corpora.
·      The *FR-R-MIGR-TWIT-2011-2022 Corpus* includes all tweets containing
at least one occurrence of the lexical root -*migr*- (*i.e.*, the words
*immigration(s), migrant(s), immigré(s)*), posted by *16 right-wing and
far-right French politicians and political parties between 2011 and 2022*,
for a total of 11,761 tweets and 358,491 words.
·      The *UK-R-MIGR-RA-TWIT-2012-2022 Corpus* includes all tweets
containing at least one word derived from the Latin lexical root “*migr*”
of *migrare* (*to move from one place to another*), as well as tweets
containing the keywords “*refugee(s)*” and “*asylum*”, posted by *12
right-wing and far-right British politicians and political parties between
2012 and 2022*, for a total of 6,472 tweets and 174,707 words.
The whole corpus contains 18,233 tweets and 533,198 words.
The posts were automatically retrieved using the *Twitter API v2 Academic
Research*.
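The keyword criteria described above can be approximated locally, for
instance to check which criterion a given tweet satisfies. The sketch below
is only an illustration: the patterns, the function names and the
case-insensitive substring matching are assumptions, and the exact queries
sent to the Twitter API are not given in this announcement. In particular,
a bare substring match on "migr" would also accept unrelated words such as
"migraine".

```python
import re

# Assumed approximations of the keyword criteria; the actual API queries
# used to build the corpus are not documented here.
FR_PATTERN = re.compile(r"migr", re.IGNORECASE)
UK_PATTERN = re.compile(r"migr|refugee|asylum", re.IGNORECASE)


def matches_fr(text: str) -> bool:
    """True if the text contains the root 'migr' (case-insensitive)."""
    return FR_PATTERN.search(text) is not None


def matches_uk(text: str) -> bool:
    """True if the text contains 'migr', 'refugee' or 'asylum'."""
    return UK_PATTERN.search(text) is not None


def keep_matching(tweets, predicate):
    """Return the tweets (plain strings) for which the predicate holds."""
    return [t for t in tweets if predicate(t)]
```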
The corpus is distributed as two CSV Zip files (tab-delimited format), one
for each sub-corpus. It is available in two versions:
-        version1, whose header contains only the tweet identifier
(*data__id*) and the text of the tweet (*data__text*) (folders named
*FR-R-MIGR-TWIT-2011-2022_textonly* and
*UK-R-MIGR-RA-TWIT-2012-2022_textonly*, containing 12 and 11 Zip files
respectively, one per year);
-        version2, whose header contains all tweet fields, such as the
posting date (*data__created__at*), the username (*author__name*), the
number of retweets (*data__public_metrics__retweet_count*), etc., in two
folders named *FR-R-MIGR-TWIT-2011-2022_meta* and
*UK-R-MIGR-RA-TWIT-2012-2022_meta*.
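As a rough illustration of working with the tab-delimited files, the sketch
below parses a header row plus data rows with Python's standard `csv` module
and sums the retweet counts. The column names are the ones listed above; the
sample rows, their column order and the helper name `load_rows` are invented
for the example and are not taken from the corpus.

```python
import csv
import io


def load_rows(handle):
    """Parse a tab-delimited file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Invented sample in an assumed version2 layout (not real corpus data).
SAMPLE = (
    "data__id\tdata__created__at\tauthor__name\t"
    "data__public_metrics__retweet_count\tdata__text\n"
    "1\t2015-09-01T10:00:00.000Z\texample_user\t12\tPremier exemple\n"
    "2\t2016-03-15T08:30:00.000Z\texample_user\t30\tSecond exemple\n"
)

rows = load_rows(io.StringIO(SAMPLE))

# Field values are read as strings, so counts must be converted explicitly.
total_retweets = sum(
    int(row["data__public_metrics__retweet_count"]) for row in rows
)
```

For a file extracted from one of the Zip archives, the same function can be
called on `open(path, newline="", encoding="utf-8")`; the UTF-8 encoding is
an assumption to verify against the actual files.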
The corpus was created by Elena Battaglia (Università della Svizzera
Italiana and Université de Lille), Guido Blandino (University of
Wolverhampton), Paola Pietrandrea and Sangwan Jeon (Université de Lille),
with the collaboration of Adelina Stojan (Université de Lille), within the
framework of the OLiNDiNUM observatory, *Observatoire LINguistique du
DIscours NUMérique* [Linguistic Observatory of Digital Discourse]
(https://olindinum.huma-num.fr/), coordinated by Paola Pietrandrea.
The creation of the corpus was funded by Université de Lille, Projet
d'Internationalisation 2021 - Université Franco-italienne / Università
Italo Francese - Campus France (Hubert Curien Partnerships): Italie - PHC
Galilée 2018-19, Pays-Bas - PHC Van Gogh 2018-19.
The corpus is freely accessible through the platforms Ortolang
https://www.ortolang.fr/market/corpora/migr-twit-corpus/v1 and Zenodo
https://zenodo.org/record/7347479#.Y5ee5naZMuE.
Elena Battaglia, Guido Blandino, Sangwan Jeon, Paola Pietrandrea
The *MIGR-TWIT Corpus* is a diachronic corpus of bilingual tweets, built to
study the evolution of public discourse on immigration in Europe over the
past 10 years.
We are pleased to announce the publication of the *first two components* of
the corpus: the *FR-R-MIGR-TWIT-2011-2022* and
*UK-R-MIGR-RA-TWIT-2012-2022* corpora.
·      The *FR-R-MIGR-TWIT-2011-2022 corpus* gathers all tweets containing
at least one occurrence of vocabulary derived from the lexical root
-*migr*- (*i.e.* *immigration(s), migrant(s), immigré(s)*), posted by *16
French right-wing and far-right politicians and political parties between
2011 and 2022*, for a total of 11,761 tweets and 358,491 words.
·      The *UK-R-MIGR-RA-TWIT-2012-2022 corpus* gathers all tweets
containing at least one occurrence of vocabulary derived from the Latin
root “*migr*” of *migrare* (*to leave a place*), together with the keywords
“*refugee(s)*” and “*asylum*”, posted by *12 British right-wing and
far-right politicians, parties and political institutions between 2012 and
2022*, for a total of 6,472 tweets and 174,707 words.
The corpus as a whole totals 18,233 tweets and 533,198 words.
The data were retrieved automatically using the *Twitter API v2 Academic
Research*.
The complete corpus contains two CSV files (tabular data format), one for
each sub-corpus. It is available in two versions:
-        version1, with the tweet identifier (*data__id*) and the tweet
text (*data__text*) as the header (files named
*FR-R-MIGR-TWIT-2011-2022_textonly* and
*UK-R-MIGR-RA-TWIT-2012-2022_textonly*, made up of 12 and 11 CSV files
respectively, one per year);
-        version2, with all tweet metadata as the header, such as the
publication date (*data__created__at*), the username (*author__name*), the
number of retweets (*data__public_metrics__retweet_count*), etc., with two
files named *FR-R-MIGR-TWIT-2011-2022_meta* and
*UK-R-MIGR-RA-TWIT-2012-2022_meta*.
The corpus was created by Elena Battaglia (Università della Svizzera
Italiana and Université de Lille), Guido Blandino (University of
Wolverhampton), Paola Pietrandrea and Sangwan Jeon (Université de Lille),
with the collaboration of Adelina Stojan (Université de Lille), within the
*OLiNDiNUM* project, *Observatoire LINguistique du DIscours NUMérique*
(https://olindinum.huma-num.fr/), coordinated by Paola Pietrandrea.
The creation of the corpus was funded by the Université de Lille, Projet
d’Internationalisation 2021 - Université Franco-italienne / Università
Italo Francese - Campus France (Hubert Curien Partnerships): Italie - PHC
Galilée 2018-19, Pays-Bas - PHC Van Gogh 2018-19.
The corpus is freely accessible via the platforms Ortolang
https://www.ortolang.fr/market/corpora/migr-twit-corpus/v1 and Zenodo
https://zenodo.org/record/7347479#.Y5ee5naZMuE.
Elena Battaglia, Guido Blandino, Sangwan Jeon, Paola Pietrandrea