CALL FOR PARTICIPATION
EVALITA 2023 Task - PoliticIT: Political ideology detection in Italian texts
Held as part of EVALITA 2023 (https://www.evalita.it/campaigns/evalita-2023/), a periodic evaluation campaign of Natural Language Processing and speech tools for the Italian language
September 7th-8th 2023, Parma
Codalab link: https://codalab.lisn.upsaclay.fr/competitions/8507
Dear All,
We invite researchers and students to participate in the shared task PoliticIT: Political ideology detection in Italian texts, held as part of EVALITA 2023, the evaluation campaign of Natural Language Processing and speech tools for the Italian language.
The goal of this task is to extract political ideology information from Italian texts. To this end, we propose an automatic document classification task on clusters of texts. It consists of extracting the self-assigned gender as a demographic trait and the political ideology as a psychographic trait from a set of texts written in Italian by several authors who share those traits. Political ideology is treated both as a binary and as a multiclass problem. The PoliticIT shared task is based on a previous task named PoliticES, presented at IberLEF 2022 (García-Díaz et al., 2022b), whose dataset was an extension of the PoliCorpus 2020 dataset (García-Díaz et al., 2022a).
Participants will be provided with development, development_test, training and test datasets in Italian. The corpus was collected between 2020 and 2022 from the Twitter accounts of politicians in Italy using the UMUCorpusClassifier (García-Díaz et al., 2020). To prevent ethical and privacy issues related to author profiling on Twitter, we created clusters of texts by mixing tweets from different authors. Consequently, every cluster is composed of texts written by different users who share all the traits under evaluation. We labelled each cluster with the self-assigned gender of its authors (male, female) and with their position on the political spectrum at two granularities: binary (left, right) and multiclass (left, moderate_left, moderate_right, right). Moreover, Twitter mentions of the politicians were anonymised by replacing them with the token @user, and mentions of other Twitter accounts were also encoded as @user. Consequently, the traits cannot be guessed trivially by reading politicians' names and searching for information about them on the Internet. Each cluster in the dataset contains around 80-100 tweets.
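As an illustration of the mention anonymisation described above, a minimal sketch follows. It is not the organisers' preprocessing code: the regular expression, the function name anonymise_mentions and the example handles are assumptions made here for illustration only.

```python
import re

# Assumption: a mention is "@" followed by one or more word characters.
# The organisers' actual rule is not described in this call.
MENTION = re.compile(r"@\w+")


def anonymise_mentions(text: str) -> str:
    """Replace every Twitter-style mention with the token @user."""
    return MENTION.sub("@user", text)


# Hypothetical handles, invented for this example.
print(anonymise_mentions("Grazie @example_handle e @altro_account!"))
```

Note that such a simple pattern would also rewrite the part of an e-mail address that follows the "@" sign, so a real pipeline would probably need a stricter rule.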
Moreover, to facilitate participation in the competition, a Google Colab notebook will be provided. The notebook shows how to load the development dataset and how to train three baseline models, one for each trait (self_assigned_gender, ideology_binary and ideology_multiclass), based on logistic regression with a simple Bag-of-Words (BoW) representation. It also shows how to calculate the final F1-score of each model and how to generate the final submission file. To download the data and the notebook and to participate, go to https://codalab.lisn.upsaclay.fr/competitions/8507.
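For orientation, a baseline of this kind could be sketched as follows. This is not the organisers' notebook: the use of scikit-learn, the "text" field name, the toy rows with placeholder words, and the macro averaging of the F1-score are all assumptions made for illustration. Only the three trait names and the label values come from the task description.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Trait names as given in the task description.
TRAITS = ["self_assigned_gender", "ideology_binary", "ideology_multiclass"]

# Hypothetical toy data: one row per cluster, "text" holding its tweets
# joined into a single string. The words are meaningless placeholders.
toy_rows = [
    {"text": "@user alfa beta alfa", "self_assigned_gender": "female",
     "ideology_binary": "left", "ideology_multiclass": "left"},
    {"text": "@user beta gamma beta", "self_assigned_gender": "male",
     "ideology_binary": "left", "ideology_multiclass": "moderate_left"},
    {"text": "@user delta epsilon delta", "self_assigned_gender": "female",
     "ideology_binary": "right", "ideology_multiclass": "moderate_right"},
    {"text": "@user epsilon zeta epsilon", "self_assigned_gender": "male",
     "ideology_binary": "right", "ideology_multiclass": "right"},
]


def train_baselines(rows):
    """Fit one Bag-of-Words + logistic regression model per trait."""
    texts = [row["text"] for row in rows]
    models = {}
    for trait in TRAITS:
        model = make_pipeline(CountVectorizer(),
                              LogisticRegression(max_iter=1000))
        model.fit(texts, [row[trait] for row in rows])
        models[trait] = model
    return models


def evaluate(models, rows):
    """Return the F1-score per trait (macro averaging is an assumption)."""
    texts = [row["text"] for row in rows]
    return {
        trait: f1_score([row[trait] for row in rows],
                        models[trait].predict(texts),
                        average="macro")
        for trait in TRAITS
    }


models = train_baselines(toy_rows)
# Scoring on the training rows only demonstrates the calls;
# a real evaluation would use the held-out development_test split.
scores = evaluate(models, toy_rows)
print(scores)
```

With the real data, toy_rows would be replaced by the clusters loaded from the files distributed on Codalab, and the submission file would follow the format shown in the official notebook.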
Best regards,
The PoliticIT 2023 organizing committee
References
- García-Díaz, J. A., Almela, Á., Alcaraz-Mármol, G., & Valencia-García, R. (2020). UMUCorpusClassifier: Compilation and evaluation of linguistic corpus for Natural Language Processing tasks. Procesamiento del Lenguaje Natural, 65, 139-142.
- García-Díaz, J. A., Colomo-Palacios, R., & Valencia-García, R. (2022a). Psychographic traits identification based on political ideology: An author analysis study on Spanish politicians’ tweets posted in 2020. Future Generation Computer Systems, 130(1), 59-74.
- García-Díaz, J. A., Jiménez Zafra, S. M., Martín Valdivia, M. T., García-Sánchez, F., Ureña López, L. A., & Valencia García, R. (2022b). Overview of PoliticEs 2022: Spanish Author Profiling for Political Ideology. Procesamiento del Lenguaje Natural, 69, 265-272.
Important dates
- Release of development corpora: Jan 31, 2023
- Release of training corpora: Feb 7, 2023
- Release of test corpora and start of evaluation campaign: May 2, 2023
- End of evaluation campaign (deadline for runs submission): May 19, 2023
- Publication of official results: May 30, 2023
- Paper submission: Jun 14, 2023
- Review notification: Jul 10, 2023
- Camera ready submission: Jul 25, 2023
- EVALITA Workshop: Parma, Sep 7th-8th, 2023
- Publication of proceedings: Sep ??, 2023
Organizing committee
- Daniel Russo (Language and Dialogue Technologies group at Fondazione Bruno Kessler (FBK), UniTn)
- Salud María Jiménez-Zafra (SINAI research group, Universidad de Jaén, Spain)
- José Antonio García-Díaz (UMUTeam research group, Universidad de Murcia, Spain)
- Tommaso Caselli (Faculty of Arts, Rijksuniversiteit Groningen)
- Marco Guerini (Language and Dialogue Technologies group at Fondazione Bruno Kessler (FBK), UniTn)
- L. Alfonso Ureña-López (SINAI research group, Universidad de Jaén, Spain)
- Rafael Valencia-García (UMUTeam research group, Universidad de Murcia, Spain)
Salud María Jiménez Zafra sjzafra@ujaen.es
Universidad de Jaén Grupo de Investigación SINAI http://sinai.ujaen.es/ | Departamento de Informática EPS Jaén, Edificio A3, Despacho 219 Campus Las Lagunillas s/n 23071 - Jaén | +34 953212992