[PoliticIT@EVALITA2023] Call for participation EVALITA 2023 Task - PoliticIT: Political ideology detection in Italian texts - Corpora

13 Feb 2023


      CALL FOR PARTICIPATION
EVALITA 2023 Task - PoliticIT: Political ideology detection in Italian texts
Held as part of EVALITA 2023
(https://www.evalita.it/campaigns/evalita-2023/), a periodic evaluation
campaign of Natural Language Processing and speech tools for the Italian
language
September 7th-8th 2023, Parma
Codalab link: https://codalab.lisn.upsaclay.fr/competitions/8507
Dear All,
We invite researchers and students to participate in the shared task
PoliticIT: Political ideology detection in Italian texts, held as part of
EVALITA 2023, the evaluation campaign of Natural Language Processing and
speech tools for the Italian language.
The goal of this task is to extract political ideology information from
Italian texts. To this end, we propose an automatic document
classification task on clusters of texts. It consists of extracting the
self-assigned gender as a demographic trait, and the political ideology as
a psychographic trait, from a set of texts written in Italian by several
authors who share those traits. Political ideology is treated both as a
binary and as a multiclass problem. The PoliticIT shared task is based on
a previous task named PoliticES, presented at IberLEF 2022 (García-Díaz et
al., 2022b), where the dataset was an extension of the PoliCorpus 2020
dataset (García-Díaz et al., 2022a).
The participants will be provided with development, development_test,
training and test datasets in Italian. The corpus was collected between
2020 and 2022 from the Twitter accounts of politicians in Italy using the
UMUCorpusClassifier (García-Díaz et al., 2020). To prevent ethical and
privacy issues related to author profiling on Twitter, we created clusters
of texts by mixing some of the extracted tweets. Consequently, every
cluster is composed of texts written by different users who share all the
traits under evaluation. We labelled each cluster with the self-assigned
gender of its authors (male, female) and their position on the political
spectrum on two axes: binary (left, right) and multiclass (left,
moderate_left, moderate_right, right). Moreover, mentions of the
politicians' Twitter accounts were anonymised by replacing them with the
token @user, and mentions of other Twitter accounts were also encoded as
@user. Consequently, the traits cannot be guessed trivially by reading the
politicians' names and searching for information about them on the
Internet. The dataset is composed of clusters of around 80-100 tweets
each.
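As an illustration of the mention anonymisation described above, here is a
minimal sketch in Python. It is not the organisers' preprocessing code: the
regular expression, the function name anonymise_mentions and the sample
handles are assumptions made for this example only.

```python
import re

# Assumption: a Twitter handle is "@" followed by letters, digits or
# underscores. The lookbehind skips "@" signs inside e-mail addresses.
MENTION = re.compile(r"(?<![A-Za-z0-9_])@[A-Za-z0-9_]+")


def anonymise_mentions(tweet: str) -> str:
    """Replace every account mention in a tweet with the token @user."""
    return MENTION.sub("@user", tweet)


# Hypothetical tweet with made-up handles.
sample = "Grazie @Mario_Rossi e @altro123 per il dibattito!"
print(anonymise_mentions(sample))
```

Because politicians and other accounts are mapped to the same token, the
output no longer reveals who was mentioned.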
Moreover, in order to facilitate participation in the competition, a
Google Colab notebook will be provided. The notebook shows how to load the
development dataset and how to train three baseline models based on
logistic regression with a simple Bag-of-Words (BoW) representation, one
for each trait (self_assigned_gender, ideology_binary and
ideology_multiclass). It also shows how to calculate the final F1-score of
each model and how to generate the final submission file. To download the
data and the notebook and to participate, go to
https://codalab.lisn.upsaclay.fr/competitions/8507.
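For orientation, a baseline of this kind could look roughly like the
sketch below. It is not the official notebook: the use of scikit-learn,
the macro averaging of the F1-score, the helper names and the toy data are
all assumptions made for this example, and loading the real dataset and
writing the submission file are left out.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Trait names as given in the call.
TRAITS = ["self_assigned_gender", "ideology_binary", "ideology_multiclass"]


def train_baselines(texts, labels_by_trait):
    """Fit one BoW + logistic regression pipeline per trait."""
    models = {}
    for trait in TRAITS:
        model = make_pipeline(CountVectorizer(),
                              LogisticRegression(max_iter=1000))
        model.fit(texts, labels_by_trait[trait])
        models[trait] = model
    return models


def evaluate(models, texts, labels_by_trait):
    """Return the F1-score per trait (macro averaging is an assumption)."""
    return {
        trait: f1_score(labels_by_trait[trait],
                        models[trait].predict(texts),
                        average="macro")
        for trait in TRAITS
    }


# Toy stand-in for the real clusters: one string per cluster (hypothetical).
toy_texts = [
    "lavoro diritti sindacati @user",
    "scuola pubblica sanità @user",
    "imprese tasse crescita @user",
    "sicurezza confini famiglia @user",
]
toy_labels = {
    "self_assigned_gender": ["male", "female", "male", "female"],
    "ideology_binary": ["left", "left", "right", "right"],
    "ideology_multiclass": ["left", "moderate_left",
                            "moderate_right", "right"],
}

models = train_baselines(toy_texts, toy_labels)
# Scored on the training texts only to show the call; use held-out data
# (for example development_test) for a meaningful score.
scores = evaluate(models, toy_texts, toy_labels)
print(scores)
```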
Best regards,
The PoliticIT 2023 organizing committee
References
- García-Díaz, J. A., Almela, Á., Alcaraz-Mármol, G., & Valencia-García,
  R. (2020). UMUCorpusClassifier: Compilation and evaluation of linguistic
  corpus for Natural Language Processing tasks. Procesamiento del Lenguaje
  Natural, 65, 139-142.
- García-Díaz, J. A., Colomo-Palacios, R., & Valencia-García, R. (2022a).
  Psychographic traits identification based on political ideology: An
  author analysis study on Spanish politicians’ tweets posted in 2020.
  Future Generation Computer Systems, 130(1), 59-74.
- García-Díaz, J. A., Jiménez Zafra, S. M., Martín Valdivia, M. T.,
  García-Sánchez, F., Ureña López, L. A., & Valencia García, R. (2022b).
  Overview of PoliticEs 2022: Spanish Author Profiling for Political
  Ideology. Procesamiento del Lenguaje Natural, 69, 265-272.
Important dates
- Release of development corpora: Jan 31, 2023
- Release of training corpora: Feb 7, 2023
- Release of test corpora and start of evaluation campaign: May 2, 2023
- End of evaluation campaign (deadline for runs submission): May 19, 2023
- Publication of official results: May 30, 2023
- Paper submission: Jun 14, 2023
- Review notification: Jul 10, 2023
- Camera ready submission: Jul 25, 2023
- EVALITA Workshop: Parma, Sep 7th-8th, 2023
- Publication of proceedings: Sep ??, 2023
Organizing committee
- Daniel Russo (Language and Dialogue Technologies group at Fondazione
  Bruno Kessler (FBK), UniTn)
- Salud María Jiménez-Zafra (SINAI research group, Universidad de Jaén,
  Spain)
- José Antonio García-Díaz (UMUTeam research group, Universidad de Murcia,
  Spain)
- Tommaso Caselli (Faculty of Arts, Rijksuniversiteit Groningen)
- Marco Guerini (Language and Dialogue Technologies group at Fondazione
  Bruno Kessler (FBK), UniTn)
- L. Alfonso Ureña-López (SINAI research group, Universidad de Jaén, Spain)
- Rafael Valencia-García (UMUTeam research group, Universidad de Murcia,
  Spain)
Salud María Jiménez Zafra
sjzafra@ujaen.es
Universidad de Jaén
Grupo de Investigación SINAI http://sinai.ujaen.es/ | Departamento de
Informática
EPS Jaén, Edificio A3, Despacho 219
Campus Las Lagunillas s/n 23071 - Jaén | +34 953212992