Call for Shared Task Participation: Multilingual Protest Event Detection, CASE @ EMNLP 2022 - Corpora

1 Jul 2022


Apologies for cross-posting.
FIRST CALL FOR PARTICIPATION
CASE-2022 Shared Task: Multilingual Protest Event Detection
================================================
We invite you to participate in the CASE-2022 Shared Task 1: Multilingual
Protest Event Detection. The task is being held as part of the 5th Workshop
on Challenges and Applications of Automated Extraction of Socio-political
Events from Text (CASE 2022). This is a continuation of the CASE 2021
shared task (Hürriyetoğlu et al., 2021) [2]. The training set is the same
as in CASE 2021, but the evaluation phase will include data from additional
languages in CASE 2022. Please see the workshop website for further
details: https://emw.ku.edu.tr/case-2022/
Contact address: ali.hurriyetoglu@gmail.com
Important Dates
================================================
Training data available: please follow instructions on the repository of
the task: https://github.com/emerging-welfare/case-2022-multilingual-event.
You will obtain the test data for CASE 2021 as well, with a Codalab page (
https://competitions.codalab.org/competitions/31639) available to obtain a
score for your predictions.
New test data available: September 15, 2022
Test end: September 25, 2022
System Description Paper submissions due: October 2, 2022
Notification to authors after review: October 9, 2022
Camera-ready: October 16, 2022
Workshop period @ EMNLP: December 7-8, 2022
Motivation
================================================
Event extraction has recently attracted a lot of attention in the NLP
community, as well as among political and social scientists: it has emerged
as a robust technology for identifying the most important information
inside media streams. At the same time, it provides a basis for
quantitative assessment of the political situation in the world. Event
extraction has long been a challenge for the natural language processing
(NLP) community as it requires sophisticated methods for detection and
classification of events: machine learning, syntactic and semantic parsing,
event ontologies, event co-reference resolution, acquisition of language
resources, grammar learning, terminology learning, temporal and spatial
reasoning, and other algorithmic approaches (Pustejovsky et al., 2003;
Boroş, 2018; Chen et al., 2021). Social and political scientists have been
working to create socio-political event (SPE) databases such as ACLED,
EMBERS, GDELT, ICEWS, MMAD, PHOENIX, POLDEM, SPEED, TERRIER, and UCDP
following similar steps for decades. These projects and the new ones
increasingly rely on machine learning (ML), deep learning (DL), and NLP
methods to deal better with the vast amount and variety of data in this
domain (Hürriyetoğlu et al. 2020). Automation offers scholars not only the
opportunity to improve existing practices, but also to vastly expand the
scope of data that can be collected and studied, thus potentially opening
up new research frontiers within the field of SPEs, such as
politically-motivated violence and social movements. Automated approaches,
however, suffer from major issues like bias, generalizability, class
imbalance, training data limitations, and ethical issues that have the
potential to affect the results and their use drastically (Lau and Baldwin
2020; Bhatia et al. 2020; Chang et al. 2019).
SPEs are varied and nuanced. Both the political context and the local
language used may affect whether and how they are reported. Therefore, all
steps of information collection (event definition, language resources, and
manual or algorithmic steps) may need to be constantly updated, leading to
a series of challenging questions: Are events related to minority groups
represented well? Are new types of events covered? Are the event
definitions and their operationalization comparable across systems? This
workshop aims to find answers to these questions, to inspire innovative
technological and scientific solutions for tackling these issues, and to
quantify the quality of the results.
Task Overview
================================================
The task consists of four subtasks relevant to protest event detection:
Subtask 1: Document classification ⇒ Does a news article contain
information about a past or ongoing event?
Subtask 2: Sentence classification ⇒ Does a sentence contain information
about a past or ongoing event?
Subtask 3: Event sentence coreference identification ⇒ Which event
sentences (subtask 2) are about the same event?
Subtask 4: Event extraction ⇒ What is the event trigger and what are its
arguments?
Participants may design mono- or multilingual solutions that work on a
single, multiple, or all subtasks concurrently. Participants are also
allowed to combine annotations for either task. Additional datasets can be
utilized for training or validation purposes.
The systems developed for one or more of these subtasks will be invited to
process a news archive to measure the correlation between automatically and
manually created event datasets. This task will be referred to as Task 2.
Please see Giorgi et al. (2021) [1] for the similar task we ran last year.
You can find the task repository at
https://github.com/emerging-welfare/case-2022-multilingual-event, which
contains sample data and scripts.
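As a rough illustration of what "measuring the correlation" between an
automatically and a manually created event dataset could look like, the
sketch below computes a Pearson correlation over per-day event counts. The
choice of Pearson's r, the per-day aggregation, and all names and numbers
here are assumptions made for illustration; the announcement does not
specify the metric, so please consult the task repository for the actual
procedure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily event counts, invented for this example only.
manual_counts = [3, 0, 5, 2, 8]   # events per day in a hand-coded dataset
system_counts = [2, 1, 4, 2, 9]   # events per day found by a system

r = pearson(manual_counts, system_counts)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that the system output tracks the
manually coded counts closely over time.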
Data
================================================
Training data: The training data we use for Task 1 is the training data
for CASE-2021 and consists of English, Portuguese, and Spanish news
articles. Please find the detailed description of the data in Hürriyetoğlu
et al. (2021) [2], https://aclanthology.org/2021.case-1.11.pdf.
Test data: There will be two test sets for Subtask 1. These are i) test
data from CASE 2021, which is already available, and ii) new test data for
CASE 2022 including new data both in existing and in new languages, e.g.
Japanese, Urdu, Mandarin, and Turkish. The test data for subtasks 2, 3, and
4 will be the same as CASE 2021 test data for subtasks 2, 3, and 4
respectively.
Evaluation
================================================
The F1-macro score on predictions for test data in each language will be
calculated separately for Subtasks 1 and 2. Subtask 3 will be evaluated
using scorch, a Python implementation of the CoNLL-2012 average score, on
the test data (https://github.com/LoicGrobol/scorch). Finally, we will use
the CoNLL-03 evaluation script (https://github.com/sighsmile/conlleval) for
Subtask 4. The new test data for Subtask 1 may be utilized to improve
performance on these subtasks.
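For readers who want to check their Subtask 1 or 2 predictions locally,
the per-language F1-macro described above can be sketched as follows. This
is a minimal pure-Python sketch, not the official scorer: the function
names, the (language, gold, predicted) row layout, and the toy labels are
assumptions made for illustration only.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def macro_f1_by_language(rows):
    """rows: iterable of (language, gold_label, predicted_label) triples."""
    grouped = defaultdict(lambda: ([], []))
    for lang, gold, pred in rows:
        grouped[lang][0].append(gold)
        grouped[lang][1].append(pred)
    return {lang: macro_f1(g, p) for lang, (g, p) in grouped.items()}


# Toy rows, invented for illustration (1 = contains an event, 0 = does not).
rows = [
    ("en", 1, 1), ("en", 0, 0), ("en", 1, 0), ("en", 0, 0),
    ("es", 1, 1), ("es", 0, 1),
]
scores = macro_f1_by_language(rows)
for lang, score in sorted(scores.items()):
    print(lang, round(score, 4))
```

Scoring each language separately keeps a strong result in one language
from masking a weak result in another.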
The evaluation will be managed on a Codalab page. Participants will submit
their scores and the highest performing submission will be used for ranking
teams. There will be a limit on the number of submissions that can be
made. After the test deadline, an additional Codalab page will be set up
for additional scoring.
Participation
================================================
Please send your team name and the participation form that is on
https://github.com/emerging-welfare/case-2022-multilingual-event/blob/main/C...
to ali.hurriyetoglu@gmail.com. We will share the CASE-2021 data with you
right away and notify you when the CASE-2022 evaluation data is ready.
Publication
================================================
All participating teams will have the opportunity to submit their system
description papers to be considered for publication in the workshop
proceedings published by ACL Anthology. The papers should be submitted on
http://softconf.com/emnlp2022/case2022.
Organization
================================================
Ali Hürriyetoğlu, KNAW Humanities Cluster, DHLab, the Netherlands
Erdem Yörük, Koc University, Turkey
Hristo Tanev, European Commission, Joint Research Centre (EU JRC), Italy
Osman Mutlu, Koc University, Turkey
Vanni Zavarella, Italy
Reyyan Yeniterzi, Sabanci University, Turkey
Fatih Beyhan, Sabanci University, Turkey
Francielle Vargas, University of São Paulo, Brazil
Fırat Duruşan, Koc University, Turkey
Yaoyao Dai, UNC Charlotte, United States
Aaqib Javid, Koc University, Turkey
Benjamin Radford, UNC Charlotte, United States
Kalliopi Zervanou, Leiden University, the Netherlands
Milena Slavcheva, Bulgarian Academy of Sciences, Bulgaria
Niklas Stoehr, ETH Zurich, Switzerland
Guillem Ramirez, ETH Zurich, Switzerland
Shaina Raza, Public Health Ontario and University of Toronto, Canada
Farhana Ferdousi Liza, University of East Anglia, United Kingdom
Tadashi Nomoto, National Institute of Japanese Literature, Japan
Alaeddin Selçuk Gürel, Huawei, Turkey
YiJyun Lin, University of Arizona, U.S.A & National Taiwan University,
Taiwan
Tiancheng Hu, ETH Zürich, Switzerland
Onur Uca, Mersin University, Turkey
Fiona Anting Tan, Institute of Data Science, National University of
Singapore, Singapore
Hansi Hettiarachchi, Birmingham City University, United Kingdom
References
================================================
[1] Giorgi, S., Zavarella, V., Tanev, H., Stefanovitch, N., Hwang, S.,
Hettiarachchi, H., ... & Hürriyetoğlu, A. (2021, August). Discovering
Black Lives Matter Events in the United States: Shared Task 3, CASE 2021.
In Proceedings of the 4th Workshop on Challenges and Applications of
Automated Extraction of Socio-political Events from Text (CASE 2021)
(pp. 218-227). Association for Computational Linguistics. URL:
https://aclanthology.org/2021.case-1.27/
[2] Hürriyetoğlu, A., Mutlu, O., Yörük, E., Liza, F. F., Kumar, R., &
Ratan, S. (2021, August). Multilingual Protest News Detection - Shared Task
1, CASE 2021. In Proceedings of the 4th Workshop on Challenges and
Applications of Automated Extraction of Socio-political Events from Text
(CASE 2021) (pp. 79-91). URL: https://aclanthology.org/2021.case-1.11/