Apologies for cross posting.
FIRST CALL FOR PARTICIPATION
CASE-2022 Shared Task: Multilingual Protest Event Detection
================================================
We invite you to participate in the CASE-2022 Shared Task 1: Multilingual Protest Event Detection. The task is being held as part of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2022). It is a continuation of the CASE 2021 shared task (Hürriyetoğlu et al., 2021) [2]. The training set is the same as in CASE 2021, but the evaluation phase of CASE 2022 will include data from additional languages. For further details, please see the workshop website (https://emw.ku.edu.tr/case-2022/) or contact ali.hurriyetoglu@gmail.com
Important Dates
================================================
Training data available: please follow the instructions in the task repository (https://github.com/emerging-welfare/case-2022-multilingual-event). You will obtain the test data for CASE 2021 as well, together with a Codalab page (https://competitions.codalab.org/competitions/31639) where you can obtain a score for your predictions.
New test data available: September 15, 2022
Test end: September 25, 2022
System Description Paper submissions due: October 2, 2022
Notification to authors after review: October 9, 2022
Camera-ready: October 16, 2022
Workshop period @ EMNLP: December 7-8, 2022
Motivation
================================================
Event extraction has recently attracted a lot of attention in the natural language processing (NLP) community, as well as among political and social scientists: it has emerged as a robust technology for identifying the most important information inside media streams. At the same time, it provides a basis for quantitative assessment of the political situation in the world. Event extraction has long been a challenge for the NLP community as it requires sophisticated methods for detection and classification of events: machine learning, syntactic and semantic parsing, event ontologies, event co-reference resolution, acquisition of language resources, grammar learning, terminology learning, temporal and spatial reasoning, and other algorithmic approaches (Pustejovsky et al. 2003; Boroş 2018; Chen et al. 2021). Social and political scientists have been working to create socio-political event (SPE) databases such as ACLED, EMBERS, GDELT, ICEWS, MMAD, PHOENIX, POLDEM, SPEED, TERRIER, and UCDP following similar steps for decades. These projects and the new ones increasingly rely on machine learning (ML), deep learning (DL), and NLP methods to deal better with the vast amount and variety of data in this domain (Hürriyetoğlu et al. 2020). Automation offers scholars not only the opportunity to improve existing practices, but also to vastly expand the scope of data that can be collected and studied, thus potentially opening up new research frontiers within the field of SPEs, such as politically-motivated violence and social movements. Automated approaches, however, suffer from major issues like bias, generalizability, class imbalance, training data limitations, and ethical issues that have the potential to affect the results and their use drastically (Lau and Baldwin 2020; Bhatia et al. 2020; Chang et al. 2019).
SPEs are varied and nuanced. Both the political context and the local language used may affect whether and how they are reported. Therefore, all steps of information collection (event definition, language resources, and manual or algorithmic steps) may need to be constantly updated, leading to a series of challenging questions: Are events related to minority groups represented well? Are new types of events covered? Are the event definitions and their operationalization comparable across systems? This workshop aims at finding answers to these questions as well as inspiring innovative technological and scientific solutions for tackling these issues and quantifying the quality of the results.
Task Overview
================================================
The task consists of four subtasks of multilingual protest event detection:
Subtask 1: Document classification ⇒ Does a news article contain information about a past or ongoing event?
Subtask 2: Sentence classification ⇒ Does a sentence contain information about a past or ongoing event?
Subtask 3: Event sentence coreference identification ⇒ Which event sentences (subtask 2) are about the same event?
Subtask 4: Event extraction ⇒ What is the event trigger and what are its arguments?
Participants may design mono- or multilingual solutions that work on a single, multiple, or all subtasks concurrently. Participants are also allowed to combine annotations for either task. Additional datasets can be utilized for training or validation purposes.
The systems developed for one or more of these subtasks will be invited to process a news archive to measure the correlation between automatically and manually created event datasets. This task will be referred to as Task 2. Please see Giorgi et al. (2021) [1] for the similar task we performed last year.
You can find the task repository at https://github.com/emerging-welfare/case-2022-multilingual-event, which contains sample data and scripts.
Data
================================================
Training data: The training data we use for Task 1 is the training data for CASE-2021 and consists of English, Portuguese, and Spanish news articles. Please find the detailed description of the data in Hürriyetoğlu et al. (2021) [2], https://aclanthology.org/2021.case-1.11.pdf.
Test data: There will be two test sets for Subtask 1. These are i) test data from CASE 2021, which is already available, and ii) new test data for CASE 2022 including new data both in existing and in new languages, e.g. Japanese, Urdu, Mandarin, and Turkish. The test data for subtasks 2, 3, and 4 will be the same as CASE 2021 test data for subtasks 2, 3, and 4 respectively.
Evaluation
================================================
Subtasks 1 and 2 will be evaluated using the macro-averaged F1 score (F1-macro), calculated separately on the predictions for the test data in each language. Subtask 3 will be evaluated using scorch, a Python implementation of the CoNLL-2012 average score (https://github.com/LoicGrobol/scorch). Finally, we will use the CoNLL-03 evaluation script (https://github.com/sighsmile/conlleval) for Subtask 4. The new test data for Subtask 1 may be utilized to improve performance on these subtasks.
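For participants who want to check their predictions locally before submitting, the per-language macro F1 computation for Subtasks 1 and 2 can be sketched roughly as follows. This is only an illustration and not the official scoring script: the function names, the (language, gold, predicted) record layout, and the binary 0/1 labels are assumptions made for the example, and the scores reported on Codalab remain the reference.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all classes seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    per_class = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true positives gets an F1 of 0.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class) if per_class else 0.0


def macro_f1_by_language(records):
    """Group (language, gold, predicted) triples by language and score each group."""
    by_lang = defaultdict(lambda: ([], []))
    for lang, gold, pred in records:
        by_lang[lang][0].append(gold)
        by_lang[lang][1].append(pred)
    return {lang: macro_f1(g, p) for lang, (g, p) in by_lang.items()}


# Toy data (assumed format): 1 = contains a protest event, 0 = does not.
records = [
    ("en", 1, 1), ("en", 0, 0), ("en", 1, 0), ("en", 0, 0),
    ("es", 1, 1), ("es", 0, 1),
]
scores = macro_f1_by_language(records)
for lang, score in sorted(scores.items()):
    print(f"{lang}: {score:.4f}")
```

Because the score is averaged over classes rather than over instances, a system that always predicts the majority class is penalized on the minority class, which matters for the imbalanced data typical of this task.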
The evaluation will be managed on a Codalab page. Participants will submit their predictions, and the highest-scoring submission of each team will be used for ranking. There will be a limit on the number of submissions that can be made. After the test deadline, an additional Codalab page will be set up for additional scoring.
Participation
================================================
Please send your team name and the participation form that is on https://github.com/emerging-welfare/case-2022-multilingual-event/blob/main/C... to ali.hurriyetoglu@gmail.com. We will share the CASE-2021 data with you right away and notify you when the CASE-2022 evaluation data is ready.
Publication
================================================
All participating teams will have the opportunity to submit their system description papers to be considered for publication in the workshop proceedings published by ACL Anthology. The papers should be submitted on http://softconf.com/emnlp2022/case2022.
Organization
================================================
Ali Hürriyetoğlu, KNAW Humanities Cluster, DHLab, the Netherlands
Erdem Yörük, Koc University, Turkey
Hristo Tanev, European Commission, Joint Research Centre (EU JRC), Italy
Osman Mutlu, Koc University, Turkey
Vanni Zavarella, Italy
Reyyan Yeniterzi, Sabanci University, Turkey
Fatih Beyhan, Sabanci University, Turkey
Francielle Vargas, University of São Paulo, Brazil
Fırat Duruşan, Koc University, Turkey
Yaoyao Dai, UNC Charlotte, United States
Aaqib Javid, Koc University, Turkey
Benjamin Radford, UNC Charlotte, United States
Kalliopi Zervanou, Leiden University, the Netherlands
Milena Slavcheva, Bulgarian Academy of Sciences, Bulgaria
Niklas Stoehr, ETH Zurich, Switzerland
Guillem Ramirez, ETH Zurich, Switzerland
Shaina Raza, Public Health Ontario and University of Toronto, Canada
Farhana Ferdousi Liza, University of East Anglia, United Kingdom
Tadashi Nomoto, National Institute of Japanese Literature, Japan
Alaeddin Selçuk Gürel, Huawei, Turkey
YiJyun Lin, University of Arizona, U.S.A & National Taiwan University, Taiwan
Tiancheng Hu, ETH Zürich, Switzerland
Onur Uca, Mersin University, Turkey
Fiona Anting Tan, Institute of Data Science, National University of Singapore, Singapore
Hansi Hettiarachchi, Birmingham City University, United Kingdom
References
================================================
[1] Giorgi, S., Zavarella, V., Tanev, H., Stefanovitch, N., Hwang, S., Hettiarachchi, H., ... & Hürriyetoğlu, A. (2021). Discovering Black Lives Matter Events in the United States: Shared Task 3, CASE 2021. In Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021) (pp. 218-227). Association for Computational Linguistics. URL: https://aclanthology.org/2021.case-1.27/
[2] Hürriyetoğlu, A., Mutlu, O., Yörük, E., Liza, F. F., Kumar, R., & Ratan, S. (2021, August). Multilingual Protest News Detection - Shared Task 1, CASE 2021. In Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021) (pp. 79-91). URL: https://aclanthology.org/2021.case-1.11/