Hello,
Could you please distribute the following job offer? Thanks.
Best,
Pascal
-------------------------------------------------------------------------------------
3-year PhD position in Computational Models of Semantic Memory and its Acquisition (Inria and University of Lille, France)
We invite applications for a 3-year PhD position at the University of
Lille in the context of the recently funded research project
"COMANCHE" (Computational Models of Lexical Meaning and Change). The
position is funded by Inria, the French national research institute in
Computer Science and Applied Mathematics.
COMANCHE proposes to transfer and adapt neural word embeddings
algorithms to model the acquisition and evolution of word meaning, by
comparing them with linguistic theories on language acquisition and
language evolution. At the intersection between Natural Language
Processing, psycholinguistics and historical linguistics, this project
intends to validate or revise some of these theories, while also
developing computational models that are less data-hungry and less
computationally intensive, as they exploit new inductive biases
inspired by these disciplines.
The first strand of the project, on which the successful candidate
will work, focuses on the development of computational models of
semantic memory and its acquisition. Two main research directions will
be pursued. On the one hand, we will compare the structural properties
associated with different semantic spaces derived from word embedding
algorithms to those found in human semantic memory as reflected in
behavioral data (such as typicality norms) as well as brain imaging
data. The latter data will then be used as additional supervision to
inject more hierarchical structure into the learned semantic
spaces. On the other hand, we intend to experiment with training
regimes for word embedding algorithms that are closer to those of
humans when they acquire language, controlling the quantity as well as
the linguistic complexity of the inputs fed to the learning algorithms
through the use of longitudinal and child directed speech corpora
(e.g., CHILDES, Colaje). In both cases, both English and French data
will be considered.
The successful candidate holds a Master's degree in computational
linguistics or computer science or cognitive science and has prior
experience with word embedding models. Furthermore, the candidate has
strong programming skills and expertise in machine learning
approaches, and is eager to work across languages.
The position is affiliated with the MAGNET team at Inria, Lille [1] as
well as with the SCALAB group at University of Lille [2] in an effort
to strengthen collaborations between these two groups, and ultimately
foster cross-fertilization between Natural Language Processing and
Psycholinguistics.
Applications will be considered until the position is filled. However,
you are encouraged to apply early as we shall start processing the
applications as and when they are received. Applications, written in
English or French, should include a brief cover letter with research
interests and vision, a CV (including your contact address, work
experience, publications), and contact information for at least 2
referees. Applications (and questions) should be sent to Angèle
Brunellière (angele.brunelliere(a)univ-lille.fr) and Pascal Denis
(pascal.denis(a)inria.fr).
The starting date of the position is 1 October 2022 or soon
thereafter, for a total of 3 full years.
Best regards,
Angèle Brunellière and Pascal Denis
[1] https://team.inria.fr/magnet/
[2] https://scalab.univ-lille.fr/
--
Pascal
----
Pour une évaluation indépendante, transparente et rigoureuse !
Je soutiens la Commission d'Évaluation de l'Inria.
----
+++++++++++++++++++++++++++++++++++++++++++++++
Pascal Denis
Equipe MAGNET, INRIA Lille Nord Europe
Bâtiment B, Avenue Heloïse
Parc scientifique de la Haute Borne
59650 Villeneuve d'Ascq
Tel: ++33 3 59 35 87 24
Url: http://researchers.lille.inria.fr/~pdenis/
+++++++++++++++++++++++++++++++++++++++++++++++
== 12th NLP4CALL, Tórshavn, Faroe Islands==
The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (NLP4CALL) is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. The latter include, among others, drawing on insights from Second Language Acquisition (SLA) research, on the one hand, and promoting the development of “Computational SLA” through setting up Second Language research infrastructure(s), on the other.
The intersection of Natural Language Processing (or Language Technology / Computational Linguistics) and Speech Technology with Computer-Assisted Language Learning (CALL) brings “understanding” of language to CALL tools, thus making CALL intelligent. This has given the area of research its name: Intelligent CALL, or ICALL. As the definition suggests, apart from having excellent knowledge of Natural Language Processing and/or Speech Technology, ICALL researchers need good insights into second language acquisition theories and practices, as well as knowledge of second language pedagogy and didactics. This workshop therefore invites a wide range of ICALL-relevant research, including studies where NLP-enriched tools are used for testing SLA and pedagogical theories, and vice versa, where SLA theories, pedagogical practices or empirical data are modeled in ICALL tools.
The NLP4CALL workshop series is aimed at bringing together competences from these areas for sharing experiences and brainstorming around the future of the field.
We welcome papers:
- that describe research directly aimed at ICALL;
- that demonstrate actual or discuss the potential use of existing Language and Speech Technologies or resources for language learning;
- that describe the ongoing development of resources and tools with potential usage in ICALL, either directly in interactive applications, or indirectly in materials, application or curriculum development, e.g. learning material generation, assessment of learner texts and responses, individualized learning solutions, provision of feedback;
- that discuss challenges and/or research agenda for ICALL;
- that describe empirical studies on language learner data.
This year a special focus is given to work done on error detection/correction and feedback generation.
We encourage paper presentations and software demonstrations describing the above-mentioned themes primarily, but not exclusively, for the Nordic languages.
==Shared task==
NEW for this year is the MultiGED shared task on token-level error detection for L2 Czech, English, German, Italian and Swedish, organized by the Computational SLA working group.
For more information, please see the Shared Task website: https://github.com/spraakbanken/multiged-2023
==Invited speakers==
This year, we have the pleasure to announce two invited talks.
The first talk is given by Marije Michel from the University of Amsterdam.
The second talk is given by Pierre Lison from the Norwegian Computing Center.
==Submission information==
Authors are invited to submit long papers (8-12 pages) or short papers (4-7 pages); the page count does not include references.
We will be using the NLP4CALL template for the workshop this year. The author kit can be accessed here, alternatively on Overleaf:
<https://spraakbanken.gu.se/sites/default/files/2023/NLP4CALL%20workshop%20t…>
<https://spraakbanken.gu.se/sites/default/files/2023/nlp4call%20template.doc>
<https://www.overleaf.com/latex/templates/nlp4call-workshop-template/qqqzqqy…>
Submissions will be managed through the electronic conference management system EasyChair <https://easychair.org/conferences/?conf=nlp4call2023>. Papers must be submitted digitally through the conference management system, in PDF format. Final camera-ready versions of accepted papers will be given an additional page to address reviewer comments.
Papers should describe original unpublished work or work-in-progress. Papers will be peer reviewed by at least two members of the program committee in a double-blind fashion. All accepted papers will be collected into a proceedings volume to be submitted for publication in the NEALT Proceedings Series (Linköping Electronic Conference Proceedings) and, additionally, double-published through the ACL anthology, following experiences from the previous NLP4CALL editions (<https://www.aclweb.org/anthology/venues/nlp4call/>).
==Important dates==
03 April 2023: paper submission deadline
21 April 2023: notification of acceptance
01 May 2023: camera-ready papers for publication
22 May 2023: workshop date
==Organizers==
David Alfter (1), Elena Volodina (2), Thomas François (3), Arne Jönsson (4), Evelina Rennes (4)
(1) Gothenburg Research Infrastructure for Digital Humanities, Department of Literature, History of Ideas, and Religion, University of Gothenburg, Sweden
(2) Språkbanken, Department of Swedish, Multilingualism, Language Technology, University of Gothenburg, Sweden
(3) CENTAL, Institute for Language and Communication, Université Catholique de Louvain, Belgium
(4) Department of Computer and Information Science, Linköping University, Sweden
==Contact==
For any questions, please contact David Alfter, david.alfter(a)gu.se
For further information, see the workshop website <https://spraakbanken.gu.se/en/research/themes/icall/nlp4call-workshop-serie…>
Follow us on Twitter @NLP4CALL <https://twitter.com/NLP4CALL/>
Hi there,
Could you please distribute the following job offer? Thanks.
Best,
Pascal
-------------------------------------------------------------------------------------
We invite applications for a 3-year PhD position co-funded by Inria,
the French national research institute in Computer Science and Applied
Mathematics, and LexisNexis France, a leader in legal information in
France and a subsidiary of the RELX Group.
The overall objective of this project is to develop an automated
system for detecting argumentation structures in French legal
decisions, using recent machine learning-based approaches (i.e. deep
learning approaches). In the general case, these structures take the
form of a directed labeled graph, whose nodes are the elements of the
text (propositions or groups of propositions, not necessarily
contiguous) which serve as components of the argument, and edges are
relations that signal the argumentative connection between them (e.g.,
support, attack). By revealing the argumentation structure behind
legal decisions, such a system will provide a crucial milestone
towards their detailed understanding and their use by legal
professionals and, above all, contribute to greater transparency of
justice.
The main challenges and milestones of this project start with the
creation and release of a large-scale dataset of French legal
decisions annotated with argumentation structures. To minimize the
manual annotation effort, we will resort to semi-supervised and
transfer learning techniques to leverage existing argument mining
corpora, such as the European Court of Human Rights (ECHR) corpus, as
well as annotations already started by LexisNexis. Another promising
research direction, which is likely to improve over state-of-the-art
approaches, is to better model the dependencies between the different
sub-tasks (argument span detection, argument typing, etc.) instead of
learning these tasks independently. A third research avenue is to find
innovative ways to inject the domain knowledge (in particular the rich
legal ontology developed by LexisNexis) to enrich the
representations used in these models. Finally, we would like to take
advantage of other discourse structures, such as coreference and
rhetorical relations, conceived as auxiliary tasks in a multi-tasking
architecture.
The successful candidate holds a Master's degree in computational
linguistics, natural language processing, or machine learning, ideally
with prior experience in legal document processing and discourse
processing. Furthermore, the candidate has strong programming
skills and expertise in machine learning approaches, and is eager to work
at the interplay between academia and industry.
The position is affiliated with MAGNET [1], a research group at
Inria, Lille, which has expertise in Machine Learning and Natural
Language Processing, in particular Discourse Processing. The PhD
student will also work in close collaboration with the R&D team at
LexisNexis France, who will provide their expertise in the legal
domain and the data they have collected.
Applications will be considered until the position is filled. However,
you are encouraged to apply early as we shall start processing the
applications as and when they are received. Applications, written in
English or French, should include a brief cover letter with research
interests and vision, a CV (including your contact address, work
experience, publications), and contact information for at least 2
referees. Applications (and questions) should be sent to Pascal Denis
(pascal.denis(a)inria.fr).
The starting date of the position is 1 November 2022 or soon
thereafter, for a total of 3 full years.
Best regards,
Pascal Denis
[1] https://team.inria.fr/magnet/
[2] https://www.lexisnexis.fr/
Dear colleagues,
Last month, we shared the result of our collaborative work on a core metadata scheme for learner corpora with LCR2022 participants. Our proposal builds on Granger and Paquot's (2017) first attempt to design such a scheme. During our presentation, we explained the rationale for expanding on the initial proposal and discussed selected aspects of the revised scheme.
Our proposal is available at https://docs.google.com/spreadsheets/d/1-RbX5iUCUtCBkZU9Rfk-kv-Vzc--F-eUW2O…
We firmly believe that our efforts to develop a core metadata scheme for learner corpora will only be successful to the extent that (1) the LCR community is given the opportunity to engage with our work in various ways (provide feedback on the general structure of the scheme, the list of variables that we identified as core and their operationalization; test the metadata on other learner corpora; use the scheme to start a new corpus compilation, etc.) and (2) the core metadata scheme is the result of truly collaborative work.
As mentioned at LCR2022, we will be collecting feedback on the metadata scheme until the end of October. The online feedback form is available at:
https://docs.google.com/document/d/1NeDUuxGJlPSJI9wHVA1xgGM-aV8jXTa8Qlb45K-…
We'd like to thank all the colleagues who already got back to us (at LCR2022, by email or via the online form). We also thank them for their appreciation and enthusiasm for our work! We'd also like to encourage more colleagues (and particularly those of you who have experience in learner corpus compilation) to provide feedback! We need help in finalizing the core metadata scheme to make sure that it can be applied in all learner corpus compilation contexts. In short, we need you to make sure the scheme meets the needs of the LCR community at large.
With very best wishes,
Magali Paquot (also on behalf of Alexander König, Jennifer-Carmen Frey, and Egon W. Stemle)
Reference
Granger, S. & M. Paquot (2017). Towards standardization of metadata for L2 corpora. Invited talk at the CLARIN workshop on Interoperability of Second Language Resources and Tools, 6-8 December 2017, University of Gothenburg, Sweden.
Dr. Magali Paquot
Centre for English Corpus Linguistics
Institut Langage et Communication
UCLouvain
https://perso.uclouvain.be/magali.paquot/
We are pleased to announce the inaugural offering of the Plain Language Adaptation of Biomedical Abstracts (PLABA) track, as part of the 2023 Text Analysis Conference (TAC) hosted by the U.S. National Institute of Standards and Technology (NIST). This track is an opportunity to showcase your cutting-edge research on an important topic, and to take advantage of large amounts of expert annotated data and manual evaluation.
Background: Deficits of Health Literacy are linked to worse outcomes and drive health disparities. Though unprecedented amounts of biomedical knowledge are available online, patients and caregivers face a type of “language barrier” when confronted with jargon and academic writing. Advances in language modeling have improved plain language generation, but the task of automatically and accurately adapting biomedical text for a general audience has thus far lacked high-quality, standardized benchmarks.
Task: Systems will adapt biomedical abstracts to plain language. This includes substituting medical jargon, providing explanations for necessary terms, simplifying sentences, and other modifications. The training set is the publicly available PLABA dataset <https://doi.org/10.1038%2Fs41597-022-01920-3>, which contains 750 abstracts with manual, sentence-aligned adaptations for each, totaling more than 7k sentence pairs with document context.
Evaluation: Participating systems will be evaluated on 400 held-out abstracts, manually adapted four-fold by different annotators for robust automatic metrics. Additionally, a subset of system outputs will be manually evaluated along several axes to ensure they are accurate and faithful to the original, which is crucial for the biomedical domain.
URL: https://bionlp.nlm.nih.gov/plaba2023/
Mailing list: https://groups.google.com/g/plaba2023
Key dates:
Jul 19 – Evaluation data released
Aug 16 – Submissions due
Oct 18 – Results posted
We look forward to your submissions.
The Research Training Group 2853 “Neuroexplicit Models of Language, Vision, and Action” is looking for
6 PhD students and 1 postdoc
October 2023 or later
Neuroexplicit models combine neural and human-interpretable (“explicit”) models in order to overcome the limitations that each model class has separately. They include neurosymbolic models, which combine neural and symbolic models, but also e.g. combinations of neural and physics-based models. In the RTG, we will improve the state of the art in natural language processing (“Language”), computer vision (“Vision”), and planning and reinforcement learning (“Action”) through the use of neuroexplicit models and investigate the cross-cutting design principles of effective neuroexplicit models (“Foundations”).
The RTG is scheduled to grow to a total of 24 PhD students and one postdoc by 2025. Through the inclusion of ~20 further PhD students and postdocs funded from other sources, it will be one of the largest research centers on neuroexplicit or neurosymbolic models in the world. The RTG brings together researchers at Saarland University, the Max Planck Institute for Informatics, the Max Planck Institute for Software Systems, the CISPA Helmholtz Center for Information Security, and the German Research Center for Artificial Intelligence (DFKI). All of these institutions are colocated on the same campus in Saarbrücken, Germany.
The positions are funded as follows:
• PhD students will be funded for up to four years at the TV-L E13 100% pay scale. You should have or be about to complete an MSc degree in computer science or a related field and have demonstrated expertise in one of the research areas of the RTG, e.g. through an excellent Master’s thesis or relevant publications.
• The postdoc will initially be funded for three years, with the possibility of extension up to five years, at the TV-L E13 100% pay scale. As the RTG postdoc, you will pursue your own research agenda in the field of neuroexplicit models and work with the PhD students to identify and pursue opportunities for collaborative research. You should have or be about to complete a PhD in computer science or a related field and have demonstrated your expertise in one or more of the RTG’s research areas through publications in top venues.
The RTG is part of the Saarland Informatics Campus, one of the leading centers for research in computer science, artificial intelligence, and natural language processing in Europe. The Saarland Informatics Campus brings together 900 researchers and 2500 students from 81 countries. The CISPA Helmholtz Center, located on the same campus, is home to an additional 350 researchers and on track to grow to 800 by 2026. Researchers at SIC and CISPA are part of the ELLIS network and have been awarded more than 35 ERC grants.
Each PhD student in the RTG will be jointly supervised by two PhD advisors from the list of Principal Investigators below. Each student will freely define their own research topic; we encourage the choice of topics that cross the traditional boundaries of research fields. Students may be affiliated with Saarland University or with one of the participating institutes.
Vera Demberg, Saarland University - Computational Linguistics
Jörg Hoffmann, Saarland University - AI Planning
Eddy Ilg, Saarland University - Computer Vision, Machine Learning
Dietrich Klakow, Saarland University - Natural Language Processing
Alexander Koller, Saarland University - Computational Linguistics
Bernt Schiele, MPI for Informatics - Computer Vision, Machine Learning
Philipp Slusallek, DFKI and Saarland University - Computer Graphics, Artificial Intelligence
Christian Theobalt, MPI for Informatics - Visual Computing, Machine Learning
Mariya Toneva, MPI for Software Systems - Computational Neuroscience, Machine Learning
Isabel Valera, Saarland University - Machine Learning
Jilles Vreeken, CISPA - Machine Learning, Causality
Joachim Weickert, Saarland University - Mathematical Data Analysis
Verena Wolf, DFKI and Saarland University - Modeling and Simulation, Reinforcement Learning
Ellie Pavlick, Brown University and Google AI, will join us regularly as a Mercator Fellow.
Please send your application by 31 May 2023 to bewerbung(a)uni-saarland.de. Include the reference number W2298 for the postdoc position and the reference number W2299 for the PhD positions. We aim to conduct job interviews in July (for a start in October) and September (for a later start). The legally binding version of this job ad is at https://www.uni-saarland.de/fileadmin/upload/verwaltung/stellen/Wissenschaf… (postdoc) and https://www.uni-saarland.de/fileadmin/upload/verwaltung/stellen/Wissenschaf… (PhD), respectively.
For details on what materials to submit with your application and all other information about the RTG, please see our website: https://www.neuroexplicit.org/jobs/#phd-2023
Dear colleagues and friends,
This year, we are organizing the MedVidQA <https://medvidqa.github.io/>
challenge with TRECVID 2023
<https://www-nlpir.nist.gov/projects/tv2023/index.html>.
This challenge aims at developing models for (1) retrieving the relevant
videos and locating the visual answer in those videos for the medical or
health-related question and (2) generating the medical instructional
questions from the video segments. Following the success of the 1st
MedVidQA shared task <https://aclanthology.org/2022.bionlp-1.25/>, MedVidQA
at TRECVID 2023 expands the tasks and introduces a new track on
language-video understanding and generation. This track comprises two
main tasks: Video Corpus Visual Answer Localization (VCVAL) and Medical
Instructional Question Generation (MIQG).
For more details, please visit the challenge website (
https://medvidqa.github.io/) and TRECVID 2023 website (
https://www-nlpir.nist.gov/projects/tv2023/index.html).
The link for submission:
- Task 1 (VCVAL): https://codalab.lisn.upsaclay.fr/competitions/13445
<https://codalab.lisn.upsaclay.fr/competitions/13546>
- Task 2 (MIQG): https://codalab.lisn.upsaclay.fr/competitions/13546
*Important Dates*
- *Release of the training and validation datasets:* April 30, 2023
- *Release of the video corpus:* May 12, 2023
- *Release of the test sets:* July 14, 2023
- *Run submission deadline:* August 4, 2023
- *Release of the official results:* September 29, 2023
We look forward to your participation in MedVidQA at TRECVID 2023.
Join our Google Group <https://groups.google.com/g/trecvid-medvidqa2023> for
important updates! If you have any questions, ask in our Google Group
<https://groups.google.com/g/trecvid-medvidqa2023> or email us at
<deepak.gupta(a)nih.gov>.
Thank you,
MedVidQA 2023 Organizers
Dear all
Just wanted to let you know that APJCR Vol. 3, No. 1 is now available to
view online.
http://icr.or.kr/ejournals-apjcr
CK
---
*CK Jung BEng(Hons) Birmingham MSc Warwick EdD Warwick Cert Oxford*
Department of English Language and Literature, Incheon National
University, *South
Korea*
Vice President | The Korea Association of Primary English Education
(KAPEE), *South Korea*
Vice President | The Korea Association of Secondary English Education
(KASEE), *South Korea*
Director | Institute for Corpus Research, Incheon National University, *South
Korea* (http://icr.or.kr)
Editor | Asia Pacific Journal of Corpus Research, ICR, *International* (
http://icr.or.kr/apjcr)
Deputy Editor | Korean Journal of English Language and Linguistics,
KASELL, *South
Korea*
Editorial Board | Corpora, Edinburgh University Press, *UK*
Editorial Board | English Today, Cambridge University Press, *UK*
E: ckjung(a)inu.ac.kr / T: +82 (0)32 835 8129
H(EN): http://ckjung.org
H(KR): http://prof1.inu.ac.kr/user/ckjung
CASE-2023 Shared Task - Task 2: Collecting and Geocoding Armed Clash Events
in Russo-Ukrainian Conflict
================================================
The unprecedented quantity of easily accessible data on social, political,
and economic processes offers ground-breaking potential in guiding
data-driven analysis of socio-political phenomena: armed conflicts,
political movements, fights for economic and social rights, and various
related socio-political happenings are reported in news articles and social
media posts and recorded in curated databases. On the other hand, automatic
event detection from texts and event geocoding has long been a challenge
for the natural language processing (NLP) community. It requires
sophisticated methods and resources, such as Machine Learning models,
linguistic rules and dictionaries, and geographic gazetteers.
Task definition
The task Collecting and Geocoding Armed Clash Events in Russo-Ukrainian
Conflict is being held as a sub-task of the 6th Workshop on Challenges and
Applications of Automated Extraction of Socio-political Events from Text
(CASE 2023). The task will use data from the Russo-Ukrainian Conflict to
test the capabilities of event detection systems to extract, geocode and
de-duplicate armed clashes in news and social media posts. Evaluation will
be based on the correlation between the spatio-temporal distribution and
number of the extracted events and those which are in the ground truth data
set.
We invite contributions from researchers in NLP, ML, Deep Learning, and
AI. The call is directed also towards socio-political scientists,
researchers in conflict analysis and forecasting, peace studies, and
computational social science.
All participating teams will be able to publish their system description
paper in the workshop proceedings published by ACL. For more information on
the workshop,
please visit the Workshop website https://emw.ku.edu.tr/case-2023/
<https://emw.ku.edu.tr/case-2022/> and the conference website
https://ranlp.org/ranlp2023/.
================================================
1. Data
Gold Standard and Text Input Data for the participant systems for the time
range 24.02.2022-24.08.2022 has been prepared and will be shared with the
applicants on the Task website.
1.1 Training Data
No training data are provided for this Task. The data utilized for CASE
2023 Task 1, which is described in Hürriyetoğlu, A. et al. (2022, 2020b),
can be used for training systems for this task (Task 2). Additionally, these data
can be used to build systems/models that can detect protest events in
tweets and news articles.
1.2 Input Data
The participant systems will be evaluated on raw data collections including
Telegram messages, the New York Times and Ukrainian-Russian official news
channels.
Namely, the data collections comprise:
• English language social media message and news corpus comprising
48,007 Telegram messages and The New York Times news about Ukraine.
• Ukrainian language social media collection comprising
102,135 Telegram messages and Ukraine News Agency news.
• Russian language social media collection comprising
8,534 Telegram messages and Russian News Agency news.
Further details on the text collections and sampling methods are provided
in the folders news and Social Media of the github repo for the Task (
https://github.com/zavavan/case2023_task2).
1.3 Gold Standard Data
The Russo-Ukrainian Conflict ground truth data primarily consists of data
coming from the Armed Conflict Location & Event Data Project (ACLED). We
will be adding alternative ground-truth datasets in order to prevent the
bias that may be introduced by using a single definition and interpretation
of an event. Full details on the manually curated data used as Gold
Standard for the correlation analysis will be disclosed at the end of the
evaluation period. Please check the documentation in the folder gold_standard
of the Task GitHub repo.
================================================
2. Evaluation
The systems which participate in this shared task will be required to
detect news articles and Telegram posts which contain descriptions of
ongoing armed clashes. For each armed clash, the time should be
detected at date level and the place as precise geographic
coordinates (latitude and longitude). The systems should ideally extract
event times based on multiple text reports.
In order to evaluate the ability of automatic event-coders to reproduce the
gold standard armed clash event dataset, we adapt two correlation methods
originally used in micro-level analysis of political violence by Hammond
and Weidmann (2014), based on the aggregation of event counts over uniform
geographical grid cells and 1-day time spans, and apply a number of standard
correlation coefficients and error measures.
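
As an illustration of the aggregation step described above, here is a minimal sketch. It assumes that events have already been converted to (latitude, longitude, date) triples, uses a hypothetical cell size of 0.5 degrees, and picks the Pearson coefficient as one example of a standard correlation coefficient. The function names and the cell size are assumptions made for this example and are not taken from the official evaluation scripts.

```python
import math
from collections import Counter
from datetime import date

CELL_DEG = 0.5  # hypothetical grid cell size in degrees, not the official value


def cell_day_counts(events, cell_deg=CELL_DEG):
    """Count events per (grid cell, day) from (lat, lon, date) triples."""
    counts = Counter()
    for lat, lon, day in events:
        # A cell is identified by the integer indices of its row and column.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[(cell, day)] += 1
    return counts


def pearson(gold, system):
    """Pearson correlation over the union of (cell, day) keys seen in either set.

    Simplification: cell-days that are empty in both sets are ignored here.
    """
    keys = sorted(set(gold) | set(system))
    if not keys:
        return float("nan")
    xs = [gold.get(k, 0) for k in keys]
    ys = [system.get(k, 0) for k in keys]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant counts
    return cov / math.sqrt(var_x * var_y)
```

A system whose per-cell, per-day counts track the gold standard closely would score near 1 under this measure. The official evaluation may use a different grid, different coefficients and additional error measures.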
For each of the input text corpora in 1.2, each participant may submit up to
3 different system responses. Each system response will consist of a csv
file with the following naming pattern:
“submission.<team-name>.<corpus>.<response-number>.csv”
where <corpus> is either “social_media” or “news”.
For instance: “submission.MyTeam.news.3.csv” for the 3rd submission of team
“MyTeam” on the news corpus.
Each system response file will have one line per event, where each line
will have the following format:
<id>,<City>,<Region>,<Country>,<Date>
where <id> is a numerical event identifier, <City>,<Region>,<Country> are
canonical English names of the City, State/Region and Country, respectively,
of the detected event location. While only the <Country> attribute is
mandatory, systems are expected to assign a description of the event
location at the finest grained level possible, as otherwise geographical
coordinate conversion may penalize the correlation score on geographical
cell aggregation. <Date> is the assigned date of the event in the format
YYYY-MM-DD.
A sample system response file line:
0,Kharkiv,Kharkiv Oblast,Ukraine,2022-05-02
A sample system output file can be downloaded from the Task repo at:
https://github.com/zavavan/case2023_task2/blob/main/submission.myteam.news.…
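
The naming pattern and line format above can be sketched in a few lines of Python. This is a minimal illustration and not an official tool: the helper names are hypothetical, event ids are assumed to start at 0 as in the sample line, and the standard csv writer is assumed to be acceptable (it would quote any field that itself contains a comma).

```python
import csv
import io
from datetime import date


def submission_filename(team, corpus, response_number):
    """Build a file name following submission.<team-name>.<corpus>.<response-number>.csv."""
    if corpus not in {"social_media", "news"}:
        raise ValueError("corpus must be 'social_media' or 'news'")
    if not 1 <= response_number <= 3:
        raise ValueError("at most 3 system responses are allowed per corpus")
    return f"submission.{team}.{corpus}.{response_number}.csv"


def format_rows(events):
    """Render (city, region, country, date) tuples as <id>,<City>,<Region>,<Country>,<Date> lines."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for event_id, (city, region, country, day) in enumerate(events):
        if not country:
            raise ValueError("the country attribute is mandatory")
        # date.isoformat() yields the required YYYY-MM-DD format.
        writer.writerow([event_id, city, region, country, day.isoformat()])
    return buffer.getvalue()


if __name__ == "__main__":
    name = submission_filename("MyTeam", "news", 3)
    body = format_rows([("Kharkiv", "Kharkiv Oblast", "Ukraine", date(2022, 5, 2))])
    with open(name, "w", encoding="utf-8", newline="") as handle:
        handle.write(body)
```

City and region may be left as empty strings when unknown, although the task description recommends the finest-grained location possible.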
Important Dates (AoE time)
================================================
It is optional to use Task 1 systems. Participants may also use their own
systems, which are developed independently of Task 1.
Task 1 Training data available: May 1, 2023
Task 1 Test data available: May 15, 2023
Task 1 Evaluation period ends: June 30, 2023
Task 2 Sample Text archive is available: May 22, 2023
Task 2 Text archive for evaluation is available: July 1, 2023
Task 2 Evaluation period starts: July 1, 2023
Task 2 Evaluation period ends: July 24, 2023
System Description Paper submissions due: July 31, 2023
Notification to authors after review: August 7, 2023
Camera ready: August 25, 2023
Workshop period @ RANLP: Sep 7-8, 2023
Organization
================================================
- Hristo Tanev (Joint Research Centre (JRC), European Commission, Italy)
- Onur Uca (Sociology, Mersin University, Turkey)
- Vanni Zavarella (University of Cagliari, Italy)
- Ali Hürriyetoğlu (KNAW Humanities Cluster DHLab, the Netherlands)
Please contact the organizers at hristo.tanev(a)ec.europa.eu or
onuruca(a)mersin.edu.tr for your questions.
5. References
Jesse Hammond and Nils B Weidmann. Using machine-coded event data for the
micro-level study of political violence. Research & Politics,
1(2):2053168014539924, 2014.
Hürriyetoğlu, A., Mutlu, O., Duruşan, F., Uca, O., Gürel, A. S.,
Radford, B., Dai, Y., Hettiarachchi, H., Stoehr, N., Nomoto, T., Slavcheva,
M., Vargas, F., Javid, A., Beyhan, F., Yörük, E. (2022). Extended
Multilingual Protest News Detection Shared Task 1, CASE 2021 and 2022. arXiv
preprint arXiv:2211.11360. URL: https://arxiv.org/abs/2211.11360
Hürriyetoğlu, A., Yörük, E., Yüret, D., Mutlu, O., Yoltar, Ç., Duruşan, F.,
& Gürel, B. (2020b). Cross-context news corpus for protest events related
knowledge base construction. arXiv preprint arXiv:2008.00351. In Automated
Knowledge Base Construction (AKBC). URL:
https://www.akbc.ws/2020/papers/7NZkNhLCjp
Call for workshop papers and Shared Task participation: the 6th workshop on
Challenges and Applications of Automated Extraction of Socio-political
Events from Text - CASE @ RANLP 2023
************************************************************************************
URL: https://emw.ku.edu.tr/case-2023/
Paper submission deadline: 10 July 2023
Paper acceptance notification: 5 August 2023
Paper camera-ready: 25 August 2023
Workshop dates: 7-8 September 2023
Dates and deadlines for the shared task are below.
Softconf page of the workshop: https://softconf.com/ranlp23/CASE/
************************************************************************************
We invite contributions from researchers in computer science, NLP, ML, DL,
AI, socio-political sciences, conflict analysis and forecasting, peace
studies, as well as computational social science scholars involved in the
collection and utilization of socio-political event data. This includes
(but is not limited to) the following topics:
1) Extracting events and their arguments such as time and location in and
beyond a sentence or document, event coreference resolution.
2) Research in NLP technologies in relation to event detection: geocoding,
temporal reasoning, argument structure detection, syntactic and semantic
analysis of event structures, text classification for event type
detection, learning event-related lexica, event coreference resolution,
fake news analysis, and others with a focus on real or potential event
detection applications.
3) New datasets, training data collection, and annotation for event
information.
4) Event-event relations, e.g., subevents, main events, spatio-temporal
relations, causal relations.
5) Event dataset evaluation in light of reliability and validity metrics.
6) Defining, populating, and facilitating event schemas and ontologies.
7) Automated tools and pipelines for event collection related tasks.
8) Lexical, syntactic, semantic, discursive, and pragmatic aspects of event
manifestation.
9) Methodologies for development, evaluation, and analysis of event
datasets.
10) Applications of event databases, e.g. early warning, conflict
prediction, policymaking.
11) Estimating what is missing in event datasets using internal and
external information.
12) Detection of new and emerging SPE types, e.g. creative protests.
13) Release of new event datasets.
14) Bias and fairness of the sources and event datasets.
15) Ethics, misinformation, privacy, and fairness concerns pertaining to
event datasets.
16) Copyright issues on event dataset creation, dissemination, and sharing.
17) Cross-lingual, multilingual and multimodal aspects in event analysis.
18) Resources and approaches related to contentious politics around climate
change.
**** Shared tasks ****
Please check the workshop page and Github repositories of the respective
task for additional details.
Task 1 - Multilingual protest news detection:
The performance of an automated system depends on the target event type,
which may be broad, and on its event trigger(s), which can be ambiguous. The
context in which a trigger occurs may need to be handled as well. For
instance, the ‘protest’ event type may or may not be synonymous with
‘demonstration’ in a specific context. Moreover, hypothetical cases such as
future protest plans may need to be excluded from the results. Finally, the
relevance of a protest depends on the actors: for contentious political
events, only citizen-led events are in scope. This challenge becomes even
harder in a cross-lingual and zero-shot setting, where training data are
not available in new languages. We tackle the task in four steps and hope
state-of-the-art approaches will yield optimal results.
Contact person: Ali Hürriyetoğlu (ali.hurriyetoglu(a)gmail.com)
Github: https://github.com/emerging-welfare/case-2022-multilingual-event
Task 2 - Collecting and Geocoding Armed Clash Events in Russian Ukrainian
Conflict:
There is a mismatch between the event information collected by automated
and manual approaches. We aim at identifying similarities and
differences between the results of these paradigms for creating event
datasets. The participants of Task 1 will be invited to run the systems
they develop for Task 1 on a text archive. Participation in Task
1 is not a precondition for participating in Task 2.
Contact persons: Hristo Tanev (htanev(a)gmail.com) and Onur Uca
(onuruca(a)mersin.edu.tr)
Github: https://github.com/zavavan/case2023_task2
Task 3 - Event causality identification:
Causality is a core cognitive concept and appears in many natural language
processing (NLP) works that aim to tackle inference and understanding. We
are interested in studying event causality in news, and therefore,
introduce the Causal News Corpus. The Causal News Corpus consists of 3,767
event sentences, extracted from protest event news, that have been
annotated with sequence labels indicating whether they contain causal
relations or not. Subsequently, causal sentences are also annotated with Cause, Effect
and Signal spans. Our subtasks work on the Causal News Corpus, and we hope
that accurate, automated solutions may be proposed for the detection and
extraction of causal events in news.
Contact person: Fiona Anting Tan (tan.f(a)u.nus.edu)
Github: https://github.com/tanfiona/CausalNewsCorpus
Task 4 - Multimodal Hate Speech Event Detection:
Hate speech detection is one of the most important aspects of event
identification during political events like invasions. In the case of hate
speech detection, the event is the occurrence of hate speech, the entity is
the target of the hate speech, and the relationship is the connection
between the two. Since multimodal content is widely prevalent across the
internet, the detection of hate speech in text-embedded images is very
important. Given a text-embedded image, this task aims to automatically
identify the hate speech and its targets. This task will have two subtasks.
Contact person: Surendrabikram Thapa (surendrabikram(a)vt.edu)
Github: https://github.com/therealthapa/case2023_task4
**** Deadlines for the Shared tasks ****
** Tasks 1, 3, 4:
Training & Validation data available: May 1, 2023
Test data available: Jun 15, 2023
Test start: Jun 15, 2023
Test end: Jun 30, 2023
System Description Paper submissions due: Jul 10, 2023
Notification to authors after review: Aug 5, 2023
Camera ready: Aug 25, 2023
** Task 2:
Sample Text archive is available: May 22, 2023
Text archive for evaluation is available: July 1, 2023
Evaluation period starts: July 1, 2023
Evaluation period ends: July 24, 2023
System Description Paper submissions due: July 31, 2023
Notification to authors after review: August 7, 2023
Camera ready: August 25, 2023
*** Keynotes ***
We will continue our tradition of inviting keynote speakers from both
social and computational sciences. The social science keynote will be
delivered by Erdem Yörük with the title “Using Automated Text Processing to
Understand Social Movements and Human Behaviour” and the computational ones
will be delivered by Ruslan Mitkov and Kiril Simov.
Please see the workshop webpage (https://emw.ku.edu.tr/case-2023/) for
additional details.