Shared Task on Offline Harm Potential Identification (HarmPot-ID)
Fourth Workshop on Threat, Aggression and Cyberbullying
May 20, 2024
Lingotto Conference Centre - Torino (Italia)
@ LREC-COLING 2024
*Task Website:* https://codalab.lisn.upsaclay.fr/competitions/17646
*Workshop Website:* https://sites.google.com/view/trac2024
====================================
Call for Participation
===================================
We are happy to announce that the Fourth Workshop on Threat, Aggression and
Cyberbullying will be co-located with LREC-COLING 2024 on May 20, 2024.
TRAC-2024 is introducing the novel task of predicting the offline harm
potential of social media posts. Broadly, the task is to predict whether a
specific post is likely to initiate, incite or further exacerbate an
offline harm event (e.g. riots, mob lynching, murder, rape). It will
consist of two sub-tasks:
Sub-task 1a: What is the offline harm potential of a document?
This will be a four-class classification task where the participants will
be required to predict the level of offline harm potential:
0 (it will never lead to offline harm, in any context),
1 (it could incite an offline harm event given specific
conditions or context),
2 (it is most likely to incite an offline harm event in most contexts, or
probably to initiate one in specific contexts),
3 (it is certainly going to incite or initiate an offline harm event in
any context).
Sub-task 1b: Who is/are the most likely target(s) of the offline harm?
If an offline harm event is triggered, who are going to be the most
affected groups of people? In this task, only the broad category of
identities of the target(s) is to be predicted. It will be a five-class
classification task:
Gender
Religion
Descent
Caste
Political Ideology
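As a rough illustration of the two label sets described above, the
following Python sketch encodes them as enumerations. The member names
chosen for the four harm levels and the two parsing helpers are
hypothetical and exist only for this example; the official data format is
the one defined on the task website.

```python
from enum import Enum, IntEnum


class HarmPotential(IntEnum):
    """Sub-task 1a levels (member names are illustrative assumptions)."""
    NEVER = 0              # never leads to offline harm, in any context
    CONTEXT_DEPENDENT = 1  # could incite harm given specific conditions
    LIKELY = 2             # most likely to incite harm in most contexts
    CERTAIN = 3            # certain to incite or initiate harm in any context


class TargetCategory(Enum):
    """Sub-task 1b target identity categories, as listed in the call."""
    GENDER = "Gender"
    RELIGION = "Religion"
    DESCENT = "Descent"
    CASTE = "Caste"
    POLITICAL_IDEOLOGY = "Political Ideology"


def parse_harm_level(raw: str) -> HarmPotential:
    """Convert a raw label such as "2" into a HarmPotential member.

    Raises ValueError for anything outside 0..3.
    """
    return HarmPotential(int(raw.strip()))


def parse_target(raw: str) -> TargetCategory:
    """Match a category name case-insensitively, e.g. "political ideology"."""
    wanted = raw.strip().lower()
    for category in TargetCategory:
        if category.value.lower() == wanted:
            return category
    raise ValueError(f"unknown target category: {raw!r}")
```

Using enumerations rather than bare integers or strings means that an
out-of-range level or a misspelled category fails loudly when a prediction
file is read, rather than at scoring time.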
Important Dates
==============
Training Set Release: February 7, 2024
Test Set Release: March 10, 2024
Submission due: March 13, 2024
System Description Papers due: March 21, 2024
Reviews for papers: March 27, 2024
Camera-ready due: March 31, 2024
[apologies for cross-posting]
We invite applications for the Adam Kilgarriff Prize.
Full information for potential applicants can be found here:
https://kilgarriff.co.uk/prize/category/news/
The deadline for applications is *30th September 2024*. A winner will be
announced on or before *31st December 2024*, and the Prize will be
awarded at the eLex Conference of 2025 <http://elex.link/>.
This is the fifth iteration of the Adam Kilgarriff Prize, which has so
far had four excellent winners <https://kilgarriff.co.uk/prize/winners/>.
We look forward to receiving your applications!
Michael Rundell, Chair of Trustees, Adam Kilgarriff Prize
The deadline for the 7th edition of the Translation in Transition conference (https://sites.google.com/view/tt2024) has been extended to Feb 23, 2024.
This series of conferences has established itself as a central meeting point for researchers in the field of empirical translation studies through previous editions in Copenhagen, Germersheim, Ghent, Barcelona, Kent and Prague. In its 7th edition, held at the Shota Rustaveli State University in Batumi, it once again aims to be a forum for discussing empirical research that is based on any kind of empirical methodology and that advances our knowledge in the fields of translation and interpreting. While the Batumi edition will be open to various topics within empirical translation studies, we also want to put special emphasis on two directions: low-resourced and less-researched language pairs, and the interplay between different methods and data types, e.g. combining product and process research.
*Final Call for Papers*
We invite original submissions that deal with any of the conference topics. To encourage a fruitful exchange of ideas and experience among the researchers of various fields of specialization, preference will be given to interdisciplinary contributions that cover two or more of the conference topics.
Submissions are to be made in the form of anonymized extended abstracts of between 800 and 1000 words (excluding references) by February 23, 2024. Apart from a clear outline of the aims and methods of the study, the abstracts should also provide (preliminary) results. The abstracts will be submitted through the OpenReview system (https://openreview.net/group?id=TT/2024/Conference) and reviewed by at least two members of the scientific committee. The accepted contributions will be presented either as oral talks or as posters. All submissions must follow the abstract submission instructions (https://sites.google.com/view/tt2024/submission-instructions).
We welcome contributions (in English) grounded in empirical approaches to studying both interlingual and intralingual translation, as well as theoretical and position papers on the following topics:
* Empirical methods and models (corpus-based, corpus-driven, experimental) or methods derived from computational linguistics and data mining (e.g. computational semantics, pragmatics) applied to translation studies
* Presentation of new resources for translation studies (spoken corpora, multimodal corpora, interpreting transcript datasets, corpora of low-resourced languages, lexicons, databases, etc.)
* Method and data triangulation: combined use of corpus data and methods and other sources of data
* Detection and analysis of specific features of translation (translationese, interpretese, editese, machine translationese, post-editese, etc.) using parallel and comparable corpora
* Analysis and interpretation of variation in translation, e.g. variation driven through register/genre, expertise, mode, etc.
* Empirical analysis of specialised translation, e.g. legal translation, technical translation and others
* Analysis of non-canonical forms of translation/interpreting and multilingual communication
* Cognitive and computational insights of variation in translation and translationese
* Cognitive modeling of translation processes, including cognitive load measurements
* Translation quality assessment and evaluation using corpora or experimental research
* Translation in specific settings: between close languages, from a third language, non-native translation, indirect/relay translation, etc.
* The use of corpora in translator and/or interpreter training
* Improving understanding of translation in the context of NLP
* Computer-assisted translation and/or interpreting (CAT/CAI)
* Machine translation (MT): analysis, evaluation, selection and preparation of data for MT, ‘machine translationese’
Important dates
· Conference abstract submission due: Feb 23, 2024
· Notification of acceptance: April 8, 2024
· Final abstract version due: April 29, 2024
· Registration open: May 6, 2024
· Early-bird registration: June 6, 2024
· Conference date: September 23-25, 2024
The conference is organized by the Department of European Studies, Faculty of Humanities, Batumi Shota Rustaveli State University in Batumi (Georgia) in cooperation with the Institute of Translation Studies and Specialised Communication, University of Hildesheim (Germany).
Local organizing committee at the Batumi Shota Rustaveli State University
Khatuna Beridze, Theona Beridze, Khatuna Diasamidze, Tamta Nagervadze
Program Chairs at the University of Hildesheim
Ekaterina Lapshinova-Koltunski and Silvana Deilen
Dear all,
We are pleased to announce the schedule and lineup of speakers & tutors for the UCREL NLP Summer School 2024! The school will be led by 16 experts, covering a wide range of NLP talks and hands-on tutorials. We are also introducing a mini team-based hackathon and a poster session. There will be plenty of time for knowledge exchange and discussions both within the sessions and during breaks.
Upon request, we have extended the early bird registration until April 1, 2024. Please note that registrations are processed on a first-come, first-served basis and that we are offering in-person sessions only, so spaces are limited.
- Date: 24-26 July 2024
- Venue: InfoLab21, School of Computing and Communications, Lancaster University, UK.
- Registration: https://bit.ly/UCREL2024
- Schedule and speakers: https://ucrel.lancs.ac.uk/uss2024
Registered applicants who plan to attend the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security (NLPAICS 2024) https://www.nlpaics.com/, which takes place at Lancaster University just two days after the summer school, will receive a 20% discount on their NLPAICS conference registration fees.
For any questions, please email us at ucrel(a)lancaster.ac.uk
Best wishes,
Mo
--------------------------------
Dr Mo El-Haj
Senior Lecturer in NLP
Director of Admissions (SCC)
Co-Director of UCREL NLP Group https://ucrel.lancs.ac.uk/
Natural Language Engineering (NLE) Journal Editorial Board
https://www.cambridge.org/core/journals/natural-language-engineering
Advisory Board of the Natural Language Processing Book Series
https://benjamins.com/catalog/nlp
School of Computing and Communications, Lancaster University
https://www.lancaster.ac.uk/staff/elhaj
@DocElhaj
You may receive emails from me outside what are your typical office hours.
I do not expect you to respond to my email outside your working hours.
> CALL FOR PAPERS
>
> SLATE - Symposium on Languages, Applications and Technologies
>
> Águeda, Portugal, July 4-5, 2024
>
> https://slate-conf.org/2024/home
>
>
>
> IMPORTANT DATES
>
> Paper Submission Deadline: April 25, 2024
>
> Paper Authors' Notification: May 24, 2024
>
> Final Paper Submission: May 31, 2024
>
> Conference Date: July 4-5, 2024
>
>
>
> Context
>
> We often use languages. Earlier, to communicate between ourselves. Later, to communicate with computers. And more recently, with the advent of networks, we found a way to make computers communicate between themselves. All these different forms of communication use languages, different languages, but they still share many similarities. In SLATE, we are interested in discussing these languages, organised in three main tracks:
>
>
>
> - HHL Track: Processing Human-Human Languages, dedicated to the presentation and discussion of Natural Language Processing (NLP) tools and applications.
>
> - HCL Track: Processing Human-Computer Languages, where researchers, developers, and educators exchange ideas and information on the latest academic or industrial work on the design, processing, assessment, and applications of programming languages.
>
> - CCL Track: Processing Computer-Computer Languages, a broad space for discussing (mark-up) languages for communication between computers, including those used for visualisation and presentation of information to the end-user.
>
> List of topics
>
> * Human-Human Languages (HHL) track:
>
> - Computational approaches to Morphology, Syntax, and Semantics;
>
> - Machine translation and tools for Computer Assisted Translation;
>
> - Computational terminology and lexicography;
>
> - Information Retrieval and Automatic Question Answering;
>
> - Information Extraction;
>
> - Natural Language Understanding;
>
> - Corpus Linguistics;
>
> - Statistical Methods for NLP;
>
> - Tools and resources for NLP;
>
> - Natural Language Generation;
>
> - Speech Recognition and Synthesis;
>
> - NLP system and resource evaluation;
>
> - Language teaching support tools.
>
>
>
> * Human-Computer Languages (HCL) track:
>
> - Programming language concepts, methodologies and tools;
>
> - Language and Grammars, design, formal specification and quality;
>
> - Domain Specific Languages design and implementation;
>
> - Programming, refactoring and debugging environments;
>
> - Dynamic and static analysis of programs;
>
> - Program Comprehension and program visualization;
>
> - Compilation and interpretation techniques;
>
> - Code generation and optimization;
>
> - Programming languages teaching methods and tools;
>
> - Cross-fertilization of different technological spaces (modelware, grammarware, ontologies, etc);
>
> - High level visual languages for Low-code development.
>
>
>
> * Computer-Computer Languages (CCL) track:
>
> - Semantic data description frameworks;
>
> - Semantic Web languages;
>
> - Ontology engineering;
>
> - IoT data protocols;
>
> - XML Databases and Big Data;
>
> - Publishing and document storage formats;
>
> - HTML5 and web formatting;
>
> - Industry-specific XML based standards;
>
> - Web APIs and service marketplaces;
>
> - Service-Oriented Architectures;
>
> - E-learning systems, standards, and interoperability;
>
> - Data and graph visualization languages.
>
> For more specific information regarding publication policy, committees or how to get to the venue, please visit our website: https://slate-conf.org/2024/home
>
> Kindest Regards,
>
> SLATE'24 Organization Committee
https://gu-clasp.github.io/MILLing/
*Multimodality and Interaction in Language Learning (MILLing)* will
bring together researchers in linguistics and computational
linguistics to discuss learning through linguistic interaction, from
the perspectives of both human language acquisition and machine
learning. We encourage contributions from the fields of theoretical linguistics,
experimental linguistics, pragmatics, computational linguistics,
artificial intelligence, and cognitive science.
The conference is organised by the Centre for Linguistic Theory and
Studies in Probability (CLASP, <https://gu-clasp.github.io/>),
University of Gothenburg. The conference will be held on October 14 and
15, 2024, in Gothenburg, Sweden.
Important dates
----
- Submission deadline: May 31, 2024, anywhere on Earth
- Notification of acceptance: Aug 30, 2024, anywhere on Earth
- Camera ready: Sep 20, 2024, anywhere on Earth
- Conference: Oct 14--15, 2024, University of Gothenburg, Sweden
Topics of interest
----
We hope to see innovative work that
considers language learning from different perspectives, and we hope
to cultivate discussion that reaches across traditionally disparate
disciplines. Papers are invited on topics in these and closely related
areas, including (but not limited to) the following:
- Language acquisition: formal, statistical, experimental, and machine learning-based work
- Language learning through dialogue in humans and machines
- Multi-modality and figurativeness in language learning and dialogue
- Linguistic variation, adaptation, and audience design
- Low-resource and ecologically plausible language modelling (e.g., BabyLM)
- Cognitive architectures for language learning
- Information state update in humans and machines
- Cognitive approaches to second language acquisition
- Dialogue systems for language learning
- Online, reinforcement and curriculum learning in NLP
- Atypical development and language learning
- Ethical considerations in AI-assisted language learning
Submission Requirements
----
MILLing will feature two types of submissions: long papers and short
papers. Long papers must describe original research, and they must not
exceed 8 pages excluding references (position papers are also accepted
and should be formatted in the same way). Short papers present work in
progress, or they describe systems and/or projects. They must not
exceed 4 pages excluding references. All types of papers will be
published in the 2024 ACL Anthology as CLASP Conference Proceedings.
Papers should be electronically submitted via the softconf system at:
<https://softconf.com/n/MILLing2024/>. Submissions should be PDF files
and use the LaTeX or Word templates provided for ACL submissions
(<https://github.com/acl-org/acl-style-files>). Submissions have to be
anonymous. Please make sure that you select the right track when
submitting your paper. Contact the organisers if you have problems
using softconf.
Concurrent Submissions
----
Papers that have been or will be submitted to other conferences or
publications must indicate this at submission time using a footnote on
the title page of the submissions. Authors of papers accepted for
presentation at MILLing must notify the program chairs by the
camera-ready deadline as to whether the paper will be presented. All
accepted papers must be presented at the conference to appear in the
proceedings. We will not accept for publication or presentation papers
that overlap significantly in content or results with papers that will
be (or have been) published elsewhere.
Camera Ready Versions
----
Camera ready versions should follow the same guidelines with respect
to style and page numbers as the initial submission, i.e. there are no
additional pages allowed in the final submission. Please submit the
camera ready version by Sep 20, 2024.
About CLASP
----
MILLing is organised by the Centre for Linguistic Theory and Studies
in Probability (CLASP, <https://gu-clasp.github.io/>) at the Department
of Philosophy, Linguistics and Theory of Science (FLoV), University of
Gothenburg. CLASP focuses its research on the application of
probabilistic and information theoretic methods to the analysis of
natural language. CLASP is concerned both with understanding the
cognitive foundations of language and developing efficient language
technology. We work at the interface of computational
linguistics/natural language processing, theoretical linguistics, and
cognitive science.
=======2 PhD positions on NLP at CNRS@CREATE Singapore ===========
CNRS@CREATE Singapore, the first overseas subsidiary of CNRS, is offering
2 PhD positions in hybrid strategies for NLP. The candidate will work
within the DesCartes program
(https://www.cnrsatcreate.cnrs.fr/descartes/), a large research project
that aims to develop disruptive hybrid AI to serve the smart city and to
enable optimized decision-making in complex situations, encountered for
critical urban systems.
We are looking for candidates with:
→ Master's degree in Computer Science or equivalent, with a solid background
in NLP, AI and/or machine learning. A very strong academic record is
highly recommended.
→ Good experience in deep learning approaches for NLP
→ Good programming skills in Python
→ Very good English skills (both writing and speaking)
→ Ability to work collaboratively with other researchers
The candidate will be registered at Paul Sabatier University-Toulouse
for 3 years and is expected to spend time in Singapore
(https://www.cnrsatcreate.cnrs.fr/about-us/). The thesis will be
supervised by Jian Su (A*STAR Institute for Infocomm Research), and
Farah Benamara (IRIT, Toulouse University and IPAL Singapore).
To apply, please send a detailed CV, your grades and a list of
publications, if any. The position is open until filled, but the
deadline to apply is April 15th, for a start in September/October 2024.
Feel free to contact us with any questions: farah.benamara(a)irit.fr
--
========================
Farah Benamara Zitoune
Professor in Computer Science, Université Paul Sabatier
IRIT-CNRS
118 Route de Narbonne, 31062, Toulouse.
Tel : +33 5 61 55 77 06
http://www.irit.fr/~Farah.Benamara
==================================
Processing of figurative language is a rapidly growing area in NLP, including computational modeling of metaphors, idioms, puns, irony, sarcasm, simile, and other figures. Characteristic of all areas of human activity (poetic, ordinary, scientific, social media) and, thus, of all types of discourse, figurative language poses an important problem for NLP systems. Its ubiquity in language has been established in a number of corpus studies and the role it plays in human reasoning has been confirmed in psychological experiments. This makes figurative language an important research area for computational and cognitive linguistics, and its automatic identification, interpretation and generation indispensable for any semantics-oriented NLP application.
The proposed workshop will be the fourth edition of the biennial Workshop on Figurative Language Processing, whose first three editions were held at NAACL 2018, ACL 2020 and EMNLP 2022, respectively. The workshop builds upon a long series of related workshops that the current organizers have been involved with: the “Metaphor in NLP” series (2013-2016) and the “Computational Approaches to Linguistic Creativity” series (2009-2010). We expand the scope to incorporate various types of figurative language, with the aim of maintaining and nourishing a community of NLP researchers interested in this topic. The main focus will be on computational modeling of figurative language; however, papers on cognitive, linguistic, social, rhetorical, and applied aspects are also of interest, provided that they are presented within a computational, formal, or quantitative framework. Recent advances in language models have led to several works on figurative language understanding (Chakrabarty et al 2022a; Chakrabarty et al 2022b; Liu et al 2022; Hu et al 2023) and generation (Stowe et al 2021; Chakrabarty et al 2021; Sun et al 2022; Tian et al 2021). At the same time, large language models have opened up opportunities to utilize figurative language in scientific (Kim et al 2023) as well as creative writing (Chakrabarty et al 2022c; Tian et al 2022). Additionally, there has also been recent work on multimodal figurative language generation (Chakrabarty et al 2023; Akula et al 2023), understanding (Hessel et al 2023; Yosef et al 2023) and interpretation (Hwang et al 2023; Desai et al 2022; Kumar et al 2022). We encourage submissions along these axes.
Topics of Interest
The workshop will solicit both full papers and short papers for either oral or poster presentation. Topics will include, but will not be limited to, the following:
Identification and interpretation of different types of figurative language: Linguistic, conceptual and extended metaphor; irony, sarcasm, puns, simile, metonymy, personification, synecdoche, hyperbole
Generation of different types of figurative language: sarcasm, simile, metaphors, humor, hyperbole
Multilingual and multimodal figurative language processing
Resources and evaluation
Annotation of figurative language in corpora
Datasets for evaluation of tools
Evaluation methodologies
Figurative use in low-resource languages
Processing of figurative language for NLP applications
Figurative language in sentiment analysis; dialogue systems; computational social science; educational applications
Figurative language and mental health
Figurative language in digital humanities
Figurative language in creative writing
Figurative language and cognition
Cognitive models of processing of figurative language by the human brain
Human-AI collaboration for figurative language
Shared Tasks
Multilingual euphemisms detection: Euphemisms are a linguistic device used to soften or neutralize language that may otherwise be harsh or awkward to state directly (e.g. "between jobs" instead of "unemployed", "late" instead of "dead", "collateral damage" instead of "war-related civilian deaths"). By acting as alternative words or phrases, euphemisms are used in everyday language to maintain politeness, mitigate discomfort, or conceal the truth. While they are culturally-dependent, the need to discuss sensitive topics in a non-offensive way is universal, suggesting similarities in the way euphemisms are used across languages and cultures. We propose a shared task in which participants will need to disambiguate sentences in multiple languages as either euphemistic or not. The dataset will include English, Mandarin, Spanish, Yoruba, and possibly additional languages.
Understanding of Figurative Language through Visual Entailment: One important modality that has gained interest recently is vision, namely the interpretation of figurative language in media such as memes, art, or comics. This task is challenging because it involves reasoning abstractly about images, and also involves understanding social commonsense and cultural context. We will frame this as a visual entailment task where a model not only has to predict whether a caption entails the content in the image but also has to provide free-text explanations justifying the label prediction. These tasks have proved difficult for state-of-the-art multimodal models in the past. A paper and a baseline for this task will be provided.
Important Dates
Long, Short & Demonstration Paper Submission: March 10th, 2024
Long, Short & Demonstration Paper Notification: April 14th, 2024
Final Paper Submission: April 24th, 2024
Workshop: June 21/22, 2024
For more information, please check https://sites.google.com/view/figlang2024
In this newsletter:
LDC membership discounts expire March 1
Spring 2024 data scholarship recipients
Four corpora withdrawn from the LDC Catalog
New publications:
Second Language University Speech Intelligibility Corpus<https://catalog.ldc.upenn.edu/LDC2024S02>
AIDA Scenario 1 Practice Topic Annotation<https://catalog.ldc.upenn.edu/LDC2024T02>
________________________________
LDC membership discounts expire March 1
Time is running out to save on 2024 membership fees. Renew your LDC membership, rejoin the Consortium, or become a new member by March 1 to receive a discount of up to 10%. For more information on membership benefits and options, visit Join LDC<https://www.ldc.upenn.edu/members/join-ldc>.
Spring 2024 data scholarship recipients
Congratulations to the recipients of LDC's Spring 2024 data scholarships:
Jordan Chandler: Université Rennes 2 (France): Master's student, English Studies. Jordan is awarded a copy of Penn Parsed Corpora of Historical English LDC2020T16 to continue his research on the historical development of adjective, quantifier, and article indefiniteness in the English language.
Nikhil Raghav: TCG Crest (India): PhD candidate, Institute for Advancing Intelligence. Nikhil is awarded copies of Third DIHARD Challenge Development LDC2022S12 and Third DIHARD Challenge Evaluation LDC2022S14 for his work in speaker diarization.
Abraham Sanders: Rensselaer Polytechnic Institute (USA): PhD candidate, Cognitive Science. Abraham is awarded copies of Fisher English Training Speech Part 1 Speech LDC2004S13, Fisher English Training Speech Part 1 Transcripts LDC2004T19, Fisher English Training Part 2 Speech LDC2005S13 and Fisher English Training Part 2 Transcripts LDC2005T19, for his work in spoken dialogue systems.
The next round of applications will be accepted in September 2024. For information about the program, visit the Data Scholarships page<https://www.ldc.upenn.edu/language-resources/data/data-scholarships>.
Four corpora withdrawn from the LDC Catalog
We regret to announce that The New York Times Annotated Corpus, LDC2008T19, has been withdrawn from the LDC Catalog by the data provider. Because they contain data from LDC2008T19, the following three corpora are also withdrawn from the Catalog: Benchmarks for Open Relation Extraction LDC2014T27, Concretely Annotated New York Times LDC2018T12, and News Sub-domain Named Entity Recognition LDC2023T12. Organizations and individuals who have previously licensed any of these data sets can continue to use them under the terms of their respective special license agreements.
________________________________
New publications:
Second Language University Speech Intelligibility Corpus<https://catalog.ldc.upenn.edu/LDC2024S02> was developed by Northern Arizona University, The Pennsylvania State University, and The University of Texas at Dallas. It contains 10.5 hours of English speech collected from 66 international faculty and university students representing 15 language backgrounds at 10 North American universities. This release also includes orthographic transcriptions for all recordings, intelligibility scores for 73% of the files, speaker metadata, and aligned Praat textgrids.
The speech data is comprised of presentations, descriptions, reflections, and microteaching tasks. Speakers were recruited from courses at intensive English programs and oral skills courses for international graduate students seeking to become international teaching assistants.
2024 members can access this corpus through their LDC accounts provided they have submitted a completed copy of the special license agreement. Non-members may license this data for a fee.
*
AIDA Scenario 1 Practice Topic Annotation<https://catalog.ldc.upenn.edu/LDC2024T02> was developed by LDC and is comprised of annotations for 212 English, Russian, and Ukrainian web documents (text, image, and video) from AIDA Scenario 1 Practice Topic Source Data (LDC2023T11)<https://catalog.ldc.upenn.edu/LDC2023T11>, specifically, the set of practice documents designated for annotation in Phase 1.
Annotations are presented as tab separated files in the following categories for each topic:
* Mentions: single references in source data to a real-world entity or filler, event, or relation.
* Slots: pre-defined roles in an event or relation filled by an argument (entity mention).
* Linking: entity mentions linked to entries in the knowledge base as a method of indicating the real-world entity to which an entity mention refers.
2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
[apologies if you receive multiple copies of this call]
Dear colleagues and friends,
*We are pleased to release the 1st Call for Participation - CLEF 2024
SimpleText Task 4: SOTA?*
*Overview:* SOTA? is introduced as Task 4 in the SimpleText track of CLEF
2024. The goal of the SOTA? shared task is to develop systems which, given
the full text of an AI paper, are capable of recognizing whether the paper
reports model scores on benchmark datasets and, if so, of extracting all
pertinent (Task, Dataset, Metric, Score) quadruples presented within the
paper.
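To make the expected output concrete, here is a minimal Python sketch of
how a system's result for one paper could be represented: a (possibly
empty) list of (Task, Dataset, Metric, Score) quadruples. The class names,
the paper_id field and the choice to keep the score as text are
assumptions made for this illustration, not the official submission
format, which is described on the task website.

```python
from dataclasses import astuple, dataclass, field


@dataclass(frozen=True)
class Result:
    """One (Task, Dataset, Metric, Score) quadruple extracted from a paper."""
    task: str
    dataset: str
    metric: str
    score: str  # kept as text here, since papers report scores in varied formats

    def as_tuple(self) -> tuple:
        """Return the quadruple as a plain tuple in (T, D, M, S) order."""
        return astuple(self)


@dataclass
class PaperAnnotation:
    """All quadruples found in one paper; an empty list means none reported."""
    paper_id: str  # hypothetical identifier, not taken from the task data
    results: list = field(default_factory=list)

    @property
    def reports_scores(self) -> bool:
        """True when at least one quadruple was extracted."""
        return bool(self.results)


# Usage with placeholder values (not real benchmark results).
annotation = PaperAnnotation(paper_id="example-paper")
annotation.results.append(
    Result("example-task", "example-dataset", "example-metric", "0.0")
)
```

Modelling the yes/no decision as "is the list of quadruples empty?" keeps
the two parts of the task in one structure; a system could equally emit an
explicit flag alongside the list.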
More info on the task website:
https://sites.google.com/view/simpletext-sota/home
SOTA? will be divided into two evaluation phases:
- Evaluation Phase 1: Few-shot Testing;
- Evaluation Phase 2: Zero-shot Testing
*To participate in SOTA? i.e. SimpleText Task 4 @ CLEF 2024, please
register your team*:
1. CLEF 2024 official registration page
https://clef2024.imag.fr/index.php?page=Pages/registration.html
2. Codalab competition site:
https://codalab.lisn.upsaclay.fr/competitions/16616
Note, SOTA? is organized as a new task this year under the "SimpleText -
Improving Access to Scientific Texts for Everyone" initiative
https://simpletext-project.com/. Please take a look at the other 3 tasks,
i.e. Task 1, 2, and 3, offered by SimpleText and select one or more of
those task options too if you are interested. Note that there is no
interdependence of the dataset between "Task 4 - SOTA?" and the other three
tasks of SimpleText.
*Dates*
Training and validation datasets available: Feb 1, 2024
Test data available/Evaluation starts: April 23, 2024
Evaluation ends: May 3, 2024
Participant paper submissions due: May 31, 2024
Notification to authors: June 24, 2024
Camera ready due: July 8, 2024
CLEF 2024 Workshop, Grenoble, France: 9-12 September 2024
*Task Organizers*
Jennifer D’Souza (TIB Leibniz Information Centre for Science and Technology
- Germany)
Salomon Kabongo (L3S Research Center, Germany)
Hamed Babaei Giglou (TIB Leibniz Information Centre for Science and
Technology - Germany)
Yue Zhang (Berlin Technical University, Germany)
Sören Auer (TIB Leibniz Information Centre for Science and Technology -
Germany)
*We look forward to having you on board!*
*Contact:* sota.task [at] gmail.com