CFP: 1st Shared Task on Spoken-Query Cross-Lingual Information Retrieval for the Indic Languages (SqCLIR 2024)

27 Aug 2024


-----------------------------
Spoken-Query Cross-Lingual Information Retrieval for the Indic Languages
(SqCLIR 2024)
Website: https://sites.google.com/view/sqclir2024
To be organized in conjunction with FIRE 2024 (fire.irsi.org.in)
12th-15th December 2024, Gandhinagar, India
------------------------------
India is known for its linguistic diversity, featuring a multitude of
languages. The Constitution of India recognizes 22 languages under the
Eighth Schedule. These include Assamese, Bengali, Gujarati, Hindi, Kannada,
Kashmiri, Konkani, Malayalam, Manipuri, Marathi, Nepali, Odia, Punjabi,
Sanskrit, Sindhi, Tamil, Telugu, Urdu, Bodo, Santhali, Maithili, and Dogri.
Building a retrieval system that handles spoken queries in any of India's
22 officially recognized languages and locates relevant passages in a large
knowledge base is a complex, multifaceted problem. To our knowledge,
spoken-query retrieval is a relatively underexplored area in information
retrieval and natural language processing, particularly in a multilingual
setting that includes under-resourced languages.
To address this challenge and explore this new area, we propose a novel
shared task at FIRE 2024 for developing and evaluating retrieval systems
that take a spoken query as input and search a document corpus for
relevant passages.
Overview of Tasks
Task 1: Spoken Query Ad-Hoc Retrieval - Monolingual
Participants are required to develop a Spoken Query Retrieval System that
handles monolingual queries. In this task, the spoken queries and the
corpus are in the same language, which makes retrieval more
straightforward than in the cross-lingual setting. The system should
accurately interpret spoken queries and retrieve relevant passages from a
corpus in the same language. This year, the task covers English, Gujarati,
Hindi, and Bengali.
Task 2: Spoken Query Cross-Lingual Retrieval
Participants are required to develop a Spoken Query Retrieval System
capable of handling cross-lingual queries. In this task, the spoken queries
and the corpus are in different languages, adding complexity to the
retrieval process. The system should accurately interpret spoken queries in
one language and retrieve the most relevant passages from a corpus in
another language. This year, the task involves English, Hindi, and
Bengali. The query and corpus languages may be any pairing of these three
languages, allowing participants to address a variety of cross-lingual
retrieval challenges.
Tentative Timeline
20th August - Training data released and registration opens
5th September - Test data released
30th September - Run submission deadline
10th October - Results declared
20th October - Working notes due
20th November - Camera-ready submissions due
12th-15th December - FIRE 2024 in Gandhinagar, India
Organizers
----------------
Bhargav Dave, DA-IICT, Gandhinagar, India
Debasis Ganguly, University of Glasgow, Scotland
Evangelos Kanoulas, University of Amsterdam, Netherlands
Prasenjit Majumder, DA-IICT, Gandhinagar, India
For regular updates, subscribe to our mailing list: sqclir@googlegroups.com