CALAMITA - Challenge the Abilities of LAnguage Models in ITAlian
Special event co-located with the Tenth Italian Conference on
Computational Linguistics - CLiC-it 2024 Pisa, 4 - 6 December,
2024 - https://clic2024.ilc.cnr.it/
Upcoming deadline: 17 May 2024 (challenge pre-proposal submission)!
Pre-proposal form: https://forms.gle/u4rSt9yXHHYquKrB6
Project Description
AILC, the Italian Association for Computational Linguistics, is
launching a collaborative effort to develop a dynamic and
growing benchmark for evaluating LLMs’ capabilities in Italian.
In the long term, we aim to establish a suite of
tasks in the form of a benchmark which can be accessed through a
shared platform and a live leaderboard. This would allow for
ongoing evaluation of existing and newly developed Italian or
multilingual LLMs.
In the short term, we aim to start building this benchmark
through a series of challenges collaboratively constructed by the
research community. Concretely, this happens through the present
call for challenge contributions. In a style similar to standard
Natural Language Processing shared tasks, participants are asked
to contribute a task and the corresponding dataset with which a
set of LLMs should be challenged. Participants are expected to
provide an explanation and motivation for a given task, a dataset
that reflects that task together with any information relevant to
the dataset (provenance, annotation, distribution of labels or
phenomena, etc.), and a rationale for constructing it that way.
Evaluation metrics and example prompts should also be provided.
Existing relevant datasets are also very welcome, together with
related publications if available. All proposed challenges,
whether based on existing or new datasets, will have to follow
the challenge template, which will be distributed in due time,
towards the write-up of a challenge paper.
In this first phase, all prospective participants are asked to
submit a pre-proposal by filling in this form
https://forms.gle/u4rSt9yXHHYquKrB6. Please fill in all the fields
so that we can get an idea of what challenge you would like to
propose, how the model should be prompted to perform the task,
where you would obtain the data and how much of it, whether it is
already available, etc.
The organizers will examine the submitted pre-proposals and select
those challenges that comply with the template’s requirements,
with an eye to balancing different challenge types. The selected
challenges will be expanded with a full dataset, longer
descriptions, etc. according to the aforementioned template which
will be distributed later. The final report of each accepted
challenge must provide the code for the evaluation, with an
example that runs smoothly on a pre-selected base LLM (most
likely LLaMa-2), which will be communicated by the organizers in
the second phase. All reports will be published as CEUR
proceedings related to the CALAMITA event. Subsequently, all
challenge organizers who wish to be involved can participate in a
broader follow-up paper, targeting a top venue, which will
describe the whole benchmark, procedures, results, and analyses.
Once this first challenge set is put together, the CALAMITA
organizers will run zero- and few-shot experiments with a
selection of LLMs and write a final report. No tuning materials
or experiments are expected at this stage of the project.
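To make the expected shape of a challenge's evaluation code concrete, the following is a minimal illustrative sketch of a zero-shot evaluation harness. It is not the official CALAMITA template (which will be distributed by the organizers); the prompt template, the Italian sentiment example data, and the `generate` callable are all assumptions for illustration. In practice `generate` would wrap the pre-selected base LLM, e.g. via a text-generation pipeline.

```python
# Illustrative zero-shot evaluation harness (NOT the official
# CALAMITA template). `generate` stands in for any LLM completion
# function over the base model chosen by the organizers.
from typing import Callable, List, Tuple


def evaluate_zero_shot(
    examples: List[Tuple[str, str]],   # (input text, gold label) pairs
    prompt_template: str,              # must contain a {text} placeholder
    generate: Callable[[str], str],    # model completion function (assumed)
) -> float:
    """Return the accuracy of the model's zero-shot predictions."""
    if not examples:
        return 0.0
    correct = 0
    for text, gold in examples:
        # Build the prompt for this example and query the model.
        prediction = generate(prompt_template.format(text=text))
        # Compare the (normalized) completion against the gold label.
        correct += int(prediction.strip().lower() == gold.lower())
    return correct / len(examples)


# Usage with a trivial stand-in "model" that always answers "positivo"
# (hypothetical sentiment data, for illustration only):
def dummy_generate(prompt: str) -> str:
    return "positivo"


data = [("Che bel film!", "positivo"), ("Pessimo servizio.", "negativo")]
accuracy = evaluate_zero_shot(
    data, "Sentimento del testo: {text}\nEtichetta:", dummy_generate
)
print(accuracy)  # 0.5
```

A real submission would replace `dummy_generate` with a call to the pre-selected base LLM and report the metric(s) specified in the challenge template.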
Deadlines (tentative)
Website: https://clic2024.ilc.cnr.it/calamita (under construction)
Mail: calamita.ailc@gmail.com
Organizers