CALAMITA - Challenge the Abilities of LAnguage Models in ITAlian
Special event co-located with the Tenth Italian Conference on
Computational Linguistics - CLiC-it 2024 Pisa, 4 - 6 December,
2024 - https://clic2024.ilc.cnr.it/
Upcoming deadline: 17 May 2024 (challenge pre-proposal submission)!
Pre-proposal form: https://forms.gle/u4rSt9yXHHYquKrB6
Project Description
AILC, the Italian Association for Computational Linguistics, is
launching a collaborative effort to develop a dynamic and
growing benchmark for evaluating LLMs’ capabilities in Italian.
In the long term, we aim to establish a suite of
tasks in the form of a benchmark which can be accessed through a
shared platform and a live leaderboard. This would allow for
ongoing evaluation of existing and newly developed Italian or
multilingual LLMs.
In the short term, we aim to start building this benchmark
through a series of challenges collaboratively constructed by the
research community. Concretely, this happens through the present
call for challenge contributions. In a style similar to standard
Natural Language Processing shared tasks, participants are asked
to contribute a task and the corresponding dataset with which a
set of LLMs should be challenged. Participants are expected to
provide an explanation and motivation for a given task, a dataset
that reflects that task together with any information relevant to
the dataset (provenance, annotation, distribution of labels or
phenomena, etc.), and a rationale for constructing it that way.
Evaluation metrics and example prompts should also be provided.
Existing relevant datasets are also very welcome, together with
related publications if available. All proposed challenges,
whether based on existing or new datasets, will have to follow
the challenge template, which will be distributed in due time,
towards the write-up of a challenge paper.
In this first phase, all prospective participants are asked to
submit a pre-proposal by filling in this form
https://forms.gle/u4rSt9yXHHYquKrB6. Please fill in all the fields
so that we can get an idea of what challenge you would like to
propose, how the model should be prompted to perform the task,
where you would obtain the data and how much of it, whether it is
already available, etc.
The organizers will examine the submitted pre-proposals and select
those challenges that comply with the template’s requirements,
with an eye to balancing different challenge types. The selected
challenges will be expanded with a full dataset, longer
descriptions, etc. according to the aforementioned template which
will be distributed later. The final report of each accepted
challenge must provide the code for the evaluation, with an
example that runs smoothly on a pre-selected base LLM (most
likely LLaMa-2), which will be communicated by the organizers in
the second phase. All reports will be published as CEUR
proceedings related to the CALAMITA event. Subsequently, all
challenge organizers who wish to be involved can participate in a
broader follow-up paper, targeting a top venue, which will
describe the whole benchmark, procedures, results, and analyses.
Once this first challenge set is put together, the CALAMITA
organizers will run zero- and few-shot experiments with a
selection of LLMs and write a final report. No tuning materials
or experiments are expected at this stage of the project.
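To make the expected shape of a challenge's evaluation code concrete, the following is a minimal illustrative sketch of a zero-shot evaluation harness. It is not the official CALAMITA template (which will be distributed by the organizers); the prompt template, the Italian sentiment example data, and the `generate` callable are all assumptions for illustration. In practice `generate` would wrap the pre-selected base LLM, e.g. via a text-generation pipeline.

```python
# Illustrative zero-shot evaluation harness (NOT the official
# CALAMITA template). `generate` stands in for any LLM completion
# function over the base model chosen by the organizers.
from typing import Callable, List, Tuple


def evaluate_zero_shot(
    examples: List[Tuple[str, str]],   # (input text, gold label) pairs
    prompt_template: str,              # must contain a {text} placeholder
    generate: Callable[[str], str],    # model completion function (assumed)
) -> float:
    """Return the accuracy of the model's zero-shot predictions."""
    if not examples:
        return 0.0
    correct = 0
    for text, gold in examples:
        # Build the prompt for this example and query the model.
        prediction = generate(prompt_template.format(text=text))
        # Compare the (normalized) completion against the gold label.
        correct += int(prediction.strip().lower() == gold.lower())
    return correct / len(examples)


# Usage with a trivial stand-in "model" that always answers "positivo"
# (hypothetical sentiment data, for illustration only):
def dummy_generate(prompt: str) -> str:
    return "positivo"


data = [("Che bel film!", "positivo"), ("Pessimo servizio.", "negativo")]
accuracy = evaluate_zero_shot(
    data, "Sentimento del testo: {text}\nEtichetta:", dummy_generate
)
print(accuracy)  # 0.5
```

A real submission would replace `dummy_generate` with a call to the pre-selected base LLM and report the metric(s) specified in the challenge template.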
Deadlines (tentative)
Website: https://clic2024.ilc.cnr.it/calamita (under construction)
Mail: calamita.ailc@gmail.com
Organizers