<Apologies for cross-postings>
------------------------------------------------
Call for Participation
PROFE 2025: Language Proficiency Evaluation
IberLEF 2025 Shared Task
https://nlp.uned.es/question-answering/profe2025
PROFE 2025 reuses the Spanish proficiency exams developed over many years by Instituto Cervantes to evaluate human students. Automatic systems will therefore be evaluated under the same conditions as humans. Systems will receive a set of exercises with their corresponding instructions, without specific training material. For this reason, we expect transfer-learning approaches or the use of generative Large Language Models.
Subtasks

PROFE 2025 has three subtasks, one per exercise type. Teams can participate in any combination of them. Each subtask contains several exercises of the same type. The subtasks are:
1. Multiple choice subtask: each exercise includes a text and a set of multiple-choice questions about the text, where only one answer per question is correct. Given a multiple-choice question, systems must select the correct answer among the candidates.
2. Matching subtask: each exercise contains two sets of texts. For each text in the second set, systems must find the text in the first set that best matches it. There is only one possible match per text, but the first set can contain extra, unnecessary texts.
3. Fill-the-gap subtask: each exercise contains a text with several gaps corresponding to textual fragments that have been removed and presented in shuffled order as options. Systems must determine the correct position for each fragment. There is only one correct fragment per gap, but there can be more candidate fragments than gaps.
Each exercise type opens research questions on how best to approach it, for example by adapting different prompts when using generative models.
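As a hypothetical sketch, a zero-shot prompt for each of the three exercise types could be assembled along these lines. The dictionary field names and prompt wording are illustrative assumptions, not the official task format:

```python
def build_prompt(exercise: dict) -> str:
    """Return a zero-shot prompt for one exercise (illustrative format only)."""
    if exercise["type"] == "multiple_choice":
        # One text, one question, several candidate answers; only one is correct.
        options = "\n".join(f"{i + 1}) {o}" for i, o in enumerate(exercise["options"]))
        return (
            "Read the text and select the single correct answer.\n\n"
            f"Text: {exercise['text']}\n\nQuestion: {exercise['question']}\n"
            f"{options}\nAnswer (option number):"
        )
    if exercise["type"] == "matching":
        # Two sets of texts; set A may contain extra, unused texts.
        set_a = "\n".join(f"A{i + 1}: {t}" for i, t in enumerate(exercise["set_a"]))
        set_b = "\n".join(f"B{i + 1}: {t}" for i, t in enumerate(exercise["set_b"]))
        return (
            "Match each text in set B with the text in set A it best fits.\n\n"
            f"{set_a}\n\n{set_b}\nMatches (one per line, e.g. B1 -> A2):"
        )
    # Fill-the-gap: a text with gaps plus shuffled candidate fragments.
    fragments = "\n".join(f"F{i + 1}: {t}" for i, t in enumerate(exercise["fragments"]))
    return (
        "Place each fragment in the correct gap of the text.\n\n"
        f"Text: {exercise['text']}\n\nFragments:\n{fragments}\nGap assignments:"
    )
```

The returned string would then be sent to a generative model; parsing the model's answer back into an option index is a separate step not shown here.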
Dataset

We will use the IC-UNED-RC-ES dataset, created from real examinations at Instituto Cervantes. These exams were created by human experts to assess language proficiency in Spanish. We have already collected the exams and converted them to a digital format that is ready to be used in the task. The dataset contains exams at different levels (from A1 to C2).
The complete dataset contains 282 exams with 855 exercises. The total number of evaluation points is 6,146 (among 16,570 options), distributed by exercise type as follows:
* Multiple choice: 3,544 responses
* Matching: 2,309 responses
* Fill-the-gap: 293 responses
In PROFE 2025 we plan to use around 50% of the exams; the other 50% remains hidden for a second edition of PROFE.
We do not intend to distribute the gold standard, in order to prevent overfitting in post-campaign experiments and data contamination in LLMs.
Evaluation measures and baseline

We will use traditional accuracy (the proportion of correct answers) as the main evaluation measure. Systems will receive evaluation scores from two different perspectives:
* At the question level, where correct answers are counted individually, without grouping them.
* At the exam level, where scores for each exam are considered. Each exam contains several exercises of different types. An exam is considered passed if its accuracy (the proportion of correct answers) is above 0.5. The proportion of passed exams is then given as a global score. This perspective will only apply to teams participating in all three subtasks.
In more detail, the evaluation per subtask is as follows:
* Multiple choice subtask: we will measure accuracy as the proportion of questions answered correctly.
* Matching subtask: we will measure accuracy as the proportion of texts matched correctly.
* Fill-the-gap subtask: we will measure accuracy as the proportion of gaps filled correctly.
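The two scoring perspectives can be sketched in a few lines. This is a minimal illustration, assuming results are stored as a list of exams, each a list of per-question booleans; the official scorer may use a different data layout:

```python
def question_level_accuracy(exams: list[list[bool]]) -> float:
    """Proportion of correct answers, pooling all questions across all exams."""
    answers = [ok for exam in exams for ok in exam]
    return sum(answers) / len(answers)

def exam_level_pass_rate(exams: list[list[bool]], threshold: float = 0.5) -> float:
    """Proportion of exams whose per-exam accuracy is above the pass threshold."""
    passed = sum(1 for exam in exams if sum(exam) / len(exam) > threshold)
    return passed / len(exams)

# Example: two exams, one passed (2/3 correct) and one failed (1/4 correct).
results = [[True, True, False], [False, False, True, False]]
print(question_level_accuracy(results))  # 3/7 correct overall
print(exam_level_pass_rate(results))     # 1 of 2 exams passed -> 0.5
```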
We will use accuracy as the evaluation measure because there is only one correct option among the candidates and because it is the measure applied to humans taking the same exams. Thus, we can compare the performance of automatic systems and humans under the same conditions.
A preliminary baseline using ChatGPT obtains the following results for each exercise type (note that different prompting can produce slightly different results):
* Multiple choice accuracy: 0.64
* Filling the gap accuracy: 0.43
* Matching accuracy: 0.51
Schedule

February 6, 2025    Registration opens
March 10, 2025      Training data released
April 28, 2025      Test set released
May 9, 2025         Deadline for submitting runs
May 14, 2025        Release of evaluation results
June 3, 2025        Paper submission deadline
Organizers

Alvaro Rodrigo (https://www.uned.es/universidad/docentes/informatica/alvaro-rodrigo-yuste.html), UNED NLP & IR Group (Universidad Nacional de Educación a Distancia)
Anselmo Peñas (https://www.uned.es/universidad/docentes/informatica/anselmo-penas-padilla.html), UNED NLP & IR Group (Universidad Nacional de Educación a Distancia)
Alberto Pérez (https://www.uned.es/universidad/docentes/informatica/alberto-perez-garcia-plaza.html), UNED NLP & IR Group (Universidad Nacional de Educación a Distancia)
Sergio Moreno (https://www.uned.es/universidad/docentes/en/informatica/sergio-moreno-alvarez.html), UNED NLP & IR Group (Universidad Nacional de Educación a Distancia)
Javier Fruns, Instituto Cervantes
Inés Soria, Instituto Cervantes
Rodrigo Agerri (https://ragerri.github.io/), HiTz (Universidad del País Vasco, UPV/EHU)