(Apologies for cross-posting)
Conversational agents offer promising opportunities for education as they can fulfill various roles (e.g., intelligent tutors and service-oriented assistants) and pursue different objectives (e.g., improving student skills and increasing instructional efficiency), among which serving as an AI tutor is one of the most prevalent tasks. Recent advances in the development of Large Language Models (LLMs) provide our field with promising ways of building AI-based conversational tutors, which can generate human-sounding dialogues on the fly. The key question posed in previous research, however, remains: How can we test whether state-of-the-art generative models are good AI teachers, capable of replying to a student in an educational dialogue?
In this shared task, we focus on educational dialogues between a student and a tutor in the mathematical domain that are grounded in student mistakes or confusion, where the AI tutor aims to remediate those mistakes. The goal is to evaluate the quality of tutor responses along four key dimensions of the tutor's ability: (1) identifying the student's mistake, (2) pointing to its location, (3) providing the student with relevant pedagogical guidance, and (4) making that guidance actionable. Dialogues used in this shared task draw on dialogue contexts from the MathDial (Macina et al., 2023) and Bridge (Wang et al., 2024) datasets. Each context ends with a student utterance containing a mistake and is paired with a set of responses to that utterance from a range of LLM-based tutors and, where available, human tutors, aimed at mistake remediation and annotated for their quality.
Data Release We are pleased to announce that the test data is now released and can be accessed at https://github.com/kaushal0494/UnifyingAITutorEvaluation/blob/main/BEA_Share....
Test Platform The competition is hosted on the CodaBench (https://www.codabench.org/) platform, with a separate page for each track.
Track 1 – Mistake Identification: https://www.codabench.org/competitions/7195/
Track 2 – Mistake Location: https://www.codabench.org/competitions/7200/
Track 3 – Providing Guidance: https://www.codabench.org/competitions/7202/
Track 4 – Actionability: https://www.codabench.org/competitions/7203/
Track 5 – Tutor Identification: https://www.codabench.org/competitions/7206/
Registered teams are welcome to participate in any number of tracks.
Participation In order to participate in the test phase, you will need to create an account on CodaBench (https://www.codabench.org/), if you don't already have one. After that, please register for the specific track(s) to which you wish to submit your systems' predictions. By participating in this shared task, you agree to the Terms outlined on the shared task track webpages (see the "Terms" tab).
The total number of submissions per team is capped at 5 for each track, with a maximum of 2 submissions per day. The platform will ask you to provide your team name and a title for each submission – the latter may be useful for distinguishing between your different submissions. All submissions will then be reflected on the CodaBench platform together with the accompanying information (team name, affiliation, and submission name). Please note that we will publish the official final leaderboard on the shared task website (https://sig-edu.org/sharedtask/2025), where only the first 5 submissions per team will be included, in order to adhere to the terms of this shared task.
To be added to the shared task mailing list for further updates, please register here: https://forms.gle/fKJcdvL2kCrPcu8X6
Important dates
All deadlines are 11:59pm UTC-12 (anywhere on Earth).
- March 12, 2025: Development data release
- April 10, 2025: Test data release
- April 24, 2025: System submissions from teams due
- April 30, 2025: Evaluation of the results by the organizers
- May 21, 2025: System papers due
- May 28, 2025: Paper reviews returned
- June 9, 2025: Final camera-ready submissions
- July 31 and August 1, 2025: BEA 2025 workshop at ACL
Contact: bea.sharedtask.2025@gmail.com
Shared task website: https://sig-edu.org/sharedtask/2025