Call for Participation

BEA 2023 Shared Task: Generating AI Teacher Responses in Educational Dialogues

https://sig-edu.org/sharedtask/2023


SHARED TASK DESCRIPTION

Conversational agents offer promising opportunities for education. They can fulfill various roles (e.g., intelligent tutors and service-oriented assistants) and pursue different objectives (e.g., improving student skills and increasing instructional efficiency) (Wollny et al. 2021). Among these roles, the most prevalent is that of the AI teacher, which helps students improve their skills and gives them more opportunities to practice. Recent meta-analyses have even reported a significant effect of chatbots on skill improvement, for example in language learning (Bibauw et al. 2022). What is more, recent advances in AI and natural language processing have led to conversational agents founded on more powerful generative language models.

Despite these promising opportunities, the use of powerful generative models as a foundation for downstream tasks also presents several crucial challenges. In the educational domain in particular, it is important to ascertain whether that foundation is solid or flimsy. Bommasani et al. (2021: pp. 67-72) stressed that, if we want to put these models into practice as AI teachers, it is imperative to determine whether they can (a) speak to students like a teacher, (b) understand students, and (c) help students improve their understanding. Therefore, Tack and Piech (2022) formulated the AI teacher test challenge: How can we test whether state-of-the-art generative models are good AI teachers, capable of replying to a student in an educational dialogue?

Following the AI teacher test challenge, we are organizing the first shared task on the generation of teacher language in educational dialogues. The goal of the task is to use NLP and AI methods to generate teacher responses in real-world samples of teacher-student interactions. These samples are taken from the Teacher Student Chatroom Corpus (Caines et al. 2020; Caines et al. 2022). Each training sample consists of a dialogue context (i.e., several teacher-student utterances) and the teacher's response. For each test sample, participants are asked to submit their best generated teacher response.
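For illustration, the sketch below shows what a minimal submission pipeline could look like. The file names, JSON structure, and field names here are assumptions made for this example (the actual data format is defined in the TSCC release and the CodaLab competition), and generate_response is merely a placeholder for a participant's own model.

    import json

    def generate_response(context: str) -> str:
        # Placeholder: replace with your own generative model.
        # As a trivial baseline, always ask a follow-up question.
        return "Good question! Can you tell me more about what you mean?"

    # Assumed format: a list of samples, each with an ID and a dialogue
    # context given as speaker-tagged turns (hypothetical field names).
    with open("test_samples.json") as f:
        samples = json.load(f)

    predictions = []
    for sample in samples:
        # Flatten the dialogue context into a single prompt string.
        context = "\n".join(
            f"{turn['speaker']}: {turn['text']}" for turn in sample["context"]
        )
        predictions.append({"id": sample["id"],
                            "response": generate_response(context)})

    with open("submission.json", "w") as f:
        json.dump(predictions, f, ensure_ascii=False, indent=2)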

The purpose of the task is to benchmark the ability of generative models to act as AI teachers, replying to a student in a teacher-student dialogue. Submissions will be ranked according to several automated dialogue evaluation metrics, with the top submissions selected for further human evaluation. During this manual evaluation, human raters will compare pairs of teacher responses in terms of three abilities: whether the response speaks like a teacher, understands the student, and helps the student (Tack & Piech 2022). As such, we adopt an evaluation method akin to ACUTE-Eval for evaluating dialogue systems (Li et al. 2019).

PARTICIPATION

The shared task is hosted on CodaLab (Pavao et al. 2022). Anyone participating in the shared task will be asked to:

1. Register on the CodaLab platform.
2. Fill in the registration form with their CodaLab ID. Participants must comply with the terms and conditions of the task and of the TSCC data, as outlined in the form.
3. Register for the CodaLab competition using the same CodaLab ID. Only participants who have submitted the registration form will be accepted. Note that participants may join one team only.

IMPORTANT DATES

Fri Mar 24, 2023     Training data release
Mon May 1, 2023      Test data release
Fri May 5, 2023      Final submissions due
Mon May 8, 2023      Results announced
Fri May 12, 2023     Human evaluation results announced
Mon May 22, 2023     System papers due
Fri May 26, 2023     Paper reviews returned
Tue May 30, 2023     Camera-ready papers due
Mon Jun 12, 2023     Pre-recorded video due
Thu Jul 13, 2023     BEA Workshop at ACL

ORGANIZERS

Anaïs Tack, KU Leuven; Ekaterina Kochmar, MBZUAI; Zheng Yuan, King’s College London; Serge Bibauw, Universidad Central del Ecuador; Chris Piech, Stanford University

Webpage: https://sig-edu.org/sharedtask/2023