We’d like to invite you to take part in a machine learning / natural language processing competition with modest (~$500) prizes.
We’re evaluating language models not just on their accuracy at answering questions but also on how well they communicate their uncertainty, both quantitatively and qualitatively.
Models will be scored on accuracy, but that is not the primary metric. The primary metric is how much the models help users answer questions better. In other words, a model with 75% accuracy that is convincingly wrong on the remaining 25% of questions will fare worse than a model with 66% accuracy that can correctly identify the questions it can’t answer and say “I don’t know,” since the downstream humans will trust the latter model more.
The human–computer games will be filmed and posted to YouTube so you can see how players reacted to your models.
The system submission process is designed to be beginner-friendly and intuitive: there’s a prompt-based track, as well as a Hugging Face upload option for complete models. So if you’ve just wrapped up teaching an introductory NLP / AI course, we’d appreciate it if you passed this along!
Full information here: https://sites.google.com/view/qanta/2025-competition
Please contact qanta@googlegroups.com if you have any questions!
Best,
Jordan