TL;DR SHROOM-visions (https://helsinki-nlp.github.io/shroom/2026) is a shared task to advance model-agnostic evaluation of hallucination detection for Vision-and-Language Models (VLMs). Participate in detecting fine-grained hallucination spans across 4 languages (Chinese, English, French, Italian). Stay informed by joining our Google group (https://groups.google.com/g/shroom-visions/) or our Slack (https://join.slack.com/t/shroom-shared-task/shared_invite/zt-2mmn4i8h2-HvRBdK5f4550YHydj5lpnA)!
Full Invitation We are excited to announce the SHROOM-visions shared task on vision-language hallucination detection (website: https://helsinki-nlp.github.io/shroom/2026). We invite participants to detect and classify hallucination spans in a multilingual, multimodal context, using a dataset designed for enduring evaluation.
About As new foundational models emerge monthly, how do we create hallucination evaluations that remain relevant? Current benchmarks are often tied to the idiosyncrasies of specific LLMs/VLMs, risking quick obsolescence. This shared task builds upon the SHROOM series of hallucination detection tasks and datasets (https://helsinki-nlp.github.io/shroom/), venturing into vision-language multilingual hallucination-span prediction. With this shared task, we aim to advance detection methods that generalize across model generations and focus on the core phenomenon of hallucination.
We provide a dataset of 20,000 samples annotated with a fine-grained, span-level labeling scheme:
* A train set of ~15,200 samples from 5 different LVLMs.
* A closed test set of 4,800 crafted samples.
* A submission platform (https://shroom.pythonanywhere.com/) to evaluate the performance of your systems.
* Balanced coverage across 4 languages: Chinese, English, French, Italian.
* Each sample annotated by 3 annotators using a four-class taxonomy: Invention, Mischaracterization, OCR Problem, Miscounting.
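To make the span-level labeling scheme concrete, here is a hypothetical sketch of how a single annotated sample might be represented. The field names, character-offset convention, and example content are all assumptions for illustration; the actual data format will be defined by the released dataset.

```python
# Hypothetical illustration of a span-annotated sample (NOT the official
# release format). Spans are character offsets into the model output,
# each tagged with one class from the taxonomy.
sample = {
    "lang": "EN",
    "model_output": "The photo shows three cats sitting on a red bench.",
    "annotations": [
        # "three" would be wrong if the image shows two cats -> Miscounting
        {"start": 16, "end": 21, "label": "Miscounting"},
        # "red" would be wrong if the bench is blue -> Mischaracterization
        {"start": 40, "end": 43, "label": "Mischaracterization"},
    ],
}

# Recover the annotated span text from the offsets.
for ann in sample["annotations"]:
    span_text = sample["model_output"][ann["start"]:ann["end"]]
    print(f"{ann['label']}: {span_text!r}")
```

Character offsets (rather than token indices) keep the annotation scheme independent of any particular model's tokenizer, which fits the task's model-agnostic goal.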
Participants are invited to develop systems that accurately identify and classify hallucinated text spans in image-conditioned outputs. They will also be invited to submit system description papers, with the option to present them at the UncertaiNLP workshop (https://uncertainlp.github.io/), co-located with EMNLP 2026. All authors of paper submissions will be asked to review peers' submissions (max 2 papers per author).
Key Dates: All deadlines are “anywhere on Earth” (23:59 UTC-12).
* Train set available by: 10.05.2026
* Submission platform open by: 20.05.2026
* Evaluation phase ends: 31.07.2026
* System description papers due: 10.08.2026 (TBC)
* Notification of acceptance: 10.09.2026 (TBC)
* Camera-ready due: 20.09.2026 (TBC)
* UncertaiNLP workshop: end of October 2026 (co-located with EMNLP)
Evaluation Metrics: Participants’ models must produce spans corresponding to hallucinations in the text, classified along five possible categories (invention, mischaracterization, OCR problems, miscounting, and other hallucinations). The evaluation will rely on two metrics, assessing labelled and unlabelled performance separately. Rankings and submissions will be handled separately per language: you are welcome to focus on the languages of your choice!
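As a rough illustration of the labelled/unlabelled distinction, the sketch below scores predicted spans against gold spans at the character level: unlabelled scoring rewards any overlap with a hallucinated span, while labelled scoring additionally requires the predicted class to match. This is an assumed, simplified metric for intuition only; the official scorer will be defined by the organizers and may differ.

```python
# Hypothetical character-level span F1 (NOT the official metric).
def to_char_set(spans, with_label=False):
    """Expand (start, end, label) spans into a set of covered characters,
    optionally paired with the label for 'labelled' scoring."""
    chars = set()
    for start, end, label in spans:
        for i in range(start, end):
            chars.add((i, label) if with_label else i)
    return chars

def span_f1(gold, pred, with_label=False):
    """Character-level F1 between gold and predicted spans."""
    g = to_char_set(gold, with_label)
    p = to_char_set(pred, with_label)
    if not g and not p:
        return 1.0  # both empty: nothing hallucinated, nothing predicted
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [(16, 21, "Miscounting")]
pred = [(16, 21, "Mischaracterization")]
print(span_f1(gold, pred))                    # unlabelled: spans overlap -> 1.0
print(span_f1(gold, pred, with_label=True))   # labelled: wrong class -> 0.0
```

The example shows why the two metrics are reported separately: a system can localize hallucinations perfectly (unlabelled score) while still confusing the error categories (labelled score).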
How to Participate:
* Register: please register your team on https://shroom.pythonanywhere.com before making a submission.
* Submit results: use our platform to submit your results before 31.07.2026.
* Submit your system description: system description papers should be submitted by 10.08.2026 (TBC; further details will be announced at a later date).
Want to be kept in the loop? Join our Google group mailing list (https://groups.google.com/g/shroom-visions/) or the shared task Slack (https://join.slack.com/t/shroom-shared-task/shared_invite/zt-2mmn4i8h2-HvRBdK5f4550YHydj5lpnA)! We are also open to hosting Q&A sessions for groups interested in participating; just send us an email. We look forward to your participation and to the exciting research that will emerge from this task.
Best regards, Raúl Vázquez and Timothee Mickus On behalf of ALL the SHROOM-Visions organizers