Shared Task Website: https://brandonio-c.github.io/ClinIQLink-2025/
Dear Colleague,
We are pleased to invite you to participate in ClinIQLink 2025, an evaluation task organized as part of the BioNLP Workshop at ACL 2025. This initiative focuses on assessing the ability of generative models to produce factually accurate medical information, particularly in the context of knowledge retrieval and hallucination detection.
About the Task
The ClinIQLink challenge evaluates models using a novel dataset of atomic, fact-based question-answer pairs aligned with the knowledge level of a General Practitioner (GP). Submissions will be assessed on:
* Knowledge Retrieval: How accurately models retrieve medical information about core concepts like procedures, conditions, drugs, and diagnostics.
* Hallucination Analysis (Post-hoc): Understanding hallucination origins in model responses, categorized into intrinsic (internal model issues), extrinsic (external information gaps), or hybrid causes.
Models will be scored based on precision, with penalties for incorrect or unsupported answers. Although hallucination analysis won’t affect the leaderboard, findings will highlight areas for improvement.
Participation Requirements
To take part in this shared task, participants must:
* Submit their models to CodaBench for evaluation.
* Provide a short paper describing the methodology, including any novel approaches or improvements made.
The dataset, created in collaboration with medical experts, will not be publicly released to ensure the evaluation's integrity.
Evaluation Details
Submissions will be evaluated using a semi-automated process with metrics for both closed-ended and open-ended questions:
* Closed-ended Questions: True/False, multiple-choice, and list questions, scored using precision, recall, and F1.
* Open-ended Questions: Evaluated on exact matches or partial semantic similarity, using the semantic similarity scores described on the shared task website. Where necessary, experts will analyze responses, with semantic similarity scores, BLEU, ROUGE, METEOR, and other metrics assisting their judgements.
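To illustrate how precision, recall, and F1 apply to a list-type question, here is a minimal sketch in Python. It is not the official scoring code: the function name `list_prf`, the case-insensitive matching, and the sample answers are assumptions made for this example. The authoritative definitions are on the shared task website.

```python
def list_prf(predicted, gold):
    """Set-based precision, recall, and F1 for one list-type question.

    Hypothetical helper for illustration; not the official task scorer.
    """
    # Assumption: matching ignores case and surrounding whitespace.
    pred = {item.strip().lower() for item in predicted if item.strip()}
    ref = {item.strip().lower() for item in gold if item.strip()}

    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: the model lists three items, two of which are correct.
precision, recall, f1 = list_prf(
    ["Metformin", "insulin", "aspirin"],
    ["metformin", "insulin"],
)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this example the extra, unsupported item lowers precision while recall stays at its maximum. This shows how a precision-oriented score can penalize unsupported answers.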
Important Dates
* First Call for Participation: January 21, 2025
* Dataset and testing framework release on Codabench: February 20, 2025
* System Submission Deadline: April 15, 2025
* Results Feedback: April 25, 2025
* Preliminary Paper Submission: May 5, 2025
* Final Paper Submission: May 15, 2025
* BioNLP Workshop at ACL 2025: July 31, 2025
For a full timeline and additional details, visit our official website: https://brandonio-c.github.io/ClinIQLink-2025/
Why Participate?
This task offers a unique opportunity to benchmark your models against state-of-the-art systems, advance the field of medical QA, and contribute to a deeper understanding of hallucination detection in generative AI.
If you have any questions, please do not hesitate to contact Brandon Colelough at brandon.colelough@nih.gov.
We look forward to your participation in this exciting initiative.
Kind regards,
Brandon Colelough (He / Him)
NIH Fellow | Fulbright Scholar | ADF Signals Officer | Electrical Engineer
National Institutes of Health – National Library of Medicine (LHC)
M: +61 481 269 667 (AUS) | M: +1 (202) 367-7230 (US)
E: brandcol@umd.edu | E: brandon.colelough@gmail.com
L: www.linkedin.com/in/brandon-colelough