CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding
Xi Zhang, Zaiqiao Meng, Jake Lever, Edmond S. L. Ho
2025-10-08
Summary
This paper focuses on improving the accuracy of AI models used to read medical images, specifically X-rays. These models, called Multimodal Large Language Models, are good at combining what they 'see' in the image with medical knowledge, but they sometimes make up details that aren't actually there – a problem called 'medical hallucination'.
What's the problem?
The main issue is that these AI models, even though they're advanced, frequently generate descriptions of X-rays that aren't supported by the actual image. This happens because the models are overly sensitive to how the questions, or 'prompts', are worded: they lean too heavily on the clinical context supplied in the prompt rather than on what is actually visible in the X-ray itself. Incorrect information in a medical context can be dangerous, so this is a serious concern.
What's the solution?
The researchers developed a new technique called Clinical Contrastive Decoding, or CCD. It's a clever way to improve the AI's accuracy *without* needing to retrain the entire model. CCD draws on structured signals from task-specific radiology expert models – tools that already perform well on focused diagnostic tasks – to guide the AI's responses as they are generated. In effect, it contrasts what the model would say with and without this expert clinical evidence and nudges each word choice toward the clinically supported option. This happens in two stages, refining the AI's choices token by token.
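To make the "nudging word by word" idea concrete, here is a minimal, hypothetical sketch of a single contrastive decoding step in PyTorch. It is not the paper's actual CCD formulation (the dual-stage mechanism and its hyperparameters are specific to the paper); it only illustrates the general pattern of comparing next-token logits from an expert-informed prompt against a plain prompt and boosting tokens the expert evidence favours. The function name `contrastive_step` and the strength parameter `alpha` are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def contrastive_step(logits_with_expert: torch.Tensor,
                     logits_plain: torch.Tensor,
                     alpha: float = 1.0) -> torch.Tensor:
    """One token-level contrastive adjustment (illustrative only).

    logits_with_expert: next-token logits when the prompt also contains
        structured findings from a radiology expert model.
    logits_plain: next-token logits for the plain prompt.
    alpha: contrast strength (hypothetical knob, not from the paper).
    """
    # Boost tokens that the expert-informed pass prefers over the plain pass,
    # then renormalise into a log-probability distribution.
    contrast = logits_with_expert - logits_plain
    return F.log_softmax(logits_with_expert + alpha * contrast, dim=-1)

# Toy usage: random logits stand in for a real MLLM's next-token outputs.
vocab_size = 32000
plain = torch.randn(vocab_size)
expert = torch.randn(vocab_size)
next_token_id = contrastive_step(expert, plain).argmax().item()
print(next_token_id)
```

In practice such an adjustment would be applied at every generation step, which matches the summary's description of refining the model's output token by token without changing its weights.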
Why does it matter?
This research is important because it offers a practical and efficient way to make AI-powered radiology tools more reliable. By reducing medical hallucinations, it helps ensure that doctors can trust the AI's interpretations of X-rays, leading to better diagnoses and patient care. The method is also flexible and can be applied to different AI models and datasets, making it a broadly useful solution for improving medical AI.
Abstract
Multimodal large language models (MLLMs) have recently achieved remarkable progress in radiology by integrating visual perception with natural language understanding. However, they often generate clinically unsupported descriptions, known as medical hallucinations, which pose serious risks in medical applications that demand accuracy and image-grounded outputs. Through empirical analysis, we find that prompt-induced hallucinations remain prevalent in radiology MLLMs, largely due to over-sensitivity to clinical sections. To address this, we introduce Clinical Contrastive Decoding (CCD), a training-free and retrieval-free inference framework that integrates structured clinical signals from task-specific radiology expert models. CCD introduces a dual-stage contrastive mechanism to refine token-level logits during generation, thereby enhancing clinical fidelity without modifying the base MLLM. Experiments on three datasets and multiple models demonstrate that CCD consistently improves overall performance on radiology report generation (RRG). On the MIMIC-CXR dataset, it yields up to a 17% improvement in RadGraph-F1 when applied to state-of-the-art RRG models. Our approach provides a lightweight and generalisable solution for mitigating medical hallucinations, effectively bridging expert models and MLLMs in radiology.