
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning

Jiazhen Pan, Che Liu, Junde Wu, Fenglin Liu, Jiayuan Zhu, Hongwei Bran Li, Chen Chen, Cheng Ouyang, Daniel Rueckert

2025-02-28

Summary

This paper introduces MedVLM-R1, an AI system that analyzes medical images such as MRIs, CT scans, and X-rays while explaining its thinking process in plain language, making it more trustworthy for doctors and regulators.

What's the problem?

Most current AI systems for medical image analysis produce final answers without revealing how they reached those conclusions. This lack of transparency makes it hard for doctors to trust the AI and for regulators to approve its use in hospitals.

What's the solution?

The researchers created MedVLM-R1, an AI that not only analyzes medical images but also explains its reasoning in plain language. Instead of traditional supervised fine-tuning, which tends to overfit to its training data, they used reinforcement learning, which rewards the model for discovering human-interpretable reasoning on its own, without any example explanations to imitate. Despite training on only 600 visual question answering samples with a 2-billion-parameter model, this approach raised accuracy from 55.11% to 78.22% on MRI, CT, and X-ray benchmarks, outperforming larger models trained on over a million samples.
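To make the training idea concrete: because the model is rewarded for its outputs rather than shown example explanations, the reward can be computed by simple rules. The sketch below is a minimal, hypothetical illustration of such a rule-based reward, assuming the model is asked to wrap its reasoning and final answer in `<think>` and `<answer>` tags (the tag names and scoring weights here are illustrative assumptions, not details confirmed by the paper).

```python
import re

def format_reward(completion: str) -> float:
    """Reward 1.0 if the output follows the assumed
    <think>...</think><answer>...</answer> template, else 0.0."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, completion.strip(), re.DOTALL) else 0.0

def accuracy_reward(completion: str, correct_choice: str) -> float:
    """Reward 1.0 if the text inside <answer> matches the
    ground-truth multiple-choice option, else 0.0."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip().upper() == correct_choice.upper():
        return 1.0
    return 0.0

def total_reward(completion: str, correct_choice: str) -> float:
    """Combine both signals; the RL algorithm then reinforces
    completions that score higher than their peers."""
    return format_reward(completion) + accuracy_reward(completion, correct_choice)
```

Note that neither function ever checks *what* the reasoning says, only that reasoning is present and the final answer is correct; this is how the model can be incentivized to produce reasoning without any reference explanations in the training data.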

Why it matters?

This matters because it could help doctors trust and use AI more in diagnosing patients. By explaining its reasoning, the AI can help doctors make better decisions and potentially catch things they might miss. It also makes it easier for regulators to approve the use of AI in hospitals, which could speed up diagnoses and improve patient care. The fact that it works well even with limited training data means it could be more easily adapted for different medical specialties or hospitals.

Abstract

Reasoning is a critical frontier for advancing medical image analysis, where transparency and trustworthiness play a central role in both clinician trust and regulatory approval. Although Medical Visual Language Models (VLMs) show promise for radiological tasks, most existing VLMs merely produce final answers without revealing the underlying reasoning. To address this gap, we introduce MedVLM-R1, a medical VLM that explicitly generates natural language reasoning to enhance transparency and trustworthiness. Instead of relying on supervised fine-tuning (SFT), which often suffers from overfitting to training distributions and fails to foster genuine reasoning, MedVLM-R1 employs a reinforcement learning framework that incentivizes the model to discover human-interpretable reasoning paths without using any reasoning references. Despite limited training data (600 visual question answering samples) and model parameters (2B), MedVLM-R1 boosts accuracy from 55.11% to 78.22% across MRI, CT, and X-ray benchmarks, outperforming larger models trained on over a million samples. It also demonstrates robust domain generalization under out-of-distribution tasks. By unifying medical image analysis with explicit reasoning, MedVLM-R1 marks a pivotal step toward trustworthy and interpretable AI in clinical practice.