MediX-R1: Open Ended Medical Reinforcement Learning

Sahal Shaji Mullappilly, Mohammed Irfan Kurpath, Omair Mohamed, Mohamed Zidan, Fahad Khan, Salman Khan, Rao Anwer, Hisham Cholakkal

2026-02-27

Summary

This paper introduces MediX-R1, a system designed to make medical AI models that understand both text and images better at giving detailed, accurate answers to medical questions, going beyond simple multiple-choice tests.

What's the problem?

Current medical AI models often struggle with open-ended questions requiring complex reasoning. They're typically trained to pick from pre-defined answers, limiting their ability to provide thorough explanations or handle nuanced situations. Existing methods for improving these models, like simple reward systems, aren't effective for complex, free-form responses because it's hard to automatically check if the answer is truly correct and makes sense medically.

What's the solution?

The researchers developed MediX-R1, which uses Reinforcement Learning to fine-tune an existing AI model, meaning the model learns by receiving feedback on its answers. Crucially, this feedback isn't just a simple 'right' or 'wrong'. Instead, it's a combination of rewards: one uses another AI model to judge whether the answer is semantically correct, another measures similarity with medical embeddings to give credit for paraphrases and terminology variants, and others ensure the answer is well-formatted and draws on both text and image information when appropriate. They also created a new way to evaluate the model's performance, using an AI judge to assess the quality of the answers instead of comparing them against exact text matches.
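The multi-signal feedback described above can be sketched as a weighted sum of reward components. This is a toy illustration, not the authors' implementation: the component weights, the tag names, and the stand-in scoring functions (simple string containment instead of an LLM judge, word overlap instead of medical embeddings) are all assumptions.

```python
# Hypothetical sketch of a composite reward in the spirit of MediX-R1.
# All helper functions and weights below are illustrative assumptions;
# the modality reward from the paper is omitted for brevity.

def accuracy_reward(answer: str, reference: str) -> float:
    """Stand-in for the LLM judge's strict YES/NO correctness verdict."""
    # A real system would query a judge model; we fake it with containment.
    return 1.0 if reference.lower() in answer.lower() else 0.0

def semantic_reward(answer: str, reference: str) -> float:
    """Stand-in for a medical embedding similarity score in [0, 1]."""
    a, r = set(answer.lower().split()), set(reference.lower().split())
    return len(a & r) / max(len(a | r), 1)  # Jaccard overlap as a toy proxy

def format_reward(answer: str) -> float:
    """Reward answers that expose their reasoning in the expected tags."""
    return 1.0 if "<think>" in answer and "<answer>" in answer else 0.0

def composite_reward(answer: str, reference: str,
                     weights=(0.6, 0.25, 0.15)) -> float:
    """Combine the signals into one scalar for the RL update."""
    w_acc, w_sem, w_fmt = weights  # assumed weighting, not from the paper
    return (w_acc * accuracy_reward(answer, reference)
            + w_sem * semantic_reward(answer, reference)
            + w_fmt * format_reward(answer))
```

Because each component contributes a partial score, a free-form answer that is correct but phrased differently from the reference still receives informative (non-zero) feedback, which is the point of moving beyond exact-match rewards.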

Why it matters?

This work is important because it shows a promising path towards building more reliable and helpful medical AI assistants. By allowing models to generate detailed, open-ended answers and using sophisticated evaluation methods, we can move closer to AI that can truly assist doctors and patients with complex medical reasoning and decision-making.

Abstract

We introduce MediX-R1, an open-ended Reinforcement Learning (RL) framework for medical multimodal large language models (MLLMs) that enables clinically grounded, free-form answers beyond multiple-choice formats. MediX-R1 fine-tunes a baseline vision-language backbone with Group Based RL and a composite reward tailored for medical reasoning: an LLM-based accuracy reward that judges semantic correctness with a strict YES/NO decision, a medical embedding-based semantic reward to capture paraphrases and terminology variants, and lightweight format and modality rewards that enforce interpretable reasoning and modality recognition. This multi-signal design provides stable, informative feedback for open-ended outputs where traditional verifiable or MCQ-only rewards fall short. To measure progress, we propose a unified evaluation framework for both text-only and image+text tasks that uses a Reference-based LLM-as-judge in place of brittle string-overlap metrics, capturing semantic correctness, reasoning, and contextual alignment. Despite using only ~51K instruction examples, MediX-R1 achieves excellent results across standard medical LLM (text-only) and VLM (image + text) benchmarks, outperforming strong open-source baselines and delivering particularly large gains on open-ended clinical tasks. Our results demonstrate that open-ended RL with comprehensive reward signals and LLM-based evaluation is a practical path toward reliable medical reasoning in multimodal models. Our trained models, curated datasets and source code are available at https://medix.cvmbzuai.com
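The Reference-based LLM-as-judge evaluation mentioned in the abstract can be sketched as prompting a judge model with the question, the reference answer, and the candidate answer, then parsing a strict YES/NO verdict. The prompt wording and the `call_llm` callable below are assumptions for illustration, not the authors' protocol.

```python
# Hypothetical sketch of a reference-based LLM-as-judge check.
# JUDGE_TEMPLATE and call_llm are illustrative assumptions.

JUDGE_TEMPLATE = """You are a medical expert grader.
Question: {question}
Reference answer: {reference}
Model answer: {candidate}
Does the model answer convey the same clinical meaning as the reference?
Reply with exactly YES or NO."""

def judge(question: str, reference: str, candidate: str, call_llm) -> bool:
    """Return True iff the judge model's verdict starts with YES.

    `call_llm` is any callable mapping a prompt string to the judge
    model's text response (e.g. a wrapper around an inference API).
    """
    prompt = JUDGE_TEMPLATE.format(question=question,
                                   reference=reference,
                                   candidate=candidate)
    verdict = call_llm(prompt).strip().upper()
    return verdict.startswith("YES")
```

Scoring semantic equivalence this way, rather than by string overlap, is what lets the benchmark credit a free-form answer that is clinically correct but worded differently from the reference.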