GEMeX-ThinkVG: Towards Thinking with Visual Grounding in Medical VQA via Reinforcement Learning

Bo Liu, Xiangyu Zhao, Along He, Yidi Chen, Huazhu Fu, Xiao-Ming Wu

2025-06-24

Summary

This paper introduces GEMeX-ThinkVG, a method that improves medical visual question answering (VQA) models so that they answer questions about medical images more reliably and explain their answers clearly.

What's the problem?

Existing medical VQA models often struggle to give reliable, understandable answers to complex questions about medical images, which limits their usefulness to doctors and other healthcare professionals.

What's the solution?

The researchers created a new dataset for medical VQA and developed a verifiable reward mechanism that checks the quality of the model's answers during reinforcement learning. Training against this reward helps the model learn to give more explainable and accurate responses.
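
To make the idea of a verifiable reward concrete, here is a minimal Python sketch. It is not the authors' reward: this summary does not say what the reward actually checks. The sketch assumes, for illustration only, that each training example has a reference answer and a reference bounding box, and that the reward combines an exact-match answer check with the overlap (IoU) between the predicted and reference boxes. The names `iou`, `verifiable_reward`, and `answer_weight` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def verifiable_reward(pred_answer, pred_box, gold_answer, gold_box,
                      answer_weight=0.5):
    """Hypothetical reward: weighted sum of answer match and box overlap.

    Both terms can be checked automatically against reference data,
    which is what makes the reward "verifiable" in this sketch.
    """
    # Assumption: answers are compared by case-insensitive exact match.
    same = pred_answer.strip().lower() == gold_answer.strip().lower()
    answer_score = 1.0 if same else 0.0
    # Assumption: grounding quality is measured by IoU with the reference box.
    grounding_score = iou(pred_box, gold_box)
    return answer_weight * answer_score + (1.0 - answer_weight) * grounding_score
```

Under these assumptions, a response with the correct answer and a perfectly matching box scores 1.0, while a wrong answer with a non-overlapping box scores 0.0. A reinforcement learning loop would use such a score as the training signal for each sampled response.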

Why does it matter?

This matters because it makes AI tools more trustworthy and effective in assisting medical experts by providing clear explanations along with answers, helping improve diagnosis and patient care.

Abstract

A novel dataset and verifiable reward mechanism enhance the explainability and efficiency of medical visual question answering models.