MedGemma Technical Report
Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, Justin Chen, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Stefanie Anna Baby, Susanna Maria Baby, Jeremy Lai, Samuel Schmidgall, Lu Yang, Kejia Chen, Per Bjornsson
2025-07-09
Summary
This paper introduces MedGemma, a collection of medical vision-language foundation models that combine understanding of medical images and text to support tasks such as interpreting scans and generating medical reports. The models are built on the Gemma 3 architecture and show strong performance across a wide range of medical benchmarks.
What's the problem?
Medical data is complex: it combines images, such as X-rays and pathology slides, with detailed text reports. Many AI models struggle to integrate these different types of information effectively while maintaining accuracy and reasoning ability.
What's the solution?
The researchers developed MedGemma in multiple versions specialized for 2D images, 3D scans, and genetic data. They trained these models on large datasets containing millions of medical image-text pairs, then fine-tuned them for specific tasks such as visual question answering and report generation. They also introduced MedSigLIP, a vision encoder optimized for high-resolution medical images.
Why does it matter?
MedGemma can assist doctors and medical researchers by accurately interpreting complex medical data and supporting diagnostic workflows. It moves AI closer to practical, reliable use in healthcare, with the potential to improve patient care and accelerate medical research.
Abstract
MedGemma, a collection of medical vision-language foundation models, demonstrates advanced medical understanding and reasoning, outperforming similarly sized generative models and approaching the performance of task-specific models.