MedSAM3: Delving into Segment Anything with Medical Concepts

Anglin Liu, Rundong Xue, Xu R. Cao, Yifan Shen, Yi Lu, Xiang Li, Qianqian Chen, Jintai Chen

2025-11-26

Summary

This paper introduces MedSAM-3, a new computer program that automatically identifies and outlines structures in medical images such as X-rays, MRIs, and CT scans, guided by simple text instructions.

What's the problem?

Currently, accurately identifying structures in medical images requires either specialized programs built for a single type of image or hours of manual labeling by doctors and other experts. Existing methods also generalize poorly to new types of medical images or tasks, and creating those labels is slow and expensive.

What's the solution?

The researchers took a powerful image-analysis model called SAM (the Segment Anything Model) and adapted it by fine-tuning it on a large collection of medical images paired with descriptions of what they contain. This lets MedSAM-3 understand text prompts like 'show me the lungs' and accurately highlight the lungs in an image. They also built the MedSAM-3 Agent, a system in which a language model 'thinks' through complex cases and iteratively refines the segmentation, making the results even more accurate.
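
To make the idea of text-promptable segmentation concrete, here is a minimal sketch of how such a model might be called. This is illustrative only: the class and method names (MedSAM3Stub, segment) are assumptions, not the API of the released MedSAM3 code, and the stub returns an empty mask in place of a real prediction.

```python
# Hypothetical sketch of calling a text-promptable medical segmenter.
# Names and signatures are illustrative, not the released MedSAM3 API.

from dataclasses import dataclass
import numpy as np


@dataclass
class SegmentationResult:
    mask: np.ndarray   # boolean mask with the same H x W as the input image
    score: float       # model confidence that the concept was found


class MedSAM3Stub:
    """Stand-in for the real model; always returns an empty mask."""

    def segment(self, image: np.ndarray, prompt: str) -> SegmentationResult:
        # A real model would encode `prompt` with a text encoder, encode
        # `image` with a vision backbone, and decode a mask for the
        # concept named in the prompt.
        return SegmentationResult(
            mask=np.zeros(image.shape[:2], dtype=bool), score=0.0
        )


model = MedSAM3Stub()
ct_slice = np.zeros((512, 512), dtype=np.float32)  # placeholder CT slice
result = model.segment(ct_slice, prompt="left lung")
print(result.mask.shape, result.score)
```

The key difference from the original SAM is the prompt type: instead of clicks or boxes (geometric prompts), the model is steered by an open-vocabulary concept name.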

Why it matters?

This research is important because it could significantly speed up medical diagnoses and research. By automating the process of identifying structures in medical images, doctors can focus on interpreting the results rather than spending hours manually outlining everything, and researchers can analyze more data more quickly, potentially leading to new discoveries.

Abstract

Medical image segmentation is fundamental for biomedical discovery. Existing methods lack generalizability and demand extensive, time-consuming manual annotation for new clinical applications. Here, we propose MedSAM-3, a text-promptable model for medical image and video segmentation. By fine-tuning the Segment Anything Model (SAM) 3 architecture on medical images paired with semantic concept labels, our MedSAM-3 enables medical Promptable Concept Segmentation (PCS), allowing precise targeting of anatomical structures via open-vocabulary text descriptions rather than solely geometric prompts. We further introduce the MedSAM-3 Agent, a framework that integrates Multimodal Large Language Models (MLLMs) to perform complex reasoning and iterative refinement in an agent-in-the-loop workflow. Comprehensive experiments across diverse medical imaging modalities, including X-ray, MRI, ultrasound, CT, and video, demonstrate that our approach significantly outperforms existing specialist and foundation models. We will release our code and model at https://github.com/Joey-S-Liu/MedSAM3.
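
The agent-in-the-loop workflow mentioned in the abstract can be pictured as a critique-and-refine cycle. The sketch below is one plausible reading under stated assumptions: the segmenter and the MLLM critic are stubs, and the loop structure, function names, and stopping rule are assumptions rather than the paper's actual implementation.

```python
# Illustrative agent-in-the-loop refinement cycle: an MLLM critic inspects
# each mask and either accepts it or rewrites the text prompt. All
# components here are stubs; this is not the paper's implementation.

import numpy as np


def segment(image: np.ndarray, prompt: str) -> np.ndarray:
    """Stub segmenter: a real model would return a mask for `prompt`."""
    return np.zeros(image.shape[:2], dtype=bool)


def mllm_critique(
    image: np.ndarray, mask: np.ndarray, prompt: str
) -> tuple[bool, str]:
    """Stub MLLM critic: a real MLLM would inspect the mask overlay and
    either accept it or propose a refined prompt, e.g.
    'left lung, excluding the heart border'."""
    return True, prompt


def agent_loop(image: np.ndarray, prompt: str, max_rounds: int = 3) -> np.ndarray:
    mask = segment(image, prompt)
    for _ in range(max_rounds):
        accepted, prompt = mllm_critique(image, mask, prompt)
        if accepted:
            break
        mask = segment(image, prompt)  # re-segment with the refined prompt
    return mask


scan = np.zeros((256, 256), dtype=np.float32)  # placeholder image
final_mask = agent_loop(scan, "tumor in the right kidney")
print(final_mask.sum())
```

The design point is that the MLLM supplies the reasoning (is the mask plausible for this concept?) while the segmentation model supplies the pixels, and iterating the two can correct errors neither would fix alone.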