CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation

Xinlei Yu, Changmiao Wang, Hui Jin, Ahmed Elazab, Gangyong Jia, Xiang Wan, Changqing Zou, Ruiquan Ge

2025-07-04

Summary

This paper introduces CRISP-SAM2, a new AI model designed to improve medical imaging by more accurately identifying and segmenting multiple organs in 3D images. It combines visual information with textual descriptions of the organs to improve accuracy and reduce the need for traditional geometric prompts (such as points or bounding boxes).

What's the problem?

Current medical image segmentation models often produce blurry or inaccurate organ boundaries, depend heavily on geometric prompts that can be hard to provide in clinical practice, and lose important spatial details when working with 3D volumes.

What's the solution?

The researchers built CRISP-SAM2 around a cross-modal interaction mechanism that fuses visual features with text features, creating a richer semantic understanding of each organ. This lets the model use semantic prompts derived from textual descriptions instead of geometric prompts. They also improved the memory and mask-refinement strategies to better preserve 3D spatial information and capture fine organ boundaries.
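To make the idea of cross-modal interaction concrete, here is a minimal sketch of how visual tokens can attend to text-derived semantic tokens via cross-attention. This is a generic illustration, not the paper's actual architecture: the function names, shapes, and the residual fusion step are all assumptions for the sake of the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(image_feats, text_feats):
    """Fuse visual tokens with text-derived semantic tokens.

    image_feats: (N, d) array of visual tokens from the image encoder.
    text_feats:  (M, d) array of tokens encoding an organ description.

    Each visual token attends to the textual description, so the fused
    features carry organ-level semantics without any geometric prompt.
    (Hypothetical sketch; CRISP-SAM2's real interaction module differs.)
    """
    d = image_feats.shape[-1]
    scores = image_feats @ text_feats.T / np.sqrt(d)   # (N, M) similarities
    attn = softmax(scores, axis=-1)                    # attention weights
    fused = image_feats + attn @ text_feats            # residual fusion
    return fused

# Toy usage: 4 visual tokens, 2 text tokens, 8-dim features.
img = np.ones((4, 8))
txt = np.ones((2, 8))
fused = cross_modal_attention(img, txt)
```

In a real model, `image_feats` would come from SAM2's image encoder and `text_feats` from a language or vision-language encoder; the attention output then conditions the mask decoder in place of point or box prompts.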

Why it matters?

More precise and flexible organ segmentation in medical images helps doctors diagnose and treat diseases more effectively. Replacing hand-drawn geometric prompts with text descriptions also reduces the effort needed to prepare data for AI models, making medical imaging analysis faster and more practical.

Abstract

CRISP-SAM2, a novel model using cross-modal interaction and semantic prompting, enhances multi-organ medical segmentation by improving detail accuracy and reducing dependence on geometric prompts.