ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking
Haofeng Liu, Mingqi Gao, Xuxiao Luo, Ziyue Wang, Guanyi Qin, Junde Wu, Yueming Jin
2025-05-16
Summary
This paper introduces ReSurgSAM2, a system that lets a computer find, segment, and keep track of a specific target (such as a tissue or instrument) in surgical video based on a text description, making it easier to follow what's happening during surgery.
What's the problem?
In surgical videos it is hard for AI to accurately identify and track important structures or tools over long periods: things move, get blocked from view, or look alike, so existing methods tend to lose the target over time, which makes them hard for doctors to rely on.
What's the solution?
The researchers built a two-stage pipeline: first, a cross-modal detection stage matches the text description against each video frame to locate the referred target; then a tracking stage built on Segment Anything Model 2 uses a credible, diversity-driven memory to remember reliable past views of the target and follow it through the rest of the surgery. This combination makes long-term tracking much more reliable.
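The detect-then-track flow can be sketched at a high level. This is only an illustrative outline, not the paper's implementation: `detector`, `tracker`, and the confidence threshold are hypothetical stand-ins for the cross-modal detection stage and the SAM2-based tracking stage.

```python
# Two-stage "detect, then track" loop (illustrative sketch only).
# Stage 1 scans frames until the text-referred target is found with
# enough confidence; stage 2 then tracks it using stored memory frames.

def run_pipeline(frames, text, detector, tracker, detect_thresh=0.7):
    memory = []          # confident (frame, mask) pairs kept for tracking
    masks = []
    target_found = False
    for frame in frames:
        if not target_found:
            # Stage 1: text-guided detection on the current frame.
            mask, conf = detector(frame, text)
            if conf >= detect_thresh:
                target_found = True
                memory.append((frame, mask))
        else:
            # Stage 2: propagate the target using the memory so far.
            mask = tracker(frame, memory)
            memory.append((frame, mask))
        masks.append(mask if target_found else None)
    return masks
```

Frames before the first confident detection yield `None`; once the target is found, every later frame is handled by the tracker.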
Why it matters?
This matters because it can help doctors and medical teams get a clearer, more accurate view of surgeries, making procedures safer and possibly leading to better training, planning, and patient outcomes.
Abstract
ReSurgSAM2 is a two-stage framework for referring segmentation and long-term tracking in surgical video, built on Segment Anything Model 2 with a cross-modal spatial-temporal Mamba for text-guided target detection and a diversity-driven memory mechanism for credible tracking.
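The idea behind a diversity-driven memory can be sketched as a small bank that only stores frames that are both confident and dissimilar to what it already holds. This is a minimal sketch under stated assumptions: the class name, thresholds, and eviction rule here are illustrative, not the paper's actual SAM2 memory design.

```python
# Illustrative sketch of a credible, diversity-driven memory bank.
# All names and thresholds are assumptions for demonstration purposes.
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)

class DiverseMemoryBank:
    """Keep a small set of confident, mutually dissimilar frame embeddings."""

    def __init__(self, capacity=5, conf_thresh=0.8, sim_thresh=0.9):
        self.capacity = capacity
        self.conf_thresh = conf_thresh   # credibility: only trust confident frames
        self.sim_thresh = sim_thresh     # diversity: reject near-duplicate views
        self.entries = []                # list of (embedding, confidence) pairs

    def maybe_add(self, emb, conf):
        if conf < self.conf_thresh:
            return False                 # low-confidence frame: skip
        if any(cosine_sim(emb, e) > self.sim_thresh for e, _ in self.entries):
            return False                 # too similar to a stored view: skip
        if len(self.entries) >= self.capacity:
            # evict the least confident stored entry to make room
            self.entries.sort(key=lambda t: t[1])
            self.entries.pop(0)
        self.entries.append((emb, conf))
        return True
```

Gating admission on both confidence and dissimilarity keeps the memory small while still covering different appearances of the target, which is what makes long-term tracking robust to occlusion and viewpoint change.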