Unifying Segment Anything in Microscopy with Multimodal Large Language Model
Manyu Li, Ruian He, Zixian Zhang, Weimin Tan, Bo Yan
2025-05-19
Summary
This paper is about improving a computer tool called SAM (Segment Anything Model), which is used to pick out specific parts of microscope images, by combining it with a powerful model that understands both images and language.
What's the problem?
The problem is that SAM, while good at finding and separating objects in regular images, doesn't always work well with the complicated and varied images that come from microscopes, especially when the images are very different from what it has seen before.
What's the solution?
To solve this, the researchers connected SAM to a multimodal large language model (MLLM), so the tool can now use both visual and language information. This helps SAM better understand what to look for in many kinds of microscope images, even ones it has never seen before.
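To make the high-level idea concrete, below is a minimal PyTorch sketch of one way vision-language fusion like this could work: image features (as a SAM-style encoder would produce) attend to language embeddings (as an MLLM could supply) before a mask is decoded. The module names, the cross-attention design, and all tensor sizes are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class VisionLanguageFusion(nn.Module):
    """Toy sketch: fuse SAM-style image features with MLLM-style text
    embeddings via cross-attention, then decode a mask. Everything here
    is an assumption for illustration, not the paper's real design."""

    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        # Image tokens (queries) attend to language tokens (keys/values).
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # A 1x1 conv head stands in for a real mask decoder.
        self.mask_head = nn.Conv2d(dim, 1, kernel_size=1)

    def forward(self, image_feats, text_embed):
        # image_feats: (B, dim, H, W) -- e.g. from a frozen image encoder
        # text_embed:  (B, L, dim)    -- e.g. projected MLLM token embeddings
        b, c, h, w = image_feats.shape
        tokens = image_feats.flatten(2).transpose(1, 2)   # (B, H*W, dim)
        fused, _ = self.cross_attn(tokens, text_embed, text_embed)
        tokens = self.norm(tokens + fused)                # residual fusion
        feats = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.mask_head(feats)                      # (B, 1, H, W) mask logits

if __name__ == "__main__":
    model = VisionLanguageFusion()
    img = torch.randn(2, 256, 64, 64)   # stand-in image features
    txt = torch.randn(2, 16, 256)       # stand-in language embeddings
    print(model(img, txt).shape)        # torch.Size([2, 1, 64, 64])
```

The point of the sketch is only the information flow: language knowledge conditions the visual features before segmentation, which is what lets the system generalize to microscope images it has not seen.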
Why it matters?
This matters because accurately identifying parts of microscope images can help scientists make discoveries in biology and medicine, and the improved tool works well on many different types of images, making research faster and more reliable.
Abstract
Injecting vision-language knowledge using multimodal large language models (MLLMs) enhances SAM, improving its performance and generalization across in-domain and out-of-domain microscopy datasets.