Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline

Junlong Cheng, Bin Fu, Jin Ye, Guoan Wang, Tianbin Li, Haoyu Wang, Ruoyu Li, He Yao, Junren Chen, JingWen Li, Yanzhou Su, Min Zhu, Junjun He

2024-11-26

Summary

This paper introduces IMed-361M, a benchmark dataset for Interactive Medical Image Segmentation (IMIS), along with a baseline model. Together, they aim to improve the accuracy and efficiency of medical image segmentation by providing a large, diverse set of densely annotated images.

What's the problem?

Medical image segmentation is crucial for diagnosing and treating diseases, but existing datasets are often too small or lack diversity, making it hard for AI models to learn effectively. This limits the ability of models to generalize and perform well across different medical imaging tasks.

What's the solution?

The authors created the IMed-361M benchmark dataset, which includes over 6.4 million medical images and 361 million masks (annotations showing where different structures are in the images). Rather than relying on manual labeling alone, they used a vision foundation model to automatically generate dense, high-quality interactive masks for each image, with quality control and granularity management to ensure the dataset covers a wide range of imaging modalities and segmentation targets. They also developed a baseline model that generates accurate masks from user interactions such as clicks, bounding boxes, and text prompts, allowing for more precise and flexible segmentation, as sketched below.
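To make the interaction style concrete, here is a minimal, hypothetical sketch of a prompt-driven segmentation call. The function name segment_with_prompts, its argument format, and the dummy mask logic are illustrative assumptions, not the authors' released IMIS-Bench code; a real model would run a neural network conditioned on the image and the prompts.

```python
# Toy illustration (hypothetical API) of click- and box-based interactive segmentation.
import numpy as np

def segment_with_prompts(image, clicks=None, box=None):
    """Stand-in for an IMIS-style model: returns a binary mask the same size as
    `image`, marking pixels inside the box and small patches around clicks."""
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    if box is not None:                      # box = (x0, y0, x1, y1)
        x0, y0, x1, y1 = box
        mask[y0:y1, x0:x1] = True
    if clicks is not None:                   # clicks = [(x, y, is_foreground), ...]
        for x, y, is_fg in clicks:
            y_lo, y_hi = max(0, y - 5), min(h, y + 5)
            x_lo, x_hi = max(0, x - 5), min(w, x + 5)
            mask[y_lo:y_hi, x_lo:x_hi] = is_fg
    return mask

# Usage: a fake 256x256 "scan" with one bounding box and one refining click.
image = np.zeros((256, 256), dtype=np.float32)
mask = segment_with_prompts(image, clicks=[(128, 128, True)], box=(64, 64, 192, 192))
print(mask.sum(), "pixels selected")
```

In the actual baseline described in the paper, clicks, bounding boxes, and text prompts can also be combined in a single query so the predicted mask can be refined iteratively.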

Why it matters?

This research is important because it provides a comprehensive resource for training and evaluating AI models in medical image segmentation. By offering a large-scale, well-annotated dataset and a robust model, the authors aim to enhance the capabilities of AI in healthcare, ultimately leading to better diagnostic tools and improved patient care.

Abstract

Interactive Medical Image Segmentation (IMIS) has long been constrained by the limited availability of large-scale, diverse, and densely annotated datasets, which hinders model generalization and consistent evaluation across different models. In this paper, we introduce the IMed-361M benchmark dataset, a significant advancement in general IMIS research. First, we collect and standardize over 6.4 million medical images and their corresponding ground truth masks from multiple data sources. Then, leveraging the strong object recognition capabilities of a vision foundational model, we automatically generated dense interactive masks for each image and ensured their quality through rigorous quality control and granularity management. Unlike previous datasets, which are limited by specific modalities or sparse annotations, IMed-361M spans 14 modalities and 204 segmentation targets, totaling 361 million masks, an average of 56 masks per image. Finally, we developed an IMIS baseline network on this dataset that supports high-quality mask generation through interactive inputs, including clicks, bounding boxes, text prompts, and their combinations. We evaluate its performance on medical image segmentation tasks from multiple perspectives, demonstrating superior accuracy and scalability compared to existing interactive segmentation models. To facilitate research on foundational models in medical computer vision, we release the IMed-361M dataset and model at https://github.com/uni-medical/IMIS-Bench.