Instruction-Guided Lesion Segmentation for Chest X-rays with Automatically Generated Large-Scale Dataset

Geon Choi, Hangyul Yoon, Hyunju Shin, Hyunki Park, Sang Hoon Seo, Eunho Yang, Edward Choi

2025-11-20

Summary

This paper introduces a way to automatically identify and outline lesions in chest X-ray images from simple natural-language instructions, and builds a large-scale dataset to train such systems.

What's the problem?

Current computer programs that find lesions in chest X-rays aren't very practical. They can only handle a small set of lesion types, and they typically require long, detailed, expert-level text descriptions of what to look for, which take time and effort to write. This makes them hard to use in real-world clinical settings.

What's the solution?

The researchers developed a new approach called 'instruction-guided lesion segmentation' (ILS): you tell the model what to look for in simple language, like 'find the pneumonia'. To make this work, they created MIMIC-ILS, a dataset of 1.1 million instruction-answer pairs covering seven major lesion types, generated by a fully automated pipeline from 192K chest X-rays and their radiology reports. They then built ROSALIA, a vision-language model fine-tuned on this dataset, which can both outline lesions in the image and explain its findings in text.
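
To make the idea concrete, here is a minimal sketch of what an instruction-guided segmentation interface might look like. The class and method names are invented for illustration; the paper does not describe ROSALIA's programming interface.

```python
import numpy as np

# Hypothetical interface for instruction-guided lesion segmentation (ILS).
# Names and signatures are assumptions for illustration only; the paper does
# not publish ROSALIA's API. The point is the input/output contract:
# a short natural-language instruction in, a pixel mask plus text out.
class IlsModel:
    def segment(self, image: np.ndarray, instruction: str) -> tuple[np.ndarray, str]:
        """Segment the lesion named by a free-form instruction.

        image:       2-D grayscale chest X-ray, shape (H, W)
        instruction: e.g. "find the pneumonia"
        returns:     (boolean mask of shape (H, W), textual explanation)
        """
        # A real model would ground the instruction in the image; this
        # stub returns an empty mask and a canned explanation.
        mask = np.zeros(image.shape, dtype=bool)
        explanation = f"Stub model: no region segmented for '{instruction}'."
        return mask, explanation

# Usage: one short instruction replaces a long expert-written description.
model = IlsModel()
xray = np.zeros((512, 512), dtype=np.float32)  # placeholder image
mask, text = model.segment(xray, "find the pneumonia")
print(mask.shape, int(mask.sum()), text)
```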

Why it matters?

This work is important because it makes it easier to use AI to help doctors find and understand problems in chest X-rays. By using simple instructions instead of complex descriptions, and by creating a large dataset for training, this research paves the way for more practical and accessible medical imaging tools.

Abstract

The applicability of current lesion segmentation models for chest X-rays (CXRs) has been limited both by a small number of target labels and the reliance on long, detailed expert-level text inputs, creating a barrier to practical use. To address these limitations, we introduce a new paradigm: instruction-guided lesion segmentation (ILS), which is designed to segment diverse lesion types based on simple, user-friendly instructions. Under this paradigm, we construct MIMIC-ILS, the first large-scale instruction-answer dataset for CXR lesion segmentation, using our fully automated multimodal pipeline that generates annotations from chest X-ray images and their corresponding reports. MIMIC-ILS contains 1.1M instruction-answer pairs derived from 192K images and 91K unique segmentation masks, covering seven major lesion types. To empirically demonstrate its utility, we introduce ROSALIA, a vision-language model fine-tuned on MIMIC-ILS. ROSALIA can segment diverse lesions and provide textual explanations in response to user instructions. The model achieves high segmentation and textual accuracy in our newly proposed task, highlighting the effectiveness of our pipeline and the value of MIMIC-ILS as a foundational resource for pixel-level CXR lesion grounding.
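
As a rough illustration of how such a dataset might be organized, the sketch below shows one possible record layout for a single instruction-answer pair. The field names are assumptions inferred from the abstract, not MIMIC-ILS's actual schema.

```python
from dataclasses import dataclass

import numpy as np

# Hypothetical layout of one MIMIC-ILS instruction-answer pair. Field names
# are assumptions inferred from the abstract, not the dataset's real schema.
@dataclass
class IlsRecord:
    image_id: str      # source chest X-ray (192K images in the dataset)
    instruction: str   # simple user instruction, e.g. "find the pneumonia"
    lesion_type: str   # one of the seven major lesion types
    mask: np.ndarray   # segmentation mask (91K unique masks in the dataset)
    answer: str        # textual explanation paired with the mask

record = IlsRecord(
    image_id="cxr_000001",                  # placeholder identifier
    instruction="find the pneumonia",
    lesion_type="pneumonia",
    mask=np.zeros((512, 512), dtype=bool),  # placeholder mask
    answer="Stub answer: the segmented region would be described here.",
)
print(record.instruction, "->", record.lesion_type, record.mask.shape)
```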