INT: Instance-Specific Negative Mining for Task-Generic Promptable Segmentation

Jian Hu, Zixu Cheng, Shaogang Gong

2025-02-03


Summary

This paper presents a new way to make AI better at identifying and outlining specific objects in images, including objects that are hard to see, such as camouflaged animals, and structures in complex medical scans. The researchers created a method called INT that helps AI understand more accurately what it is looking for, even when it is given only a very general instruction.

What's the problem?

Current AI systems that segment images (outline specific parts) are often given a single general instruction that has to work across many different types of images. They typically rely on a vision-language model to turn that general instruction into a more specific one for each image, and when that model misreads an image, the specific instruction is wrong. This happens especially in tricky cases like camouflaged objects or complex medical scans, and it leads the AI to make mistakes or miss important details.

What's the solution?

The researchers developed INT, which stands for Instance-specific Negative Mining for Task-Generic Promptable Segmentation. This method helps the AI figure out what it should ignore (negative information) and what it should focus on. It does this in two main steps: First, it gradually filters out incorrect information as it tries to understand the task. Second, it makes sure that what it outlines in the image actually matches what it's supposed to be looking for. The researchers tested INT on six different sets of images, including tricky ones like camouflaged objects and medical scans.
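The first step, gradually filtering out incorrect information, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a hypothetical contrast signal: for each candidate description of the target (for example "frog" or "leaf"), compare a vision-language model's confidence on the full image with its confidence after the currently predicted object region is masked out. A candidate whose confidence drops sharply depends on that region and is treated as plausible; one whose confidence barely changes is treated as a negative and filtered out. The function names, the `temperature` and `keep_ratio` parameters and the toy scores are all illustrative assumptions. For simplicity the scores stay fixed across rounds, whereas a real pipeline would presumably recompute them after each segmentation update.

```python
import math


def contrast_weights(scores_full, scores_masked, temperature=0.1):
    """Softmax over contrast = confidence on the full image minus confidence
    with the predicted object region masked out (hypothetical signal).
    High contrast means the candidate depends on the object region."""
    contrast = [f - m for f, m in zip(scores_full, scores_masked)]
    top = max(contrast)
    # Subtracting the max before exponentiating keeps the softmax numerically stable.
    exps = [math.exp((c - top) / temperature) for c in contrast]
    total = sum(exps)
    return [e / total for e in exps]


def progressive_negative_mining(candidates, scores_full, scores_masked,
                                rounds=1, keep_ratio=0.5, temperature=0.1):
    """Each round, keep the highest-weight candidates and discard the rest as
    negatives; return normalised weights over the surviving candidates."""
    idx = list(range(len(candidates)))
    for _ in range(rounds):
        w = contrast_weights([scores_full[i] for i in idx],
                             [scores_masked[i] for i in idx], temperature)
        keep = max(1, math.ceil(len(idx) * keep_ratio))
        ranked = sorted(zip(w, idx), reverse=True)  # highest weight first
        idx = [i for _, i in ranked[:keep]]
    w = contrast_weights([scores_full[i] for i in idx],
                         [scores_masked[i] for i in idx], temperature)
    return {candidates[i]: wi for i, wi in zip(idx, w)}


# Toy example with made-up confidences: "frog" loses most confidence when the
# object region is masked, so it dominates; "leaf" and "rock" are pruned.
weights = progressive_negative_mining(
    ["frog", "leaf", "rock", "moss"],
    scores_full=[0.80, 0.60, 0.30, 0.50],
    scores_masked=[0.20, 0.55, 0.28, 0.40],
)
print(weights)
```

Pruning the lowest-weight candidates each round removes clear negatives, while the softmax keeps the survivors as soft weights rather than forcing a single hard choice.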

Why it matters?

This matters because it could make AI much better at understanding and analyzing images in many different fields. For example, it could help doctors spot problems in medical scans more accurately, or help scientists study camouflaged animals in nature photos. By making AI more flexible and accurate when looking at different types of images, this research could lead to improvements in areas like healthcare, scientific research, and even everyday applications like sorting through photos.

Abstract

Task-generic promptable image segmentation aims to segment diverse samples under a single task description, using only one task-generic prompt. Current methods leverage the generalisation capabilities of Vision-Language Models (VLMs) to infer instance-specific prompts from these task-generic prompts in order to guide the segmentation process. However, when VLMs struggle to generalise to some image instances, the predicted instance-specific prompts become poor. To solve this problem, we introduce Instance-specific Negative Mining for Task-Generic Promptable Segmentation (INT). The key idea of INT is to adaptively reduce the influence of irrelevant (negative) prior knowledge while increasing the use of the most plausible prior knowledge, selected by negative mining with higher contrast, in order to optimise instance-specific prompt generation. Specifically, INT consists of two components: (1) instance-specific prompt generation, which progressively filters out incorrect information in prompt generation; and (2) semantic mask generation, which ensures that the segmentation of each image instance correctly matches the semantics of the instance-specific prompts. INT is validated on six datasets, including camouflaged objects and medical images, demonstrating its effectiveness, robustness and scalability.
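The semantic mask generation component can likewise be sketched under stated assumptions. The Python example below is a hypothetical illustration, not INT's actual mechanism: it assumes per-pixel feature vectors and an embedding of the instance-specific prompt that live in the same space (as in CLIP-style models), averages the features inside each candidate mask, and keeps the mask whose pooled feature has the highest cosine similarity to the prompt. The names `mask_pooled_embedding`, `cosine` and `select_semantic_mask` and the toy 4-pixel data are made up for illustration.

```python
import math


def mask_pooled_embedding(pixel_features, mask):
    """Average the feature vectors of pixels selected by a boolean mask.
    pixel_features: list of D-dim vectors, one per pixel (flattened image)."""
    selected = [f for f, keep in zip(pixel_features, mask) if keep]
    dim = len(pixel_features[0])
    if not selected:
        return [0.0] * dim  # empty mask: no evidence, zero similarity
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]


def cosine(a, b, eps=1e-8):
    """Cosine similarity; eps avoids division by zero for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + eps)


def select_semantic_mask(pixel_features, candidate_masks, prompt_embedding):
    """Return (best_index, scores): the candidate mask whose pooled region
    feature is most similar to the (hypothetical) prompt embedding."""
    scores = [cosine(mask_pooled_embedding(pixel_features, m), prompt_embedding)
              for m in candidate_masks]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy 4-pixel "image": the first two pixels resemble the prompt,
# the last two resemble background.
pixels = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
masks = [[True, True, False, False], [False, False, True, True]]
best, scores = select_semantic_mask(pixels, masks, prompt_embedding=[1.0, 0.0])
print(best)  # → 0
```

Scoring whole masks against the prompt, instead of individual pixels, is one simple way to check that a segmented region as a whole agrees with what the prompt describes.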
