SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning

Zewen Chen, Juan Wang, Wen Wang, Sunhan Xu, Hang Xiong, Yun Zeng, Jian Guo, Shuxun Wang, Chunfeng Yuan, Bing Li, Weiming Hu

2024-11-20

Summary

This paper introduces SEAGULL, a method for assessing the quality of specific areas of an image, called Regions of Interest (ROIs), using a vision-language model guided by segmentation masks.

What's the problem?

Most existing methods for evaluating image quality focus on the overall image rather than specific areas. This is a problem because analyzing the quality of certain regions can provide more detailed guidance for improving images, especially in applications where certain parts of an image are more important than others.

What's the solution?

The authors developed SEAGULL, which uses a vision-language model (VLM) to assess the quality of ROIs. Masks generated by the Segment Anything Model (SAM) specify the regions, and a Mask-based Feature Extractor (MFE) extracts global and local tokens for each specified ROI. The authors also built two datasets: SEAGULL-100w, a large set of synthetically distorted images used for pre-training, and SEAGULL-3k, a set of about 3k authentically distorted ROIs used for fine-tuning. After pre-training on SEAGULL-100w and fine-tuning on SEAGULL-3k, the authors report strong performance on fine-grained ROI quality assessment.
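To make the idea of extracting global and local information for a masked region more concrete, here is a minimal sketch in plain Python. It is not the authors' feature extractor: it assumes a toy feature map stored as nested lists, uses masked average pooling as a stand-in for the "global" summary, and treats the individual feature vectors inside the mask as the "local" ones. The function name `masked_tokens` and the data layout are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def masked_tokens(
    features: List[List[Vector]],  # H x W grid of C-dimensional feature vectors
    mask: List[List[int]],         # H x W binary mask, 1 inside the ROI
) -> Tuple[Vector, List[Vector]]:
    """Return (global_token, local_tokens) for the masked region.

    global_token: channel-wise mean of the feature vectors inside the mask.
    local_tokens: the individual feature vectors inside the mask, row-major.
    This is an illustrative stand-in, not the SEAGULL implementation.
    """
    local_tokens = [
        features[y][x]
        for y in range(len(mask))
        for x in range(len(mask[0]))
        if mask[y][x]
    ]
    if not local_tokens:
        raise ValueError("mask selects no positions")
    channels = len(local_tokens[0])
    global_token = [
        sum(tok[c] for tok in local_tokens) / len(local_tokens)
        for c in range(channels)
    ]
    return global_token, local_tokens


# Toy 2x2 feature map with 2 channels; the mask selects the left column.
features = [
    [[1.0, 0.0], [9.0, 9.0]],
    [[3.0, 4.0], [9.0, 9.0]],
]
mask = [
    [1, 0],
    [1, 0],
]
global_token, local_tokens = masked_tokens(features, mask)
```

In this toy case the features outside the mask (the `9.0` entries) do not influence either output, which is the point of conditioning on a region: the quality judgment is based on the ROI rather than the whole image.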

Why does it matter?

This research is significant because it shifts quality evaluation from the whole picture to the regions that matter most. Better ROI quality assessment can help in fields such as photography and medical imaging, and in any application where specific details in an image matter, by showing where an image needs improvement.

Abstract

Existing Image Quality Assessment (IQA) methods achieve remarkable success in analyzing the quality of the overall image, but few works explore quality analysis for Regions of Interest (ROIs). The quality analysis of ROIs can provide fine-grained guidance for image quality improvement and is crucial for scenarios focusing on region-level quality. This paper proposes a novel network, SEAGULL, which can SEe and Assess ROI quality with GUidance from a Large vision-Language model. SEAGULL incorporates a vision-language model (VLM), masks generated by the Segment Anything Model (SAM) to specify ROIs, and a meticulously designed Mask-based Feature Extractor (MFE) to extract global and local tokens for specified ROIs, enabling accurate fine-grained IQA for ROIs. Moreover, this paper constructs two ROI-based IQA datasets, SEAGULL-100w and SEAGULL-3k, for training and evaluating ROI-based IQA. SEAGULL-100w comprises about 100w (roughly one million) synthetic distortion images with 33 million ROIs for pre-training to improve the model's ability to perceive regional quality, and SEAGULL-3k contains about 3k authentic distortion ROIs to enhance the model's ability to perceive real-world distortions. After pre-training on SEAGULL-100w and fine-tuning on SEAGULL-3k, SEAGULL shows remarkable performance on fine-grained ROI quality assessment. Code and datasets are publicly available at https://github.com/chencn2020/Seagull.
