Chain-of-Zoom

Paid, Super-resolution, Image Enhancement


Key Features

Model-agnostic framework for extreme super-resolution

Scale autoregression and preference alignment

Multi-scale-aware prompts for improved image quality

Repeated use of a single backbone SR model at every zoom step, without additional training

Decomposition of conditional probability into tractable sub-problems

Alignment of text guidance towards human preference

Use of a vision-language model (VLM) for prompt generation

Fine-tuning of the VLM using Group Relative Policy Optimization (GRPO)
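The scale-autoregressive loop described in the features above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: images are small square grayscale arrays stored as nested lists, each step crops the central 1/scale region and super-resolves it back to full size, `sr_backbone` and `prompt_model` are hypothetical stand-ins for a text-conditioned SR model and a prompt-generating VLM, and the prompt is conditioned on the current and previous scale-states (CoZ's actual prompt inputs may differ). Nearest-neighbour upsampling plays the backbone so the example runs without model weights.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]  # square grayscale image as rows of pixels (illustrative)
# Hypothetical interfaces: a text-conditioned SR model and a prompt-generating VLM.
SRBackbone = Callable[[Image, str, int], Image]
PromptModel = Callable[[Optional[Image], Image], str]


def center_crop(img: Image, scale: int) -> Image:
    """Keep the central 1/scale region of a square image (the zoom target)."""
    n = len(img)
    size = n // scale
    start = (n - size) // 2
    return [row[start:start + size] for row in img[start:start + size]]


def chain_of_zoom(x0: Image, steps: int, scale: int,
                  sr_backbone: SRBackbone,
                  prompt_model: PromptModel) -> Tuple[List[Image], List[str]]:
    """Autoregressive zoom: every step reuses the same backbone at its native scale.

    After `steps` iterations the total magnification is scale ** steps.
    """
    states: List[Image] = [x0]
    prompts: List[str] = []
    for _ in range(steps):
        prev = states[-2] if len(states) > 1 else None  # coarser scale-state, if any
        curr = states[-1]
        prompt = prompt_model(prev, curr)  # multi-scale-aware guidance (assumed inputs)
        states.append(sr_backbone(center_crop(curr, scale), prompt, scale))
        prompts.append(prompt)
    return states, prompts


# --- Hypothetical toy stand-ins so the sketch runs without model weights ---
def toy_backbone(img: Image, prompt: str, scale: int) -> Image:
    """Nearest-neighbour upsampling; ignores the prompt, unlike a real SR model."""
    return [[px for px in row for _ in range(scale)] for row in img for _ in range(scale)]


def toy_prompt(prev: Optional[Image], curr: Image) -> str:
    """Placeholder for a VLM; reports only whether coarser context exists."""
    return f"zoomed view, {len(curr)}px, coarser context: {prev is not None}"


x0 = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
states, prompts = chain_of_zoom(x0, steps=2, scale=2,
                                sr_backbone=toy_backbone, prompt_model=toy_prompt)
```

With `steps=2` and `scale=2`, every state stays 8×8 while the last one covers only a quarter of the original width. The design point is that magnification grows geometrically with the number of steps while the backbone is only ever asked to work at the scale it was trained for; replacing `toy_backbone` and `toy_prompt` with real models is the intended extension point.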

Chain-of-Zoom (CoZ) addresses the scalability bottleneck of modern single-image super-resolution (SISR) models, which deliver photo-realistic results at the scale factors they were trained on but collapse when asked to magnify far beyond that regime. CoZ splits extreme super-resolution into an autoregressive chain of intermediate scale-states, reusing the same backbone SR model at each step. Because visual cues from the original input become sparse at high magnification, a vision-language model (VLM) generates multi-scale-aware text prompts that supply guidance at each step. The prompt extractor itself is fine-tuned with Group Relative Policy Optimization (GRPO), using a critic VLM to score its prompts, so that the text guidance aligns with human preference.
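GRPO avoids training a separate value network: it samples a group of outputs for the same input, scores them, and normalizes each reward against the group's mean and standard deviation before applying a PPO-style clipped update. The Python sketch below shows those two core pieces under stated assumptions: rewards are scalars (here hypothetical critic scores for four candidate prompts, invented for illustration), the objective is computed per sequence rather than per token, and the KL penalty toward a reference policy that full GRPO includes is omitted. It is a generic illustration of GRPO, not CoZ's training code or reward design.

```python
import math
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(logp_new: float, logp_old: float, advantage: float,
                      clip: float = 0.2) -> float:
    """PPO-style clipped objective for one sampled output (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return min(ratio * advantage, clipped_ratio * advantage)


def grpo_loss(logps_new: List[float], logps_old: List[float],
              rewards: List[float], clip: float = 0.2) -> float:
    """Sequence-level GRPO-style loss for one group (KL term omitted)."""
    advs = group_relative_advantages(rewards)
    terms = [clipped_surrogate(n, o, a, clip)
             for n, o, a in zip(logps_new, logps_old, advs)]
    return -sum(terms) / len(terms)


# Hypothetical critic scores for four candidate prompts generated for one crop.
critic_scores = [0.9, 0.4, 0.7, 0.2]
advantages = group_relative_advantages(critic_scores)
print([round(a, 3) for a in advantages])
```

Prompts the critic scores above the group average receive positive advantages and are reinforced; the clipping keeps any single update from moving the policy too far from the one that sampled the group.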

Experiments show that CoZ achieves high-quality super-resolution at extreme scales, outperforming conventional SR methods as well as CoZ variants that use different text prompts. GRPO fine-tuning of the VLM improves alignment with human preference, as validated by mean-opinion-score (MOS) tests that rate both the generated images and the generated text prompts. CoZ could potentially be applied to image and video enhancement tasks.
