From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space
Jiazi Bu, Pengyang Ling, Yujie Zhou, Yibin Wang, Yuhang Zang, Tianyi Wei, Xiaohang Zhan, Jiaqi Wang, Tong Wu, Xingang Pan, Dahua Lin
2026-03-16
Summary
This paper introduces a new way to improve how AI image generators follow instructions, specifically focusing on making sure the generated images really match what the user wants.
What's the problem?
Current methods for teaching AI image generators to align with user preferences often fall short because they judge every generated image against just one version of the prompt. Imagine grading a group project based on a single person's opinion: you would miss a lot of important detail. This single-view evaluation limits the AI's ability to fully understand the user's intent and caps how good the generated images can become.
What's the solution?
The researchers developed a technique called Multi-View GRPO. It works by creating multiple, slightly different descriptions of the original prompt. Think of it like asking several people to rephrase the instructions in their own words. The AI then uses these different descriptions to evaluate the generated images from multiple angles, getting a more complete understanding of how well they fit the user's intent. Importantly, it does this without needing to create entirely new images, making it efficient.
Why it matters?
This research is important because it pushes the boundaries of what's possible with AI image generation. By allowing the AI to better understand and respond to user preferences, it leads to higher quality images that more accurately reflect the user's creative ideas. This means more useful and satisfying results for anyone using these tools.
Abstract
Group Relative Policy Optimization (GRPO) has emerged as a powerful framework for preference alignment in text-to-image (T2I) flow models. However, we observe that the standard paradigm, in which a group of generated samples is evaluated against a single condition, suffers from insufficient exploration of inter-sample relationships, constraining both alignment efficacy and performance ceilings. To address this sparse single-view evaluation scheme, we propose Multi-View GRPO (MV-GRPO), a novel approach that enhances relationship exploration by augmenting the condition space to create a dense multi-view reward mapping. Specifically, for a group of samples generated from one prompt, MV-GRPO leverages a flexible Condition Enhancer to generate semantically adjacent yet diverse captions. These captions enable multi-view advantage re-estimation, capturing diverse semantic attributes and providing richer optimization signals. By deriving the probability distribution of the original samples conditioned on these new captions, we can incorporate them into the training process without costly sample regeneration. Extensive experiments demonstrate that MV-GRPO achieves superior alignment performance over state-of-the-art methods.
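To make the "multi-view advantage re-estimation" idea concrete, here is a minimal sketch of how advantages change when a group of samples is scored under several caption variants instead of one. Everything here is illustrative: the `reward` function is a stand-in for a learned reward model, the samples and captions are placeholder arrays, and averaging the per-view advantages is one plausible aggregation choice (the paper does not spell out its exact aggregation rule here).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: G samples from one prompt, K caption variants
# (the original caption plus K-1 "semantically adjacent" rewrites
# from a Condition Enhancer).
G, K = 4, 3
samples = rng.normal(size=G)   # stand-ins for generated images
captions = np.arange(K)        # stand-ins for caption variants

def reward(sample, caption):
    # Placeholder scoring function; a real system would run a
    # preference/reward model on the (image, caption) pair.
    return float(np.tanh(0.1 * sample + 0.3 * caption))

# Standard GRPO: one reward per sample, advantages standardized
# within the group under the single original condition.
r_single = np.array([reward(s, captions[0]) for s in samples])
adv_single = (r_single - r_single.mean()) / (r_single.std() + 1e-8)

# Multi-view re-estimation: score every existing sample under every
# caption (no new images are generated), standardize within the
# group separately for each view, then aggregate across views.
R = np.array([[reward(s, c) for c in captions] for s in samples])  # (G, K)
adv_views = (R - R.mean(axis=0)) / (R.std(axis=0) + 1e-8)
adv_multi = adv_views.mean(axis=1)  # one denser advantage per sample
```

The key property is that the extra signal comes only from re-scoring and re-weighting the same group of samples under new conditions, which is why no costly regeneration is needed; in the actual method, the samples' likelihoods under the new captions would also enter the policy-gradient update.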