Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Yifu Luo, Penghui Du, Bo Li, Sinan Du, Tiantian Zhang, Yongzhe Chang, Kai Wu, Kun Gai, Xueqian Wang

2025-10-27

Summary

This paper introduces a new method called Chunk-GRPO for creating images from text descriptions, building on existing techniques for text-to-image generation.

What's the problem?

Current methods, like Group Relative Policy Optimization (GRPO), struggle with two main issues when generating images: they misattribute credit, failing to identify which steps of the process actually contributed to a good result, and they ignore how the image evolves over time as it is built up step by step. Essentially, they focus too much on individual steps instead of the bigger picture of how the image develops.

What's the solution?

The researchers propose Chunk-GRPO, which shifts the focus from optimizing each individual step to optimizing 'chunks' of consecutive steps. Think of it like grouping several brushstrokes together instead of perfecting each one in isolation. These chunks are chosen to capture the natural temporal flow of image creation in flow-matching models. The researchers also add an optional weighted sampling strategy that prioritizes certain chunks during training to further improve results.
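The grouping idea can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the function names (`make_chunks`, `chunk_objective`) and the chunk-boundary scheme are assumptions, and the objective simply weights each chunk's summed step log-probabilities by a shared advantage, standing in for the full GRPO machinery.

```python
# Hypothetical sketch of chunk-level credit assignment, not the paper's code.

def make_chunks(num_steps, boundaries):
    """Group consecutive denoising-step indices into chunks.

    `boundaries` lists the first step index of each chunk after the first,
    e.g. boundaries=[2, 4] with num_steps=6 -> chunks [0,1], [2,3], [4,5].
    """
    edges = [0] + list(boundaries) + [num_steps]
    return [list(range(a, b)) for a, b in zip(edges, edges[1:])]

def chunk_objective(step_logprobs, advantage, chunks):
    """Score each chunk by the sum of its steps' log-probabilities, scaled
    by the sample's advantage, so credit lands on coherent segments of the
    trajectory rather than on isolated steps."""
    return sum(advantage * sum(step_logprobs[t] for t in chunk)
               for chunk in chunks)

chunks = make_chunks(6, [2, 4])          # three chunks of two steps each
score = chunk_objective([1.0] * 6, 2.0, chunks)
```

The key design choice this illustrates: the unit that receives the advantage signal is the chunk, so several consecutive steps rise or fall together, matching the temporal dynamics the paper argues step-level GRPO ignores.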

Why it matters?

This new approach leads to better images that more closely match the text descriptions and are generally higher quality. It suggests that looking at the image generation process in larger segments, rather than tiny steps, is a promising direction for improving text-to-image technology.

Abstract

Group Relative Policy Optimization (GRPO) has shown strong potential for flow-matching-based text-to-image (T2I) generation, but it faces two key limitations: inaccurate advantage attribution, and the neglect of temporal dynamics of generation. In this work, we argue that shifting the optimization paradigm from the step level to the chunk level can effectively alleviate these issues. Building on this idea, we propose Chunk-GRPO, the first chunk-level GRPO-based approach for T2I generation. The insight is to group consecutive steps into coherent 'chunks' that capture the intrinsic temporal dynamics of flow matching, and to optimize policies at the chunk level. In addition, we introduce an optional weighted sampling strategy to further enhance performance. Extensive experiments show that Chunk-GRPO achieves superior results in both preference alignment and image quality, highlighting the promise of chunk-level optimization for GRPO-based methods.