PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling
Bowen Ping, Chengyou Jia, Minnan Luo, Changliang Xia, Xin Shen, Zhuohang Dang, Hangwei Qian
2025-12-08
Summary
This paper introduces a new method, PaCo-RL, for creating multiple images that all feel like they belong together, maintaining consistent characters, styles, and overall logic. It tackles the challenge of making AI-generated images feel cohesive when you need several of them, like for a comic book or designing a character from different angles.
What's the problem?
Currently, training AI to create consistently themed images is really hard. It needs tons of example images showing what 'consistent' looks like, and even then, it's difficult to teach the AI what *we* humans consider visually consistent because it's a subjective idea. Existing methods struggle because they lack enough good training data and can't easily capture our preferences for how things should look across multiple images.
What's the solution?
The researchers used a technique called reinforcement learning, which lets the AI learn through trial and error without needing a huge dataset of labeled examples. They built a system with two main parts: PaCo-Reward, which acts like a judge that evaluates how consistent two images are with each other, and PaCo-GRPO, which is the learning algorithm that actually adjusts the image generation process to improve consistency. PaCo-Reward learns to score image pairs for consistency, using task-aware instructions and step-by-step (chain-of-thought) reasoning to make better judgments. PaCo-GRPO makes the learning process faster and more stable, in part by balancing multiple reward signals so no single one dominates.
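The summary does not spell out how PaCo-GRPO's "log-tamed" reward aggregation works internally, but the general idea of taming rewards with a logarithm can be illustrated with a minimal sketch. Everything below (the function name, the epsilon constant, the averaging) is an illustrative assumption, not the paper's actual formula:

```python
import math

def log_tamed_aggregate(rewards, eps=1e-6):
    """Combine several per-criterion rewards (in [0, 1]) into one scalar.

    Taking the log of each reward compresses high values and sharply
    penalizes low ones, so a generation cannot 'buy back' a failing
    criterion (e.g. poor consistency) with a strong one (e.g. prompt
    alignment). `eps` guards against log(0).
    """
    return sum(math.log(r + eps) for r in rewards) / len(rewards)

# Two hypothetical generations scored on [consistency, prompt alignment]:
balanced = log_tamed_aggregate([0.9, 0.8])  # both criteria decent
skewed = log_tamed_aggregate([1.0, 0.5])    # one perfect, one weak
```

Under this sketch the balanced generation outscores the skewed one even though their arithmetic means are close (0.85 vs. 0.75), which is the kind of behavior that helps keep multi-reward optimization stable.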
Why it matters?
This work is important because it offers a practical way to generate sets of images that are visually consistent without needing massive amounts of training data. This is a big step forward for applications like creating stories with AI-generated art, designing characters for games, or any situation where you need multiple images that all fit a specific style and theme. It shows that reinforcement learning can be a powerful tool for solving complex visual problems.
Abstract
Consistent image generation requires faithfully preserving identities, styles, and logical coherence across multiple images, which is essential for applications such as storytelling and character design. Supervised training approaches struggle with this task due to the lack of large-scale datasets capturing visual consistency and the complexity of modeling human perceptual preferences. In this paper, we argue that reinforcement learning (RL) offers a promising alternative by enabling models to learn complex and subjective visual criteria in a data-free manner. To achieve this, we introduce PaCo-RL, a comprehensive framework that combines a specialized consistency reward model with an efficient RL algorithm. The first component, PaCo-Reward, is a pairwise consistency evaluator trained on a large-scale dataset constructed via automated sub-figure pairing. It evaluates consistency through a generative, autoregressive scoring mechanism enhanced by task-aware instructions and chain-of-thought (CoT) reasoning. The second component, PaCo-GRPO, leverages a novel resolution-decoupled optimization strategy to substantially reduce RL cost, alongside a log-tamed multi-reward aggregation mechanism that ensures balanced and stable reward optimization. Extensive experiments across two representative subtasks show that PaCo-Reward significantly improves alignment with human perceptions of visual consistency, and PaCo-GRPO achieves state-of-the-art consistency performance with improved training efficiency and stability. Together, these results highlight the promise of PaCo-RL as a practical and scalable solution for consistent image generation. The project page is available at https://x-gengroup.github.io/HomePage_PaCo-RL/.