RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards

Junyan Ye, Leiqi Zhu, Yuncheng Guo, Dongzhi Jiang, Zilong Huang, Yifan Zhang, Zhiyuan Yan, Haohuan Fu, Conghui He, Weijia Li

2025-12-08


Summary

This paper introduces RealGen, a text-to-image framework designed to generate photorealistic images and overcome the 'fake' appearance common in current AI image generators.

What's the problem?

While recent AI models like GPT-Image-1 and Qwen-Image are good at understanding what you want when you type a description, the images they produce often don't look truly real, exhibiting telltale signs of being AI-generated like overly smooth skin or an unnatural sheen on faces. Essentially, they struggle to achieve photorealism.

What's the solution?

The researchers developed RealGen, which works in two main parts. First, a large language model (LLM) rewrites your text prompt to make it more effective. Then, a diffusion model generates the image. Crucially, RealGen includes a 'Detector Reward' mechanism: synthetic-image detectors measure how artificial a generated image looks, and that measurement is used as a reward signal (with the GRPO reinforcement-learning algorithm) to train the whole pipeline toward fewer artifacts and more realistic results. They also created RealBench, a benchmark that evaluates realism automatically, without human raters. It scores images with these detectors (Detector-Scoring) and with an 'Arena-Scoring' method intended to reflect how people would rate them.
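To make the 'Detector Reward' idea more concrete, here is a minimal Python sketch. It assumes each detector outputs a probability that an image is AI-generated, and that the semantic-level and feature-level scores are blended with a weighted average. `DetectorScores`, `detector_reward`, and the equal default weights are hypothetical illustrations, not RealGen's actual reward formula, which this summary does not specify.

```python
from dataclasses import dataclass


@dataclass
class DetectorScores:
    # Hypothetical detector outputs: probability (0..1) that the image is AI-generated.
    semantic_fake_prob: float  # assumed semantic-level detector
    feature_fake_prob: float   # assumed feature-level (low-level artifact) detector


def detector_reward(
    scores: DetectorScores, w_semantic: float = 0.5, w_feature: float = 0.5
) -> float:
    """Hypothetical realism reward: higher when detectors are less sure the image is fake."""
    for p in (scores.semantic_fake_prob, scores.feature_fake_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError("detector probabilities must lie in [0, 1]")
    total = w_semantic + w_feature
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted average of the two "fake" probabilities.
    fake = (w_semantic * scores.semantic_fake_prob + w_feature * scores.feature_fake_prob) / total
    # Map to a reward in [0, 1]: 1.0 means neither detector flags the image.
    return 1.0 - fake


# Usage: an image that both detectors confidently flag receives a low reward.
print(detector_reward(DetectorScores(semantic_fake_prob=0.9, feature_fake_prob=0.7)))
```

Combining two detectors reflects the summary's point that artifacts show up at different levels, from overall semantic plausibility to low-level texture such as overly smooth skin; how RealGen actually weights them is not stated here.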

Why it matters?

This work is important because it pushes the field of AI image generation closer to its original goal: creating images that are indistinguishable from real photographs. By improving realism, detail, and overall quality, RealGen and the RealBench evaluation tool represent a significant step forward in making AI-generated images more useful and believable.

Abstract

With the continuous advancement of image generation technology, advanced models such as GPT-Image-1 and Qwen-Image have achieved remarkable text-to-image consistency and world knowledge. However, these models still fall short in photorealistic image generation. Even on simple text-to-image (T2I) tasks, they tend to produce "fake" images with distinct AI artifacts, often characterized by "overly smooth skin" and "oily facial sheens". To recapture the original goal of "indistinguishable-from-reality" generation, we propose RealGen, a photorealistic text-to-image framework. RealGen integrates an LLM component for prompt optimization and a diffusion model for realistic image generation. Inspired by adversarial generation, RealGen introduces a "Detector Reward" mechanism, which quantifies artifacts and assesses realism using both semantic-level and feature-level synthetic image detectors. We leverage this reward signal with the GRPO algorithm to optimize the entire generation pipeline, significantly enhancing image realism and detail. Furthermore, we propose RealBench, an automated evaluation benchmark employing Detector-Scoring and Arena-Scoring. It enables human-free photorealism assessment, yielding results that are more accurate and better aligned with real user experience. Experiments demonstrate that RealGen significantly outperforms general models like GPT-Image-1 and Qwen-Image, as well as specialized photorealistic models like FLUX-Krea, in terms of realism, detail, and aesthetics. The code is available at https://github.com/yejy53/RealGen.
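The abstract states that the detector reward is used with GRPO. In GRPO-style training, several samples are generated for the same prompt, and each sample's reward is standardized against the rest of its group, so no separate value network is needed. Below is a minimal sketch of that normalization step only. It assumes population standard deviation and a small `eps` for numerical stability. `group_relative_advantages` is a hypothetical name, and the sketch omits the policy-gradient update and any RealGen-specific details.

```python
import math
from typing import Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> list[float]:
    """Standardize rewards within one prompt's group of samples (GRPO-style advantages)."""
    n = len(rewards)
    if n == 0:
        return []
    mean = sum(rewards) / n
    # Population standard deviation of the group's rewards.
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # Samples better than the group average get positive advantages, worse ones negative.
    return [(r - mean) / (std + eps) for r in rewards]


# Usage: realism rewards for four images generated from the same prompt.
print(group_relative_advantages([0.2, 0.4, 0.6, 0.8]))
```

The advantages in a group sum to zero. Images the detectors judge more realistic than their siblings are therefore reinforced, and the rest are discouraged, even if every image in the group has a low absolute reward. When all rewards in a group are equal, every advantage is zero and that group contributes no learning signal.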
