X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again
Zigang Geng, Yibing Wang, Yeyao Ma, Chen Li, Yongming Rao, Shuyang Gu, Zhao Zhong, Qinglin Lu, Han Hu, Xiaosong Zhang, Linus, Di Wang, Jie Jiang
2025-07-30
Summary
This paper introduces X-Omni, a method that uses reinforcement learning to improve discrete autoregressive models, which generate images and language token by token, so that the results look better and follow instructions more accurately.
What's the problem?
Discrete autoregressive models, which generate images or text one token at a time, often struggle to produce high-quality, coherent results and may not follow user instructions well.
What's the solution?
X-Omni solves this by applying reinforcement learning techniques that reward the model for producing better, more detailed images and correctly following instructions. It uses a unified approach that works for both image and language generation, improving performance on both tasks.
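To make the idea of rewarding a token-by-token generator concrete, here is a minimal sketch of a generic REINFORCE-style policy-gradient update on a toy autoregressive sampler. It is an illustration under stated assumptions, not the paper's training algorithm: the tiny vocabulary, the table of logits standing in for a model, the `reward` function (fraction of tokens equal to 0), and all names and hyperparameters are hypothetical.

```python
import math
import random

VOCAB = 4      # hypothetical codebook size (real image tokenizers are far larger)
SEQ_LEN = 6    # hypothetical number of tokens per generated sample
START = VOCAB  # extra row index used as the start-of-sequence context


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(logits, rng):
    """Sample tokens one at a time, each conditioned on the previous token."""
    prev, tokens = START, []
    for _ in range(SEQ_LEN):
        probs = softmax(logits[prev])
        tok = rng.choices(range(VOCAB), weights=probs)[0]
        tokens.append(tok)
        prev = tok
    return tokens


def reward(tokens):
    """Toy stand-in for a reward signal: fraction of tokens equal to 0."""
    return sum(1 for t in tokens if t == 0) / len(tokens)


def reinforce_step(logits, rng, batch=32, lr=1.0):
    """One policy-gradient update, using the batch mean reward as a baseline."""
    samples = [sample_sequence(logits, rng) for _ in range(batch)]
    rewards = [reward(s) for s in samples]
    baseline = sum(rewards) / batch
    grads = [[0.0] * VOCAB for _ in range(VOCAB + 1)]
    for tokens, r in zip(samples, rewards):
        advantage = r - baseline
        prev = START
        for tok in tokens:
            probs = softmax(logits[prev])
            for v in range(VOCAB):
                # d log p(tok | prev) / d logit[v] = 1[v == tok] - p(v | prev)
                indicator = 1.0 if v == tok else 0.0
                grads[prev][v] += advantage * (indicator - probs[v])
            prev = tok
    # Apply the averaged gradient only after all samples have been scored.
    for row, grad_row in zip(logits, grads):
        for v in range(VOCAB):
            row[v] += lr * grad_row[v] / batch
    return baseline  # mean reward of this batch, before the update


def train(steps=300, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * VOCAB for _ in range(VOCAB + 1)]
    history = [reinforce_step(logits, rng) for _ in range(steps)]
    return logits, history


if __name__ == "__main__":
    _, history = train()
    print("mean reward, first 10 steps:", sum(history[:10]) / 10)
    print("mean reward, last 10 steps:", sum(history[-10:]) / 10)
```

Sequences that score above the batch average have their token choices made more likely, and those below average less likely, so the sampler drifts toward outputs the reward prefers. In a real system the table of logits would be a neural network and the reward would come from signals such as image quality or instruction-following scores.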
Why does it matter?
This matters because it helps create AI models that can produce clearer images and more accurate responses to instructions, making them more useful for creative projects, communication, and other applications involving AI-generated content.
Abstract
Reinforcement learning enhances discrete autoregressive modeling for image and language generation, achieving high-quality image generation and instruction-following capabilities using a unified framework.