OmniGen2: Exploration to Advanced Multimodal Generation

Chenyuan Wu, Pengfei Zheng, Ruiran Yan, Shitao Xiao, Xin Luo, Yueze Wang, Wanli Li, Xiyan Jiang, Yexin Liu, Junjie Zhou, Ze Liu, Ziyi Xia, Chaofan Li, Haoge Deng, Jiahao Wang, Kun Luo, Bo Zhang, Defu Lian, Xinlong Wang, Zhongyuan Wang, Tiejun Huang, Zheng Liu

2025-06-24

Summary

This paper introduces OmniGen2, a versatile AI model that can both understand and generate text and images by routing each modality through its own dedicated decoding pathway.

What's the problem?

The problem is that many multimodal models either entangle text and image processing in ways that degrade quality, or lose their original text-generation strength once image-generation capabilities are added.

What's the solution?

The researchers designed OmniGen2 with a decoupled, Y-shaped architecture: text and image generation share a common multimodal understanding of the input but run through separate decoding pathways with unshared parameters, which preserves the base model's strong text abilities while improving image quality. They also introduced a reflection mechanism that lets the model critique its own image outputs and iteratively refine them, as sketched below.
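To make the decoupled design concrete, here is a minimal, illustrative sketch (not the authors' code) of the general pattern: a shared encoder produces multimodal features, a text head decodes tokens autoregressively, and a separate image decoder with its own parameters denoises image latents diffusion-style. All module names and sizes here are hypothetical simplifications.

```python
# Hedged sketch of a dual-pathway (Y-shaped) generative model.
# Assumption: a shared condition encoder feeds two decoders whose
# parameters are NOT shared, so training the image pathway does not
# disturb the weights responsible for text generation.
import torch
import torch.nn as nn

class DualPathwayModel(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_heads=8, n_layers=4):
        super().__init__()
        # Shared multimodal encoder (stands in for the base language model).
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)

        # Text pathway: autoregressive head over the vocabulary.
        self.text_head = nn.Linear(d_model, vocab_size)

        # Image pathway: separate decoder that predicts noise in image
        # latents, conditioned on the encoder's hidden states.
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.image_decoder = nn.TransformerDecoder(dec_layer, n_layers)
        self.latent_proj = nn.Linear(d_model, d_model)

    def forward_text(self, token_ids):
        # Next-token logits; only the shared encoder and text head run.
        h = self.encoder(self.embed(token_ids))
        return self.text_head(h)

    def forward_image(self, noisy_latents, token_ids):
        # Noise prediction for image latents, conditioned on text features.
        cond = self.encoder(self.embed(token_ids))
        h = self.image_decoder(noisy_latents, cond)
        return self.latent_proj(h)

model = DualPathwayModel()
logits = model.forward_text(torch.randint(0, 32000, (1, 16)))
noise = model.forward_image(torch.randn(1, 64, 512), torch.randint(0, 32000, (1, 16)))
print(logits.shape, noise.shape)  # (1, 16, 32000) and (1, 64, 512)
```

The reflection mechanism described in the paper would sit on top of a loop like this: the model inspects a generated image, produces a textual critique via the text pathway, and conditions the next image-generation pass on that critique.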

Why it matters?

This matters because it enables more capable and consistent generation of both images and text, supporting applications in creative fields such as art, design, and content creation, where text and images must work together.

Abstract

OmniGen2 is a versatile generative model that introduces separate decoding pathways for text and images, preserves the base model's original text-generation capabilities, and achieves competitive results, including on a newly introduced benchmark (OmniContext) for subject-driven generation.