ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
Junying Chen, Zhenyang Cai, Pengcheng Chen, Shunian Chen, Ke Ji, Xidong Wang, Yunjin Yang, Benyou Wang
2025-06-26
Summary
This paper introduces ShareGPT-4o-Image, a large dataset, and Janus-4o, a multimodal model. Both are released to support open research on photorealistic, instruction-aligned image generation at the level of GPT-4o.
What's the problem?
Building AI models that generate realistic images which follow specific instructions is hard, and most high-quality image generation models are proprietary, so the research community cannot study or improve them.
What's the solution?
The researchers built ShareGPT-4o-Image, a large dataset, and Janus-4o, a multimodal model. Together they enable open research on generating images that align closely with user instructions, making it easier for other researchers to study and improve image generation.
Why does it matter?
Making photorealistic image generation openly available helps accelerate progress in AI: more people can experiment with it and build better models for applications like art, design, education, and accessibility.
Abstract
ShareGPT-4o-Image and Janus-4o enable open research in photorealistic, instruction-aligned image generation through a large dataset and a multimodal model.