Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability

Ning Li, Jingran Zhang, Justin Cui

2025-04-15

Summary

This paper examines how well GPT-4o, a multimodal AI model, generates and understands images, particularly when it must follow complicated instructions or draw on knowledge about the world.

What's the problem?

AI models like GPT-4o are getting better at both making images and understanding them, but they still have trouble when a task requires combining these skills. For example, they may struggle to use background knowledge, reason about the context of a picture, or follow detailed instructions when creating or editing images.

What's the solution?

The researchers tested GPT-4o on a variety of image generation and editing tasks that required both image creation and understanding, as well as the use of world knowledge and context. They found that the model falls short in integrating world knowledge, applying contextual reasoning, and adhering to complex instructions.

Why does it matter?

This work matters because it shows that even the latest AI models still have a long way to go before they can understand and generate images the way humans do. Knowing these limits helps researchers decide what to improve next, so that future AI can be more helpful and creative in real-world situations.

Abstract

Evaluation of GPT-4o shows limitations in its ability to integrate world knowledge, apply contextual reasoning, and adhere to complex instructions in image generation and editing.
