
PICABench: How Far Are We from Physically Realistic Image Editing?

Yuandong Pu, Le Zhuo, Songhao Han, Jinbo Xing, Kaiwen Zhu, Shuo Cao, Bin Fu, Si Liu, Hongsheng Li, Yu Qiao, Wenlong Zhang, Xi Chen, Yihao Liu

2025-10-21


Summary

This paper investigates how physically realistic today's image editing models are, looking beyond whether they carry out the requested edits to whether the resulting changes make sense in the real world.

What's the problem?

Current image editing models are good at following instructions, like removing an object from a picture, but they often fail to account for the physical consequences of that edit. For example, if you remove a chair, its shadow and any dent it leaves in the carpet should also disappear, but current models don't consistently handle this. Until now, there hasn't been a good way to measure how well models handle these physical effects.

What's the solution?

The researchers created a new benchmark called PICABench to specifically test how well image editing models handle physical realism. It covers sub-dimensions spanning optics (how light, shadows, and reflections behave), mechanics (how objects move and interact), and state transitions (how things change form). They also developed an evaluation protocol, PICAEval, that uses a vision-language model as a judge, answering region-level questions grounded in human annotations. To help models improve, they further built a training dataset of 100,000 examples, PICA-100K, derived from videos so models can learn physics from real-world motion.
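The judging protocol described above can be sketched as a simple scoring loop: each edited image comes with human-annotated regions and yes/no questions about expected physical effects, and the realism score is the fraction of questions the judge answers correctly. This is a minimal illustration, not the paper's actual implementation; `ask_vlm` is a hypothetical stand-in for a real vision-language model API call, and the region/question format is assumed.

```python
def ask_vlm(image, region, question):
    # Hypothetical judge: a real implementation would crop `region`
    # from `image` and query a vision-language model with `question`.
    # Canned answers stand in for model output here, for illustration.
    canned = {
        "Is the chair's shadow removed?": "yes",
        "Is the dent in the carpet removed?": "no",
    }
    return canned.get(question, "no")

def picaeval_score(image, annotations):
    """Score one edit as the fraction of region-level questions
    the judge answers with the expected (physically correct) answer."""
    correct = 0
    for ann in annotations:
        answer = ask_vlm(image, ann["region"], ann["question"])
        if answer == ann["expected"]:
            correct += 1
    return correct / len(annotations)

# Example: two physical-effect checks for a "remove the chair" edit.
annotations = [
    {"region": (10, 20, 80, 90),
     "question": "Is the chair's shadow removed?", "expected": "yes"},
    {"region": (15, 60, 70, 95),
     "question": "Is the dent in the carpet removed?", "expected": "yes"},
]
score = picaeval_score("edited.png", annotations)
# Only one of the two checks passes, so score == 0.5
```

Tying the score to region-specific questions rather than a single holistic rating is what makes this kind of protocol auditable: each failure points at a concrete physical effect the model missed.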

Why it matters?

This work is important because it highlights a major weakness in current image editing technology. While edits can *look* good, they often aren't *realistic*. By providing a way to measure and improve physical realism, this research pushes the field towards creating image editing tools that generate truly believable and consistent images.

Abstract

Image editing has achieved remarkable progress recently. Modern editing models can already follow complex instructions to manipulate the original content. However, beyond completing the editing instructions, the accompanying physical effects are key to the realism of the generation. For example, removing an object should also remove its shadow, reflections, and interactions with nearby objects. Unfortunately, existing models and benchmarks mainly focus on instruction completion but overlook these physical effects. So, at this moment, how far are we from physically realistic image editing? To answer this, we introduce PICABench, which systematically evaluates physical realism across eight sub-dimensions (spanning optics, mechanics, and state transitions) for most of the common editing operations (add, remove, attribute change, etc.). We further propose PICAEval, a reliable evaluation protocol that uses a VLM-as-a-judge with per-case, region-level human annotations and questions. Beyond benchmarking, we also explore effective solutions by learning physics from videos and construct a training dataset, PICA-100K. After evaluating most of the mainstream models, we observe that physical realism remains a challenging problem with much room left to explore. We hope that our benchmark and proposed solutions can serve as a foundation for future work moving from naive content editing toward physically consistent realism.