SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing

Yicheng Xiao, Wenhu Zhang, Lin Song, Yukang Chen, Wenbo Li, Nan Jiang, Tianhe Ren, Haokun Lin, Wei Huang, Haoyang Huang, Xiu Li, Nan Duan, Xiaojuan Qi

2026-04-07

Summary

This paper focuses on improving how well computers can edit images by precisely moving objects around and changing the camera's perspective, essentially allowing for detailed control over the scene's layout.

What's the problem?

Currently, image editing models aren't very good at making *small*, precise changes to where things are in an image or how the camera is positioned. There's a lack of good ways to actually *test* how well these models perform these kinds of spatial edits, and it's hard to get enough real-world examples to train them effectively.

What's the solution?

The researchers created a new benchmark called SpatialEdit-Bench to evaluate these spatial editing abilities, measuring both how realistic the edits look and how accurately objects and the camera viewpoint are repositioned. To overcome the lack of training data, they built a large synthetic dataset called SpatialEdit-500k using the 3D software Blender, which lets them render objects across diverse backgrounds and move the camera along systematic trajectories, so every training pair comes with exact ground-truth information about object positions and camera angles. Finally, they developed a new model, SpatialEdit-16B, trained on this dataset, which outperforms existing methods at these fine-grained spatial manipulations while remaining competitive at general image editing.
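To make the "systematic camera trajectories with ground-truth transformations" idea concrete, here is a minimal sketch of how such poses could be generated. This is not the paper's actual pipeline (which uses Blender); the function names, the circular-orbit trajectory, and all parameters are illustrative assumptions. The sketch computes, for each viewpoint on an orbit around an object, the exact world-to-camera rotation, which is the kind of ground-truth label a synthetic pipeline can record for free.

```python
import math

def look_at(eye, target=(0.0, 0.0, 0.0)):
    """Build a 3x3 world-to-camera rotation whose -z axis points from eye to target.

    Uses the standard look-at construction with world up = +z.
    """
    # Forward: unit vector from the camera toward the target.
    f = [target[i] - eye[i] for i in range(3)]
    n = math.sqrt(sum(c * c for c in f))
    f = [c / n for c in f]
    up = (0.0, 0.0, 1.0)
    # Right = forward x up, normalized.
    r = [f[1] * up[2] - f[2] * up[1],
         f[2] * up[0] - f[0] * up[2],
         f[0] * up[1] - f[1] * up[0]]
    rn = math.sqrt(sum(c * c for c in r))
    r = [c / rn for c in r]
    # True up = right x forward (re-orthogonalized).
    u = [r[1] * f[2] - r[2] * f[1],
         r[2] * f[0] - r[0] * f[2],
         r[0] * f[1] - r[1] * f[0]]
    # Rows: right, up, -forward (camera looks down its own -z axis).
    return [r, u, [-f[0], -f[1], -f[2]]]

def orbit_poses(radius=4.0, height=1.5, num_views=8):
    """Ground-truth camera extrinsics for a circular orbit around the origin."""
    poses = []
    for k in range(num_views):
        theta = 2.0 * math.pi * k / num_views
        eye = (radius * math.cos(theta), radius * math.sin(theta), height)
        poses.append({"eye": eye, "R": look_at(eye)})
    return poses
```

In a real Blender pipeline, each pose would be assigned to the scene camera before rendering, and the (eye, R) pair stored alongside the image; pairing two renders from the same trajectory then yields an editing example whose camera transformation is known exactly rather than estimated.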

Why it matters?

This work is important because it provides a standardized way to measure and improve spatial editing in images. Better spatial editing has applications in many areas, like creating realistic visual effects, designing virtual environments, and even improving how robots understand and interact with the world around them. By releasing their data and model, they're helping other researchers build on this work and advance the field.

Abstract

Image spatial editing performs geometry-driven transformations, allowing precise control over object layout and camera viewpoints. Current models fall short on fine-grained spatial manipulations, motivating a dedicated assessment suite. Our contributions are as follows: (i) We introduce SpatialEdit-Bench, a comprehensive benchmark that evaluates spatial editing by jointly measuring perceptual plausibility and geometric fidelity via viewpoint reconstruction and framing analysis. (ii) To address the data bottleneck for scalable training, we construct SpatialEdit-500k, a synthetic dataset generated with a controllable Blender pipeline that renders objects across diverse backgrounds and systematic camera trajectories, providing precise ground-truth transformations for both object- and camera-centric operations. (iii) Building on this data, we develop SpatialEdit-16B, a baseline model for fine-grained spatial editing. Our method achieves competitive performance on general editing while substantially outperforming prior methods on spatial manipulation tasks. All resources will be made public at https://github.com/EasonXiao-888/SpatialEdit.