
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing

Jiacheng Chen, Ramin Mehran, Xuhui Jia, Saining Xie, Sanghyun Woo

2025-06-30

Summary

This paper introduces BlenderFusion, a system for creating and editing 3D scenes: it breaks an image into individual objects, allows detailed 3D manipulation of those objects and of the camera, and then blends everything back into a realistic final image.

What's the problem?

Current AI systems that generate or edit images from text prompts struggle to control the exact 3D position, shape, and appearance of objects in a scene, which makes precise, complex edits and the compositing of different visual elements difficult.

What's the solution?

The researchers developed BlenderFusion, which works in three steps: it segments an image into editable 3D objects, lets users precisely manipulate those objects and the camera with Blender's professional 3D tools, and then uses a diffusion model to blend the edited objects and the background seamlessly into a photorealistic final image.
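To make that flow concrete, here is a minimal sketch of the three-step pipeline in Python. Every name in it (SceneObject, layer, edit, composite) is a hypothetical placeholder standing in for the segmentation model, the Blender session, and the diffusion compositor; it is not the authors' actual API.

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    """An image segment lifted into an editable 3D entity (hypothetical)."""
    name: str
    translation: tuple = (0.0, 0.0, 0.0)    # position in scene units
    rotation_euler: tuple = (0.0, 0.0, 0.0)  # rotation in degrees

def layer(image_path: str) -> list:
    """Step 1: segment the input image and lift each object to 3D.
    A real system would run segmentation and depth estimation here."""
    return [SceneObject("car"), SceneObject("streetlight")]

def edit(objects: list) -> list:
    """Step 2: apply precise 3D edits; in the paper this is done
    interactively inside Blender."""
    objects[0].translation = (1.5, 0.0, 0.0)  # e.g. slide the car along x
    return objects

def composite(objects: list, background: str) -> str:
    """Step 3: a diffusion model fuses the coarse render of the edited
    objects with the background into one photorealistic image."""
    return f"final_image({background}, edited={[o.name for o in objects]})"

if __name__ == "__main__":
    objs = edit(layer("street.jpg"))
    print(composite(objs, "street_background"))
```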

Why it matters?

This matters because it combines the power of 3D graphics tools with modern AI image generation, enabling more accurate, flexible, and higher-quality scene editing for movies, games, virtual reality, and design.

Abstract

BlenderFusion is a generative visual compositing framework that uses a diffusion model for scene editing and composition, trained with source masking and simulated object jittering.
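The abstract's "simulated object jittering" suggests a training-time augmentation in which object transforms are randomly perturbed, so the compositor learns to follow the edited geometry rather than copy source pixels. The function below only illustrates that idea under this assumption; it is not the authors' code.

```python
import random

def jitter_transform(translation, rotation_euler, t_sigma=0.1, r_sigma=5.0):
    """Randomly perturb an object's translation (scene units) and rotation
    (degrees); a rough stand-in for simulated object jittering."""
    jittered_t = tuple(x + random.gauss(0.0, t_sigma) for x in translation)
    jittered_r = tuple(a + random.gauss(0.0, r_sigma) for a in rotation_euler)
    return jittered_t, jittered_r

# Example: jitter an object that starts at the origin with no rotation.
print(jitter_transform((0.0, 0.0, 0.0), (0.0, 0.0, 0.0)))
```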