ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models

Haitang Feng, Jie Liu, Jie Tang, Gangshan Wu, Beiqi Chen, Jianhuang Lai, Guangcong Wang

2025-08-27

Summary

This paper introduces a new method, ObjFiller-3D, for completing missing parts of 3D objects and editing them to look realistic.

What's the problem?

Existing methods for filling in missing parts of 3D objects often rely on first fixing up 2D images from different viewpoints, then combining them into a 3D model. However, this process can create inconsistencies between those views, leading to blurry textures, noticeable breaks in the surface, and generally unrealistic results. It's hard to get a smooth, accurate 3D object when the individual 2D 'patches' don't quite line up.

What's the solution?

Instead of using standard 2D image inpainting, the researchers used advanced video editing technology to fill in the gaps in the 3D objects. They recognized that videos are similar to 3D scenes because they show changes over time, and adapted a video inpainting model to work with 3D data. They also added a way to use reference images to guide the reconstruction and improve the quality of the completed object.
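The core idea — treating multi-view renders of a 3D object like frames of a video so that masked regions are filled jointly and consistently across views — can be sketched as follows. This is a minimal conceptual illustration, not the paper's method: the function name is hypothetical, and a simple per-pixel average over unmasked views stands in for the learned video inpainting model.

```python
import numpy as np

def inpaint_views_as_video(frames, masks):
    """Treat multi-view renders (V, H, W) of a 3D object as a video clip
    and fill masked pixels jointly across all views.

    A per-pixel mean over the views where the pixel is visible stands in
    for the learned video diffusion model (an illustrative assumption).
    Because every view receives the same fill value, the completed views
    stay mutually consistent instead of being inpainted independently.
    """
    frames = np.asarray(frames, dtype=float)  # (V, H, W) grayscale renders
    masks = np.asarray(masks, dtype=bool)     # True where content is missing
    visible = np.where(masks, np.nan, frames) # hide masked pixels
    fill = np.nanmean(visible, axis=0)        # one shared value per pixel
    out = frames.copy()
    out[masks] = np.broadcast_to(fill, frames.shape)[masks]
    return out
```

Inpainting each view independently would let the filled regions disagree from view to view; sharing one fill across the view axis is the simplest way to see why a video-style (cross-frame) model avoids that inconsistency.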

Why it matters?

ObjFiller-3D creates much better 3D reconstructions than previous methods, with clearer details and more accurate shapes, as shown by their testing results. This is important because it opens the door to more practical applications where high-quality 3D editing is needed, like in design, virtual reality, or creating 3D models from incomplete scans.

Abstract

3D inpainting often relies on multi-view 2D image inpainting, where the inherent inconsistencies across different inpainted views can result in blurred textures, spatial discontinuities, and distracting visual artifacts. These inconsistencies pose significant challenges when striving for accurate and realistic 3D object completion, particularly in applications that demand high fidelity and structural coherence. To overcome these limitations, we propose ObjFiller-3D, a novel method designed for the completion and editing of high-quality and consistent 3D objects. Instead of employing a conventional 2D image inpainting model, our approach leverages a curated state-of-the-art video editing model to fill in the masked regions of 3D objects. We analyze the representation gap between 3D and videos, and propose an adaptation of a video inpainting model for 3D scene inpainting. In addition, we introduce a reference-based 3D inpainting method to further enhance the quality of reconstruction. Experiments across diverse datasets show that compared to previous methods, ObjFiller-3D produces more faithful and fine-grained reconstructions (PSNR of 26.6 vs. NeRFiller (15.9) and LPIPS of 0.19 vs. Instant3dit (0.25)). Moreover, it demonstrates strong potential for practical deployment in real-world 3D editing applications. Project page: https://objfiller3d.github.io/ Code: https://github.com/objfiller3d/ObjFiller-3D