Track, Inpaint, Resplat: Subject-driven 3D and 4D Generation with Progressive Texture Infilling
Shuhong Zheng, Ashkan Mirzaei, Igor Gilitschenski
2025-10-28
Summary
This paper introduces a new technique, called TIRE, for creating 3D models or 3D videos (4D) of a specific person or object, starting from a general 3D model produced by an existing generator.
What's the problem?
Current methods for generating 3D models and videos are good at producing realistic results efficiently, but they often struggle to accurately represent a *specific* person or object from all angles. If you try to personalize a 3D generation – meaning, make it look like *you* – existing methods often lose your unique features when viewed from different perspectives. It's hard to get a 3D model that truly looks like the subject from every viewpoint.
What's the solution?
TIRE works in three main steps. First, it uses video tracking to pinpoint the areas of the initial 3D model that need to change to match the subject. Then, it uses a special type of image editing, called inpainting, to fill in those areas with details that reflect the subject’s appearance. Finally, it takes these updated 2D views and combines them back into a consistent 3D model, ensuring everything still looks correct in three dimensions. Essentially, it starts with a base model and then carefully modifies it to look like the target subject.
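The three steps above can be sketched in code. This is a minimal toy illustration, not the paper's implementation: the function names (`track_regions`, `inpaint`, `resplat`) are hypothetical, views are simplified to flat lists of pixel values, and the subject's identity is encoded as a single reference value. Real versions would use a video tracker, a subject-driven 2D inpainting model, and a Gaussian-splatting reconstruction step.

```python
from collections import Counter

def track_regions(views, subject_ref):
    """Step 1 (toy): mark pixels in each view that don't match the
    subject reference; a real system would use video tracking."""
    return [[pixel != subject_ref for pixel in view] for view in views]

def inpaint(views, masks, subject_ref):
    """Step 2 (toy): infill masked regions with subject-specific content;
    a real system would call a subject-driven 2D inpainting model."""
    return [
        [subject_ref if masked else pixel for pixel, masked in zip(view, mask)]
        for view, mask in zip(views, masks)
    ]

def resplat(edited_views):
    """Step 3 (toy): fuse the edited 2D views back into one consistent
    result via a per-pixel majority vote; the paper instead resplats the
    views into a 3D representation."""
    return [Counter(pixels).most_common(1)[0][0] for pixels in zip(*edited_views)]

def tire(views, subject_ref):
    """Track regions to change, inpaint them, then fuse views back together."""
    masks = track_regions(views, subject_ref)
    edited = inpaint(views, masks, subject_ref)
    return resplat(edited)

# Toy example: three 4-"pixel" views; the subject identity is value 1.
views = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0]]
print(tire(views, 1))  # every pixel infilled to match the subject: [1, 1, 1, 1]
```

The point of the sketch is the control flow: the base model's renders are only modified where tracking says they deviate from the subject, and the final fusion step restores multi-view consistency.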
Why it matters?
This research is important because it significantly improves how well 3D and 4D generations can represent a specific individual or object. This has implications for things like creating realistic avatars, personalized virtual experiences, and more accurate 3D reconstructions from images or videos. It moves us closer to being able to easily generate 3D content that truly reflects the identity of a particular subject.
Abstract
Current 3D/4D generation methods are usually optimized for photorealism, efficiency, and aesthetics. However, they often fail to preserve the semantic identity of the subject across different viewpoints. Adapting generation methods with one or a few images of a specific subject (also known as Personalization or Subject-driven generation) allows generating visual content that aligns with the identity of the subject. However, personalized 3D/4D generation is still largely underexplored. In this work, we introduce TIRE (Track, Inpaint, REsplat), a novel method for subject-driven 3D/4D generation. It takes an initial 3D asset produced by an existing 3D generative model as input and uses video tracking to identify the regions that need to be modified. Then, we adopt a subject-driven 2D inpainting model for progressively infilling the identified regions. Finally, we resplat the modified 2D multi-view observations back to 3D while still maintaining consistency. Extensive experiments demonstrate that our approach significantly improves identity preservation in 3D/4D generation compared to state-of-the-art methods. Our project website is available at https://zsh2000.github.io/track-inpaint-resplat.github.io/.