GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer
Sayan Deb Sarkar, Sinisa Stekovic, Vincent Lepetit, Iro Armeni
2025-10-21
Summary
This paper focuses on the challenge of transferring how something *looks* – its appearance – onto 3D models. Think about wanting to make a video game character look like a specific photograph or match a description you type in. This is useful for creating content in gaming, augmented reality, and other digital art fields.
What's the problem?
Current methods for transferring appearance onto 3D objects struggle when the shape of the object you're applying the appearance *to* is very different from the original object the appearance came from. Simply using existing 3D model generators doesn't work well either, leading to results that don't look good. The core issue is adapting an appearance from one shape to a drastically different shape.
What's the solution?
The researchers developed a new technique that subtly guides a 3D model generation process. Instead of training a new model, they start from a pretrained generator and gently nudge its sampling process in the right direction. This 'nudging' is done with differentiable loss functions that focus on matching specific parts of the appearance (part-aware appearance losses) and on keeping the new model consistent with itself (self-similarity). Importantly, this method doesn't require any additional training – it works right away.
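To make the idea concrete, here is a minimal, hypothetical sketch of guidance-in-sampling: an Euler solver for a flow ODE that is periodically nudged by the gradient of a differentiable loss. The function names, the toy velocity field, and all parameters (`steps`, `guide_every`, `guide_weight`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def guided_flow_sample(x0, velocity, loss_grad, steps=50, guide_every=5, guide_weight=0.1):
    """Sketch of optimization-guided sampling (assumed, simplified):
    - integrate the flow ODE with plain Euler steps;
    - every `guide_every` steps, take a gradient step on a guidance loss."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * velocity(x, t)            # ordinary flow step
        if i % guide_every == 0:
            x = x - guide_weight * loss_grad(x)  # guidance nudge (training-free)
    return x

# Toy example: a field flowing toward a target state, guided by the
# analytic gradient of ||x - target||^2 (purely illustrative).
target = np.array([1.0, -2.0])
velocity = lambda x, t: target - x
loss_grad = lambda x: 2.0 * (x - target)
x_final = guided_flow_sample(np.zeros(2), velocity, loss_grad)
```

In the paper's setting the state would be a 3D asset's latent representation and the loss would be a part-aware appearance or self-similarity term, but the control flow – sample, periodically add a loss-gradient correction, continue – is the same shape as above.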
Why it matters?
This work is important because it provides a more reliable way to transfer appearances onto 3D models, even when the shapes are very different. They also point out that standard ways of measuring how good the results are aren't very effective for this task, and instead used an AI system (GPT) to evaluate the quality in a way that aligns better with human perception. This means better-looking 3D content can be created more easily, and the method is flexible enough to work with different types of 3D model generators and appearance guides.
Abstract
Transferring appearance to 3D assets using different representations of the appearance object - such as images or text - has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results. Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two types of guidance: part-aware losses for appearance and for self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively. We also show that traditional metrics are not suitable for evaluating the task due to their inability to focus on local details and to compare dissimilar inputs in the absence of ground-truth data. We thus evaluate appearance transfer quality with a GPT-based system that objectively ranks outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond the showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions.