Few-step Flow for 3D Generation via Marginal-Data Transport Distillation
Zanwei Zhou, Taoran Yi, Jiemin Fang, Chen Yang, Lingxi Xie, Xinggang Wang, Wei Shen, Qi Tian
2025-09-05
Summary
This paper focuses on making 3D object generation with flow-based models much faster. These models usually need dozens of sampling steps to produce a final 3D object, which is slow. The researchers developed a new distillation technique that cuts the number of steps dramatically, speeding up generation without losing quality.
What's the problem?
Currently, creating 3D objects with flow-based models is slow because it requires many sequential steps to refine the output. While techniques exist to speed up 2D image generation, they haven't been effectively adapted to the more complex task of 3D generation. The core issue is how to efficiently teach a simpler 'student' model to mimic a complex, already-trained 'teacher' model in 3D; specifically, how to transfer the underlying 'flow' of data that produces the 3D shape.
What's the solution?
The researchers introduced a method called MDT-dist, which stands for Marginal-Data Transport distillation. Instead of directly copying the teacher's data transport, which would require an intractable integral over velocity fields, they optimize two tractable objectives. First, 'Velocity Matching' trains the student to predict the same movement (velocity) as the teacher at each point. Second, 'Velocity Distillation' uses the learned velocity fields to align the student's output distribution with the teacher's. By optimizing these two objectives, they transfer the teacher's knowledge to the student, allowing the generation process to use far fewer steps.
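The Velocity Matching idea can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the "networks" are simple linear maps and the time-conditioning is faked, purely so the loss computation is self-contained. All names here (`velocity`, `vm_loss`, the weight matrices) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the teacher and student velocity networks.
# In the real method these are flow transformers; here they are
# plain linear maps so the sketch stays self-contained.
W_teacher = rng.standard_normal((3, 3))
W_student = np.zeros((3, 3))

def velocity(W, x, t):
    # Hypothetical velocity field v(x, t); the time-conditioning
    # is faked by a simple scaling, purely for illustration.
    return (x @ W.T) * (1.0 - t)

def vm_loss(x, t):
    # Velocity Matching: penalize the gap between the student's
    # and the teacher's velocity at the same point x and time t.
    diff = velocity(W_student, x, t) - velocity(W_teacher, x, t)
    return float(np.mean(diff ** 2))

x = rng.standard_normal((8, 3))  # batch of noisy latents
t = 0.5
print(vm_loss(x, t))  # nonzero before training

# In a real framework the student is trained by gradient descent;
# for these linear maps the optimum is simply W_student = W_teacher.
W_student = W_teacher.copy()
print(vm_loss(x, t))  # 0.0 once the velocities match
```

The second objective, Velocity Distillation, would reuse the matched velocity fields to score the student's own samples against the teacher's distribution; that step needs a full training loop and is omitted here.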
Why it matters?
This work matters because it dramatically speeds up 3D object generation. The researchers reduced the number of sampling steps from 25 to just 1 or 2, making generation 6.5 to 9 times faster while maintaining the quality of the generated 3D objects. This advancement could have a big impact on fields like game development, virtual reality, and 3D design, where quick creation of 3D content is crucial.
Abstract
Flow-based 3D generation models typically require dozens of sampling steps during inference. Though few-step distillation methods, particularly Consistency Models (CMs), have achieved substantial advancements in accelerating 2D diffusion models, they remain under-explored for more complex 3D generation tasks. In this study, we propose a novel framework, MDT-dist, for few-step 3D flow distillation. Our approach is built upon a primary objective: distilling the pretrained model to learn the Marginal-Data Transport. Directly learning this objective requires integrating the velocity fields, but this integral is intractable to implement. Therefore, we propose two optimizable objectives, Velocity Matching (VM) and Velocity Distillation (VD), to equivalently convert the optimization target from the transport level to the velocity level and the distribution level, respectively. Velocity Matching (VM) learns to stably match the velocity fields between the student and the teacher, but inevitably provides biased gradient estimates. Velocity Distillation (VD) further enhances the optimization process by leveraging the learned velocity fields to perform probability density distillation. When evaluated on the pioneering 3D generation framework TRELLIS, our method reduces the sampling steps of each flow transformer from 25 to 1 or 2, achieving 0.68s (1 step x 2) and 0.94s (2 steps x 2) latency with 9.0x and 6.5x speedup on an A800, while preserving high visual and geometric fidelity. Extensive experiments demonstrate that our method significantly outperforms existing CM distillation methods, and enables TRELLIS to achieve superior performance in few-step 3D generation.
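Why does reducing sampling steps work at all? Sampling a flow model means numerically integrating a velocity field from noise to data, e.g. with Euler steps. The toy sketch below (illustrative only, not the paper's flow transformer) uses a straight-line velocity field, for which a single Euler step lands exactly where many steps would; this is the ideal a distilled few-step student tries to approach.

```python
import numpy as np

rng = np.random.default_rng(1)

x0 = rng.standard_normal(3)       # noise sample (start of the flow)
x1 = np.array([1.0, 2.0, 3.0])    # illustrative "data" endpoint

def v(x, t):
    # Straight-line flow: constant velocity x1 - x0 along the path.
    # A distilled student aims to make one or two Euler steps of its
    # own field land where 25 teacher steps would.
    return x1 - x0

def sample(n_steps):
    # Euler integration of dx/dt = v(x, t) from t=0 to t=1.
    x, dt = x0.copy(), 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * v(x, i * dt)
    return x

print(sample(25))  # many-step trajectory reaches x1
print(sample(1))   # one step also reaches x1, since v is constant
```

With a curved (realistic) velocity field the one-step result would drift from the many-step one; distillation trains the student's field so that its few-step integral still reproduces the teacher's marginal-data transport.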