
Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals

Xiangyu Fan, Zesong Qiu, Zhuguanyu Wu, Fanzhou Wang, Zhiqian Lin, Tianxiang Ren, Dahua Lin, Ruihao Gong, Lei Yang

2025-11-03


Summary

This paper introduces a new method called Phased Distribution Matching Distillation (Phased DMD) to improve how we create fast, simplified versions of complex AI models that generate things like images and videos. These simplified models, called 'distilled' models, are much more efficient but often lose quality compared to the original.

What's the problem?

When you compress a powerful generative model into a fast, one-step version, it often performs poorly on complicated tasks, such as creating realistic and detailed motion in videos. Building a multi-step distilled model instead – one that generates in stages – requires much more memory and compute and makes training unstable. Previous attempts to fix this, such as truncating gradients during training, actually reduced the variety of outputs the model could produce, dragging a multi-step model's diversity down to the level of a one-step one.

What's the solution?

Phased DMD tackles this by breaking the distillation process into phases. It's like teaching the simplified model gradually, starting with the easier parts of generation and moving on to harder ones. Concretely, the method divides the range of 'signal-to-noise ratios' – a measure of how much noise remains at each step of the generation process – into smaller subintervals, and refines the model within one subinterval at a time, progressing toward higher signal-to-noise levels. The researchers also derived the training objective for each phase mathematically to make sure it stays accurate, and combined this with a technique called 'Mixture-of-Experts' to increase the model's capacity without making any single step more expensive.
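The partitioning idea can be illustrated with a small sketch. Everything here is a simplifying assumption for illustration: the function names are invented, the split is uniform over a normalized diffusion time in [0, 1], and the paper's actual subinterval boundaries and per-phase training objective are more involved.

```python
# Hypothetical sketch of subinterval partitioning, as in phase-wise
# distillation. Uniform splitting is an assumption for illustration;
# the paper defines subintervals over the SNR range and trains a
# separate phase (expert) on each one.

def split_range(num_phases, t_min=0.0, t_max=1.0):
    """Divide the normalized diffusion-time range into equal subintervals.

    Returns a list of (start, end) pairs, ordered from high noise
    (low SNR) toward low noise (high SNR), matching the progressive
    refinement order described above.
    """
    step = (t_max - t_min) / num_phases
    return [(t_min + i * step, t_min + (i + 1) * step)
            for i in range(num_phases)]

def phase_of(t, phases):
    """Return the index of the subinterval (phase) containing timestep t."""
    for idx, (lo, hi) in enumerate(phases):
        if lo <= t < hi:
            return idx
    return len(phases) - 1  # t == t_max falls in the last phase

phases = split_range(4)
print(phases)                  # four equal subintervals of [0, 1)
print(phase_of(0.6, phases))   # -> 2
```

During distillation, each phase would then only ever see noise levels from its own subinterval, which is what keeps the per-phase matching problem easier than matching the whole range at once.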

Why it matters?

This research is important because it lets us create faster, more efficient AI models for generating images and videos without sacrificing the quality or variety of the results. The authors successfully distilled very large models, including Qwen-Image (20B parameters) and Wan2.2 (28B parameters), demonstrating that the method scales well and preserves the creative range of these powerful systems. The code and models will be released, allowing others to build on this work.

Abstract

Distribution Matching Distillation (DMD) distills score-based generative models into efficient one-step generators, without requiring a one-to-one correspondence with the sampling trajectories of their teachers. However, limited model capacity causes one-step distilled models to underperform on complex generative tasks, e.g., synthesizing intricate object motions in text-to-video generation. Directly extending DMD to multi-step distillation increases memory usage and computational depth, leading to instability and reduced efficiency. While prior works propose stochastic gradient truncation as a potential solution, we observe that it substantially reduces the generation diversity of multi-step distilled models, bringing it down to the level of their one-step counterparts. To address these limitations, we propose Phased DMD, a multi-step distillation framework that bridges the idea of phase-wise distillation with Mixture-of-Experts (MoE), reducing learning difficulty while enhancing model capacity. Phased DMD is built upon two key ideas: progressive distribution matching and score matching within subintervals. First, our model divides the SNR range into subintervals, progressively refining the model to higher SNR levels, to better capture complex distributions. Next, to ensure the training objective within each subinterval is accurate, we conduct rigorous mathematical derivations. We validate Phased DMD by distilling state-of-the-art image and video generation models, including Qwen-Image (20B parameters) and Wan2.2 (28B parameters). Experimental results demonstrate that Phased DMD preserves output diversity better than DMD while retaining key generative capabilities. We will release our code and models.