
Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield

Dongyang Liu, Peng Gao, David Liu, Ruoyi Du, Zhen Li, Qilong Wu, Xin Jin, Sihan Cao, Shifeng Zhang, Hongsheng Li, Steven Hoi

2025-12-01


Summary

This paper investigates how diffusion models can be sped up through a technique called distillation, where a fast 'student' model that generates in only a few denoising steps learns from a slower, many-step 'teacher' model. It challenges the common belief that the student succeeds simply by mimicking the teacher's output distribution.

What's the problem?

Researchers thought that the success of distillation in diffusion models came from the student model matching the distribution of outputs from the teacher model. However, this paper questions that idea, especially when generating complex things like images from text. The problem is understanding *why* distillation works so well, and what part of the process is actually doing the heavy lifting.

What's the solution?

The authors broke the distillation objective down into its individual parts and discovered that a component related to Classifier-Free Guidance (CFG), a technique normally used to improve image quality and prompt adherence, is actually the main driver of the student model's performance. Matching the teacher's output distribution turns out to act more like a stabilizing force that keeps the student from going off track, rather than the primary learning mechanism. They also showed that other methods, such as simpler non-parametric constraints or GAN-based objectives, can provide this stabilization just as effectively. Finally, they used this understanding to improve the distillation process by treating the two parts separately, for example by giving each its own noise schedule, leading to better results.
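The decomposition can be illustrated with the standard CFG identity. Below is a minimal numerical sketch (not the paper's code; the variable names and toy values are illustrative) showing that a DMD-style gradient direction built from a CFG-combined teacher score splits algebraically into a pure distribution-matching term plus a CFG-dependent term:

```python
import numpy as np

# Toy score vectors at one noisy sample, standing in for network outputs.
# All names and values here are illustrative assumptions, not from the paper.
s_cond = np.array([0.8, -0.2, 0.5])    # teacher's conditional score
s_uncond = np.array([0.3, 0.1, -0.1])  # teacher's unconditional score
s_fake = np.array([0.6, -0.3, 0.4])    # student ("fake") score
w = 5.0                                # CFG guidance scale

# Standard classifier-free guidance combination of the teacher's scores.
s_cfg = s_uncond + w * (s_cond - s_uncond)

# A DMD-style update direction: CFG teacher score minus student score.
dmd_grad = s_cfg - s_fake

# The same quantity splits into two terms:
dm_term = s_cond - s_fake                  # distribution matching (the "shield")
ca_term = (w - 1.0) * (s_cond - s_uncond)  # CFG augmentation (the "engine")

assert np.allclose(dmd_grad, dm_term + ca_term)
print("decomposition holds:", np.allclose(dmd_grad, dm_term + ca_term))
```

With w = 1 (no guidance) the CFG-augmentation term vanishes and only distribution matching remains, which is consistent with the paper's observation that the CFG-dependent term does the heavy lifting in text-to-image settings where large guidance scales are needed.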

Why it matters?

This research changes how we think about distillation for diffusion models. By identifying CFG augmentation as the key to success, it opens the door to designing even more efficient and effective distillation methods. This is important because faster image generation models are crucial for many applications, and their work was even used to build a state-of-the-art image generator, proving its practical value.

Abstract

Diffusion model distillation has emerged as a powerful technique for creating efficient few-step and single-step generators. Among these, Distribution Matching Distillation (DMD) and its variants stand out for their impressive performance, which is widely attributed to their core mechanism of matching the student's output distribution to that of a pre-trained teacher model. In this work, we challenge this conventional understanding. Through a rigorous decomposition of the DMD training objective, we reveal that in complex tasks like text-to-image generation, where classifier-free guidance (CFG) is typically required for desirable few-step performance, the primary driver of few-step distillation is not distribution matching, but a previously overlooked component we identify as CFG Augmentation (CA). We demonstrate that this term acts as the core "engine" of distillation, while the Distribution Matching (DM) term functions as a "regularizer" that ensures training stability and mitigates artifacts. We further validate this decoupling by demonstrating that while the DM term is a highly effective regularizer, it is not unique; simpler non-parametric constraints or GAN-based objectives can serve the same stabilizing function, albeit with different trade-offs. This division of labor motivates a more principled analysis of the properties of both terms, leading to a more systematic and in-depth understanding. This new understanding further enables us to propose principled modifications to the distillation process, such as decoupling the noise schedules for the engine and the regularizer, leading to further performance gains. Notably, our method has been adopted by the Z-Image ( https://github.com/Tongyi-MAI/Z-Image ) project to develop a top-tier 8-step image generation model, empirically validating the generalization and robustness of our findings.