
Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning

Guanjie Chen, Shirui Huang, Kai Liu, Jianchen Zhu, Xiaoye Qu, Peng Chen, Yu Cheng, Yifu Sun

2025-12-02


Summary

This paper introduces a new method called Flash-DMD to speed up and improve the quality of images created by diffusion models, which are a popular type of AI for generating images.

What's the problem?

Diffusion models are great at making realistic images, but they are slow because they build each image step by step over many iterations. A technique called 'timestep distillation' tries to make this faster, but it often requires a lot of training and can make the images look worse. On top of that, trying to fine-tune these faster models with reinforcement learning so their outputs match what people like is unstable: the model can end up gaming the reward score instead of actually making better images, a failure mode known as reward hacking.

What's the solution?

The researchers developed Flash-DMD, which tackles these problems in two main ways. First, they created a smarter, timestep-aware way to distill the image creation process, which trains much faster and produces more realistic images than previous distillation methods at a small fraction of their training cost. Second, they run the reinforcement learning fine-tuning and the distillation training at the same time. The ongoing distillation loss acts like a guide rail, keeping the reinforcement learning stable and preventing it from going off track; a rough sketch of this joint objective is shown below.
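To make the joint training idea concrete, here is a minimal sketch, not the authors' released code, of how a distillation loss and a reward-based objective could be combined in a single update, with the distillation term acting as the stabilizing regularizer. All names here (TinyGenerator, reward, lambda_rl) are illustrative assumptions, and both losses are simple stand-ins for the paper's actual DMD-style and RL objectives.

```python
# Hypothetical sketch of joint distillation + reward fine-tuning.
# The specific losses and models are placeholders, not Flash-DMD's implementation.

import torch
import torch.nn as nn


class TinyGenerator(nn.Module):
    """Stand-in for a few-step distilled image generator."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, noise):
        return self.net(noise)


def distillation_loss(student_out, teacher_out):
    # Placeholder for a distribution-matching (DMD-style) distillation term.
    return ((student_out - teacher_out) ** 2).mean()


def reward(images):
    # Placeholder for a learned preference / aesthetic reward model.
    return -(images ** 2).mean(dim=1)  # dummy per-sample score


def joint_step(student, teacher, optimizer, lambda_rl=0.1, batch=8, dim=64):
    noise = torch.randn(batch, dim)
    student_out = student(noise)
    with torch.no_grad():
        teacher_out = teacher(noise)

    l_distill = distillation_loss(student_out, teacher_out)
    # Simple reward-maximization surrogate; a real RL fine-tuning step would use
    # a policy-gradient or reward-weighted objective instead.
    l_rl = -reward(student_out).mean()

    # Keeping the distillation loss in the objective is what (per the paper's idea)
    # regularizes the reward-driven update and discourages reward hacking.
    loss = l_distill + lambda_rl * l_rl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    student, teacher = TinyGenerator(), TinyGenerator()
    opt = torch.optim.Adam(student.parameters(), lr=1e-4)
    for _ in range(3):
        print(joint_step(student, teacher, opt))
```

The key design choice this sketch illustrates is that the reward term never gets to dominate: every update also has to keep matching the teacher through the distillation loss, which is what keeps the fine-tuning from collapsing onto whatever the reward model happens to score highly.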

Why it matters?

Flash-DMD is important because it makes it possible to create high-quality images much faster than before, in a way that is more reliable and easier to steer. This means better AI-generated images with less computing power and effort, and it opens the door to images that are more closely tailored to what people actually want.

Abstract

Diffusion models have emerged as a leading class of generative models, yet their iterative sampling process remains computationally expensive. Timestep distillation is a promising technique for accelerating generation, but it often requires extensive training and degrades image quality. Furthermore, fine-tuning these distilled models for specific objectives, such as aesthetic appeal or user preference, with Reinforcement Learning (RL) is notoriously unstable and prone to reward hacking. In this work, we introduce Flash-DMD, a novel framework that enables fast convergence through distillation together with joint RL-based refinement. Specifically, we first propose an efficient timestep-aware distillation strategy that significantly reduces training cost while enhancing realism, outperforming DMD2 with only 2.1% of its training cost. Second, we introduce a joint training scheme in which the model is fine-tuned with an RL objective while timestep distillation training continues simultaneously. We demonstrate that the stable, well-defined loss from the ongoing distillation acts as a powerful regularizer, effectively stabilizing RL training and preventing policy collapse. Extensive experiments on score-based and flow matching models show that our proposed Flash-DMD not only converges significantly faster but also achieves state-of-the-art generation quality in the few-step sampling regime, outperforming existing methods on visual quality, human preference, and text-image alignment metrics. Our work presents an effective paradigm for training efficient, high-fidelity, and stable generative models. Code is coming soon.