Scale-wise Distillation of Diffusion Models
Nikita Starodubcev, Denis Kuznedelev, Artem Babenko, Dmitry Baranchuk
2025-03-21
Summary
This paper introduces SwD, a method that speeds up AI image generation by creating images in stages: it starts with a rough sketch at low resolution and adds detail step by step, like enlarging a photo without losing quality.
What's the problem?
Current AI image generators are slow because they work on full-resolution images from start to finish, wasting time on details that aren’t needed early in the process.
What's the solution?
SwD generates images faster by starting from a small, low-resolution version and increasing the resolution at each denoising step, combined with a new patch-based training loss that keeps fine details accurate.
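To make the scale-wise idea concrete, here is a minimal sketch of what such a sampling loop could look like in PyTorch. It is an illustration rather than the paper's exact algorithm: the `student` interface, the scale and noise schedules (`sizes`, `sigmas`), the additive re-noising `x0 + sigma * noise`, and bilinear upsampling are all assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def scale_wise_sample(student, text_emb,
                      sizes=(32, 64, 96, 128),      # hypothetical latent resolutions per step
                      sigmas=(1.0, 0.7, 0.4, 0.2),  # hypothetical noise level entering each step
                      channels=4):
    """Illustrative few-step, scale-wise sampling loop.

    Each step denoises at the current resolution; the prediction is then
    upscaled and re-noised at a lower noise level for the next, finer step.
    """
    batch = text_emb.shape[0]
    # Start from pure noise at the coarsest resolution.
    x = torch.randn(batch, channels, sizes[0], sizes[0], device=text_emb.device)

    for i, sigma in enumerate(sigmas):
        # One student forward pass: predict the clean sample at this scale.
        x0 = student(x, sigma, text_emb)
        if i + 1 == len(sizes):
            return x0  # final full-resolution prediction
        # Upscale the coarse prediction to the next resolution...
        x0 = F.interpolate(x0, size=(sizes[i + 1], sizes[i + 1]),
                           mode="bilinear", align_corners=False)
        # ...then re-noise it, so the next step can add the high-frequency
        # detail that the coarser scale could not represent.
        x = x0 + sigmas[i + 1] * torch.randn_like(x0)
```

Because only the last step runs at full resolution, most of the compute happens on small tensors, which is where the speedup over full-resolution few-step generation comes from.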
Why does it matter?
This lets apps generate high-quality images quickly on phones or laptops, making it useful for art tools, game design, or real-time editing without expensive hardware.
Abstract
We present SwD, a scale-wise distillation framework for diffusion models (DMs), which effectively employs next-scale prediction ideas for diffusion-based few-step generators. In more detail, SwD is inspired by the recent insights relating diffusion processes to the implicit spectral autoregression. We suppose that DMs can initiate generation at lower data resolutions and gradually upscale the samples at each denoising step without loss in performance while significantly reducing computational costs. SwD naturally integrates this idea into existing diffusion distillation methods based on distribution matching. Also, we enrich the family of distribution matching approaches by introducing a novel patch loss enforcing finer-grained similarity to the target distribution. When applied to state-of-the-art text-to-image diffusion models, SwD approaches the inference times of two full resolution steps and significantly outperforms the counterparts under the same computation budget, as evidenced by automated metrics and human preference studies.
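The abstract mentions the patch loss only at a high level. Below is a hedged sketch of one way such a loss could be implemented: treat each spatial token of an image as a sample from its patch distribution and penalize a kernel maximum mean discrepancy (MMD) between student and target patch features. The RBF kernel, the median-bandwidth heuristic, and the `(batch, tokens, dim)` feature layout are assumptions, not the paper's specification.

```python
import torch

def rbf_mmd(x, y, bandwidth=None):
    """Squared MMD between two point sets x: (n, d) and y: (m, d),
    using an RBF kernel exp(-||a - b||^2 / bandwidth)."""
    def sq_dists(a, b):
        return torch.cdist(a, b).pow(2)

    if bandwidth is None:
        # Median heuristic for the kernel bandwidth (an assumption here).
        with torch.no_grad():
            bandwidth = sq_dists(x, y).median().clamp(min=1e-6)
    k_xx = torch.exp(-sq_dists(x, x) / bandwidth)
    k_yy = torch.exp(-sq_dists(y, y) / bandwidth)
    k_xy = torch.exp(-sq_dists(x, y) / bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

def patch_loss(student_feats, target_feats):
    """Patch-level distribution matching over (batch, tokens, dim) features:
    each spatial token counts as one sample from the image's patch
    distribution, and the MMD is averaged over the batch."""
    per_image = [rbf_mmd(s, t) for s, t in zip(student_feats, target_feats)]
    return torch.stack(per_image).mean()
```

Matching patches rather than whole images gives the loss a finer-grained view of texture and local structure, which is the role the abstract assigns to it.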