Elucidating the SNR-t Bias of Diffusion Probabilistic Models

Meng Yu, Lei Sun, Jianhao Zeng, Xiangxiang Chu, Kun Zhan

2026-04-20

Summary

This paper investigates a flaw in diffusion models, which are powerful tools for creating things like images, and proposes a way to fix it, leading to better results.

What's the problem?

Diffusion models work by gradually adding noise to data and then learning to remove that noise to generate new data. During training, each timestep is strictly tied to a specific signal-to-noise ratio (SNR): the model always sees samples whose noise level exactly matches the step number. But during generation, the sample's actual SNR can drift away from the level its timestep implies. This mismatch, called the SNR-timestep (SNR-t) bias, lets small errors accumulate across the denoising steps and degrades the quality of the final output.
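To make the coupling concrete, here is a minimal numerical sketch. It uses a standard linear-beta DDPM noise schedule (the schedule values are illustrative, not taken from the paper): at training time a noised sample's empirical SNR matches the SNR the schedule assigns to its timestep, while a sample carrying slightly too much noise, as can happen during inference, has a lower SNR than its timestep implies. The 5% extra-noise factor is a hypothetical choice to make the drift visible.

```python
import numpy as np

# Linear beta schedule as in DDPM (values are illustrative, not the paper's).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def snr(t):
    """Training-time SNR implied by timestep t: alpha_bar_t / (1 - alpha_bar_t)."""
    return alphas_bar[t] / (1.0 - alphas_bar[t])

rng = np.random.default_rng(0)
x0 = rng.standard_normal(10_000)   # toy "clean" signal with unit variance
eps = rng.standard_normal(x0.shape)
t = 500

# Forward (training) process: x_t = sqrt(alpha_bar)*x0 + sqrt(1-alpha_bar)*eps,
# so the sample's SNR matches the schedule at timestep t.
empirical_snr = (alphas_bar[t] * x0.var()) / ((1 - alphas_bar[t]) * eps.var())

# A sample carrying 5% too much noise (a hypothetical inference-time drift)
# has a lower SNR than timestep t implies -- the SNR-t bias.
biased_snr = (alphas_bar[t] * x0.var()) / ((1.05 ** 2) * (1 - alphas_bar[t]) * eps.var())

print(f"scheduled SNR at t={t}: {snr(t):.4f}")
print(f"empirical SNR of x_t:   {empirical_snr:.4f}")
print(f"drifted sample's SNR:   {biased_snr:.4f}")
```

The gap between the scheduled and drifted SNR is the misalignment the paper's correction targets: the denoiser is conditioned on timestep t, but the sample it receives no longer has the noise level that t promises.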

What's the solution?

The researchers noticed that diffusion models tend to reconstruct the broad, low-frequency features of an image first and then add the finer, high-frequency details. So they decompose each sample into different frequency components (levels of detail) and apply a correction to each component separately. This 'differential correction' realigns the noise level with the timestep at each step for each detail level, mitigating the SNR-timestep bias, and it is a relatively simple change to the existing sampling process.
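A minimal sketch of the general shape of this idea, not the paper's actual DCW method: split a sample into low- and high-frequency bands with an FFT mask, apply a separate (here hypothetical) corrective gain to each band, and recombine. The cutoff radius and the gain values are illustrative assumptions.

```python
import numpy as np

def split_frequencies(img, cutoff):
    """Split a 2D array into low- and high-frequency components using a
    circular FFT mask. `cutoff` is a radius in frequency space (an
    illustrative choice, not the paper's decomposition)."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    low_mask = dist <= cutoff
    low = np.fft.ifft2(np.fft.ifftshift(F * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(F * ~low_mask)).real
    return low, high

def differential_correction(img, low_scale, high_scale, cutoff=8):
    """Apply a separate (hypothetical) corrective gain per frequency band,
    then recombine -- the per-band structure of a differential correction."""
    low, high = split_frequencies(img, cutoff)
    return low_scale * low + high_scale * high

rng = np.random.default_rng(0)
sample = rng.standard_normal((64, 64))   # stand-in for an intermediate sample

low, high = split_frequencies(sample, cutoff=8)
print(np.allclose(low + high, sample))   # True: the bands sum to the original

corrected = differential_correction(sample, low_scale=1.0, high_scale=0.9)
```

Because the two bands sum exactly to the original, correcting each band and recombining is a lossless way to treat coarse structure and fine detail differently at each denoising step.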

Why it matters?

This is important because it improves the quality of images and other data generated by diffusion models across many different types of models and datasets. The fix doesn't require a lot of extra computing power, making it practical to use and improving the state-of-the-art in generative modeling.

Abstract

Diffusion Probabilistic Models have demonstrated remarkable performance across a wide range of generative tasks. However, we have observed that these models often suffer from a Signal-to-Noise Ratio-timestep (SNR-t) bias. This bias refers to the misalignment between the SNR of the denoising sample and its corresponding timestep during the inference phase. Specifically, during training, the SNR of a sample is strictly coupled with its timestep. However, this correspondence is disrupted during inference, leading to error accumulation and impairing the generation quality. We provide comprehensive empirical evidence and theoretical analysis to substantiate this phenomenon and propose a simple yet effective differential correction method to mitigate the SNR-t bias. Recognizing that diffusion models typically reconstruct low-frequency components before focusing on high-frequency details during the reverse denoising process, we decompose samples into various frequency components and apply differential correction to each component individually. Extensive experiments show that our approach significantly improves the generation quality of various diffusion models (IDDPM, ADM, DDIM, A-DPM, EA-DPM, EDM, PFGM++, and FLUX) on datasets of various resolutions with negligible computational overhead. The code is at https://github.com/AMAP-ML/DCW.