Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models

Youngrok Park, Hojung Jung, Sangmin Bae, Se-Young Yun

2025-10-15

Summary

This paper addresses a quality problem in diffusion models, a class of generative AI models that create new data such as images. It aims to keep generated samples realistic throughout generation, particularly when the model is being steered toward specific properties.

What's the problem?

Diffusion models, while good at creating things, can sometimes make mistakes during the creation process, especially when you try to specifically control what they generate. These errors cause the generated content to stray from looking realistic or matching the desired qualities, a problem the authors call being 'off-manifold'. The further along the creation process goes, the more these errors build up and the worse the quality becomes.

What's the solution?

The researchers developed a new technique called 'Temporal Alignment Guidance', or TAG. It uses a time predictor to estimate, at each step of the generation process, how far the sample has drifted from where it should be at that step. TAG then pulls the sample back toward that target, so that small deviations are corrected as they appear instead of accumulating. Essentially, TAG checks in at every step and makes small corrections throughout generation.

Why does it matter?

This research is important because it makes diffusion models more reliable and capable of creating high-quality content, even when users want very specific results. By fixing the 'off-manifold' problem, TAG allows these models to be used more effectively in a wider range of applications, like creating realistic images, videos, or even designing new products.

Abstract

Diffusion models have achieved remarkable success as generative models. However, even a well-trained model can accumulate errors throughout the generation process. These errors become particularly problematic when arbitrary guidance is applied to steer samples toward desired properties, which often breaks sample fidelity. In this paper, we propose a general solution to address the off-manifold phenomenon observed in diffusion models. Our approach leverages a time predictor to estimate deviations from the desired data manifold at each timestep, identifying that a larger time gap is associated with reduced generation quality. We then design a novel guidance mechanism, 'Temporal Alignment Guidance' (TAG), attracting the samples back to the desired manifold at every timestep during generation. Through extensive experiments, we demonstrate that TAG consistently produces samples closely aligned with the desired manifold at each timestep, leading to significant improvements in generation quality across various downstream tasks.
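
To make the idea of a time predictor pulling samples back concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes the noisy samples at timestep t follow a zero-mean Gaussian with a known variance, so the "time predictor" can be written in closed form as the posterior p(t | x) instead of being a learned network. The correction is assumed here to be a gradient-ascent step on log p(t | x). All function names, the noise schedule, and the step size are hypothetical choices made for illustration.

```python
import numpy as np

def time_log_posterior(x, variances):
    """Return log p(t | x) for every candidate timestep (uniform prior over t).

    Toy assumption: the noisy marginal at timestep t is N(0, variances[t] * I).
    This closed form stands in for a learned time predictor.
    """
    d = x.size
    log_lik = (-0.5 * np.sum(x ** 2) / variances
               - 0.5 * d * np.log(2.0 * np.pi * variances))
    shifted = log_lik - log_lik.max()  # stabilise the log-sum-exp
    return shifted - np.log(np.sum(np.exp(shifted)))

def tag_gradient(x, t_index, variances):
    """Gradient of log p(t_index | x) with respect to x in the toy model."""
    posterior = np.exp(time_log_posterior(x, variances))
    expected_precision = np.sum(posterior / variances)
    # grad log p(x | t) - grad log p(x) = x * (E[1/v | x] - 1/v_t)
    return x * (expected_precision - 1.0 / variances[t_index])

def tag_correct(x, t_index, variances, step_size=0.01, n_steps=100):
    """Repeatedly nudge x so that it looks more like a sample from timestep t_index."""
    x = x.copy()
    for _ in range(n_steps):
        x = x + step_size * tag_gradient(x, t_index, variances)
    return x

# Hypothetical noise schedule: unit-variance data plus noise of scale sigma_t.
sigmas = np.linspace(0.1, 5.0, 20)
variances = 1.0 + sigmas ** 2

# A sample that is far too spread out for the least-noisy timestep (index 0).
rng = np.random.default_rng(0)
x_off = 3.0 * rng.normal(size=16)
x_fixed = tag_correct(x_off, t_index=0, variances=variances)

log_p_before = time_log_posterior(x_off, variances)[0]
log_p_after = time_log_posterior(x_fixed, variances)[0]
print("log p(t=0 | x) before:", log_p_before)
print("log p(t=0 | x) after: ", log_p_after)
```

In this toy setting the correction shrinks the sample until the time predictor considers it more consistent with the intended timestep. In a real sampler, a term of this kind would be added alongside the model's own denoising update at every step, and the time predictor would be a trained network.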
