Align Your Tangent: Training Better Consistency Models via Manifold-Aligned Tangents
Beomsu Kim, Byunghee Cha, Jong Chul Ye
2025-10-06
Summary
This paper focuses on making it faster and cheaper to generate high-quality images with a type of AI model called Consistency Models. These models can produce realistic images in just one or two steps, but they usually take a long time and a lot of computing power to train.
What's the problem?
Consistency Models work by learning to predict images along a path of gradual change. The researchers found that, near the end of training, the model's updates (how it adjusts its outputs to improve) were oscillating: they moved parallel to the space of realistic images rather than directly toward it. Imagine trying to roll a ball straight at a target, but it keeps wobbling side to side instead of closing the distance; that is what was happening with the model's learning process, slowing training down and requiring very large training batches to average the wobble out.
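To make the wobbling concrete, here is a toy PyTorch diagnostic (an illustration written for this summary, not code from the paper; `prev_out`, `curr_out`, and `data_target` are placeholder tensors) that measures how much of an output update actually points toward the data:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def tangent_alignment(prev_out, curr_out, data_target):
    """Toy diagnostic: how much of the model's output update points
    toward the data it should produce? (Illustrative only.)

    prev_out, curr_out : model outputs before / after a training step
    data_target        : the realistic image the output should approach
    """
    tangent = (curr_out - prev_out).flatten(1)     # output update direction
    to_data = (data_target - prev_out).flatten(1)  # direction toward the data
    # Cosine near 1: updates head straight for the data manifold.
    # Cosine near 0: updates slide sideways along it (oscillation).
    return F.cosine_similarity(tangent, to_data, dim=1).mean()
```

In the paper's terms, `tangent` roughly plays the role of the CM tangent, and the oscillation problem corresponds to this cosine staying near zero late in training.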
What's the solution?
To fix this, the researchers designed a new loss function (the measure of error that guides training), called the Manifold Feature Distance (MFD). This loss encourages the model's updates to point directly toward the 'manifold', essentially the space of realistic images, instead of sliding along it. By steering updates this way, their method, called 'Align Your Tangent' (AYT), makes training much faster and more stable, and it lets the model train effectively even with very small batches.
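In code, a loss of this flavor might look like the following PyTorch sketch. It is only illustrative: `feature_net` is a hypothetical stand-in for whatever feature extractor the method actually uses, and the real MFD objective may differ in its details.

```python
import torch
import torch.nn as nn

class FeatureDistanceLoss(nn.Module):
    """Compare prediction and target through a frozen feature
    extractor instead of raw pixels (illustrative sketch only)."""

    def __init__(self, feature_net: nn.Module):
        super().__init__()
        self.feature_net = feature_net.eval()
        for p in self.feature_net.parameters():
            p.requires_grad_(False)  # features stay fixed; only the CM trains

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Measuring error in feature space means gradients w.r.t. `pred`
        # follow directions the feature net is sensitive to, pushing outputs
        # toward the region of realistic images rather than alongside it.
        return (self.feature_net(pred) - self.feature_net(target)).pow(2).mean()
```

The key design choice is where the distance is measured: a well-chosen feature space can make the loss gradient point at the data manifold, which is exactly the property the paper's tangent analysis calls for.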
Why it matters?
This research is important because it makes these powerful image-generating models much more practical to train. By significantly reducing training time and the need for huge training batches, it opens the door for more people to experiment with and benefit from this technology. It also improves the quality of the images generated, making them even more realistic and useful for various applications.
Abstract
With diffusion and flow matching models achieving state-of-the-art generative performance, the interest of the community has now turned to reducing inference time without sacrificing sample quality. Consistency Models (CMs), which are trained to be consistent on diffusion or probability flow ordinary differential equation (PF-ODE) trajectories, enable one- or two-step flow or diffusion sampling. However, CMs typically require prolonged training with large batch sizes to obtain competitive sample quality. In this paper, we examine the training dynamics of CMs near convergence and discover that CM tangents -- CM output update directions -- are quite oscillatory, in the sense that they move parallel to the data manifold, not towards the manifold. To mitigate oscillatory tangents, we propose a new loss function, called the manifold feature distance (MFD), which provides manifold-aligned tangents that point toward the data manifold. Consequently, our method -- dubbed Align Your Tangent (AYT) -- can accelerate CM training by orders of magnitude and even outperform the learned perceptual image patch similarity metric (LPIPS). Furthermore, we find that our loss enables training with extremely small batch sizes without compromising sample quality. Code: https://github.com/1202kbs/AYT
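To show where such a loss plugs in, here is a heavily simplified, schematic consistency-training step in PyTorch. All names are placeholders, and real CM training involves specific noise schedules, boundary conditions, and EMA details omitted here:

```python
import torch

def consistency_training_step(model, ema_model, loss_fn, x0, t, t_prev, noise):
    """One schematic consistency-training update (simplified sketch)."""
    # Two nearby points on a (simplified, linear) noising trajectory.
    x_t = x0 + t * noise
    x_t_prev = x0 + t_prev * noise

    pred = model(x_t, t)                      # student prediction at time t
    with torch.no_grad():
        target = ema_model(x_t_prev, t_prev)  # EMA teacher at the earlier time

    # Replacing a pixel-space distance here with a manifold-aligned
    # feature distance (MFD) is the substitution AYT proposes.
    return loss_fn(pred, target)
```

With a pixel-space `loss_fn` this is ordinary consistency training; swapping in an MFD-style loss is what realigns the tangents toward the data manifold.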