
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

Feng Liu, Shiwei Zhang, Xiaofeng Wang, Yujie Wei, Haonan Qiu, Yuzhong Zhao, Yingya Zhang, Qixiang Ye, Fang Wan

2024-12-02


Summary

This paper introduces TeaCache, a training-free method that speeds up video generation with diffusion models by deciding more carefully when model outputs can be cached and reused.

What's the problem?

Diffusion models are effective for generating high-quality videos, but they are slow because denoising runs sequentially over many timesteps. Previous methods tried to speed things up by caching and reusing model outputs at uniformly spaced timesteps, but this ignores the fact that the model's outputs change by different amounts at different timesteps. As a result, the wrong outputs get cached, and the trade-off between speed and visual quality suffers.

What's the solution?

TeaCache solves this problem by looking at the model's inputs, which are cheap to inspect, instead of its expensive outputs. It modulates the noisy input with the timestep embedding so that differences between successive inputs closely track differences between outputs, then applies a rescaling step to refine those estimates. When the estimated change is small, TeaCache reuses the cached output instead of running the model again; when it is large, it recomputes and updates the cache. Because it is training-free, it can be applied directly to existing video diffusion models, and experiments show up to 4.41 times faster generation with almost no loss in visual quality.
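The caching decision can be pictured as a small loop over denoising timesteps: estimate how much the output would change from cheap input differences, and only rerun the model when the accumulated estimate crosses a threshold. Below is a minimal sketch of that idea; the helper names (relative_l1, poly_rescale), the placeholder backbone, and the threshold value are assumptions for illustration, not the authors' code.

    # Minimal, illustrative sketch of a TeaCache-style caching decision.
    # Helper names, the placeholder backbone, and the threshold value are
    # assumptions, not the authors' implementation.
    import numpy as np

    def relative_l1(curr: np.ndarray, prev: np.ndarray) -> float:
        # Relative L1 distance between successive (modulated) model inputs.
        return float(np.abs(curr - prev).sum() / (np.abs(prev).sum() + 1e-8))

    def poly_rescale(d: float, coeffs=(1.0, 0.0)) -> float:
        # Hypothetical polynomial rescaling that maps input differences to
        # estimated output differences (coefficients are placeholders).
        return float(np.polyval(coeffs, d))

    def model(x: np.ndarray) -> np.ndarray:
        # Placeholder for the expensive diffusion backbone forward pass.
        return x * 0.9

    # Toy stand-ins for timestep-embedding-modulated noisy inputs.
    rng = np.random.default_rng(0)
    modulated_inputs = [rng.normal(size=(4, 4)) * (1 - t / 10) for t in range(10)]

    threshold = 0.1                  # caching threshold (assumed value)
    cache, prev_input, accumulated = None, None, 0.0
    for x in modulated_inputs:
        if prev_input is None:
            cache, prev_input, accumulated = model(x), x, 0.0
            continue
        accumulated += poly_rescale(relative_l1(x, prev_input))
        prev_input = x
        if accumulated >= threshold:
            cache, accumulated = model(x), 0.0   # recompute and reset
        # otherwise: reuse `cache` from the last fully computed step

In a real pipeline, the cached quantity would be the backbone's output (or residual) for the skipped timesteps, so the denoising loop can continue without paying for a full forward pass at every step.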

Why it matters?

This research is important because it provides a more efficient way to generate videos, making it easier for creators to produce high-quality content quickly. By improving the speed and quality of video generation, TeaCache can benefit industries like film, gaming, and virtual reality, where fast and visually appealing videos are essential.

Abstract

As a fundamental backbone for video generation, diffusion models are challenged by low inference speed due to the sequential nature of denoising. Previous methods speed up the models by caching and reusing model outputs at uniformly selected timesteps. However, such a strategy neglects the fact that differences among model outputs are not uniform across timesteps, which hinders selecting the appropriate model outputs to cache, leading to a poor balance between inference efficiency and visual quality. In this study, we introduce Timestep Embedding Aware Cache (TeaCache), a training-free caching approach that estimates and leverages the fluctuating differences among model outputs across timesteps. Rather than directly using the time-consuming model outputs, TeaCache focuses on model inputs, which have a strong correlation with the model outputs while incurring negligible computational cost. TeaCache first modulates the noisy inputs using the timestep embeddings so that their differences better approximate those of model outputs. TeaCache then introduces a rescaling strategy to refine the estimated differences and utilizes them to indicate output caching. Experiments show that TeaCache achieves up to 4.41x acceleration over Open-Sora-Plan with negligible (-0.07% VBench score) degradation of visual quality.
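Reading the abstract, the caching criterion can be summarized roughly as follows; the notation (mod, f, t_c, delta) is mine for illustration, not taken verbatim from the paper:

    \[
    d_t \;=\; \frac{\lVert \operatorname{mod}(x_t, e_t) - \operatorname{mod}(x_{t-1}, e_{t-1}) \rVert_1}
                   {\lVert \operatorname{mod}(x_{t-1}, e_{t-1}) \rVert_1},
    \qquad
    \text{reuse the cached output while } \sum_{\tau = t_c + 1}^{t} f(d_\tau) < \delta,
    \]

where x_t is the noisy input at timestep t, e_t the timestep embedding, mod(.) the embedding-based modulation of the input, f the rescaling function that refines the estimated differences, t_c the last fully computed timestep, and delta the caching threshold.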