TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

Jintao Zhang, Kaiwen Zheng, Kai Jiang, Haoxu Wang, Ion Stoica, Joseph E. Gonzalez, Jianfei Chen, Jun Zhu

2025-12-25

Summary

This paper introduces TurboDiffusion, a system that dramatically speeds up AI video generation with diffusion models, reporting end-to-end speedups of 100-200x while preserving video quality.

What's the problem?

Generating videos with diffusion models is extremely slow and compute-hungry: the model builds a video through many successive denoising steps, each requiring a full forward pass through a large neural network, and the attention layers inside each pass scale poorly with video length and resolution. This makes it hard for researchers and creators to experiment with and produce high-quality videos efficiently; existing methods simply take too long to be practical for many applications.

What's the solution?

TurboDiffusion tackles this slowness through a combination of techniques. First, it accelerates attention, the most expensive computation in the model, using low-bit SageAttention and a trainable Sparse-Linear Attention (SLA) scheme. Second, it applies step distillation (via rCM), training the model to produce comparable results in far fewer denoising steps. Third, it quantizes both weights and activations to 8 bits (W8A8), speeding up the linear layers and shrinking the model without significantly hurting quality. These techniques are combined with further engineering optimizations to maximize end-to-end speed.
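
To make the precision-reduction idea concrete, here is a minimal sketch of W8A8 quantization applied to a single linear layer. This illustrates the general technique only, not the paper's actual kernels: the function names and the symmetric per-tensor quantization scheme below are assumptions for the example (production systems typically use fused hardware INT8 kernels and finer-grained scaling).

import torch

def quantize_int8(x):
    # Symmetric per-tensor INT8 quantization: map floats to [-127, 127]
    # with a single scale factor (an assumption; real systems often
    # quantize per channel or per block).
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.round(x / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def w8a8_linear(x, weight):
    # Quantize both activations (A8) and weights (W8), multiply in integer
    # arithmetic with int32 accumulation, then rescale back to float.
    qx, sx = quantize_int8(x)
    qw, sw = quantize_int8(weight)
    acc = qx.to(torch.int32) @ qw.t().to(torch.int32)
    return acc.float() * (sx * sw)

# The INT8 result closely tracks the full-precision linear layer:
x = torch.randn(4, 64)
w = torch.randn(128, 64)
print((w8a8_linear(x, w) - x @ w.t()).abs().max())

On real hardware the integer matmul runs on dedicated INT8 units, which is where the speedup (and the memory savings from storing 8-bit weights) comes from.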

Why it matters?

This work matters because it makes AI video generation far more accessible and practical. By achieving a 100-200x speedup, TurboDiffusion enables much faster experimentation and brings high-quality AI video creation within reach of readily available hardware: the authors report these speedups even on a single RTX 5090 consumer GPU.

Abstract

We introduce TurboDiffusion, a video generation acceleration framework that can speed up end-to-end diffusion generation by 100-200x while maintaining video quality. TurboDiffusion mainly relies on several components for acceleration: (1) Attention acceleration: TurboDiffusion uses low-bit SageAttention and trainable Sparse-Linear Attention (SLA) to speed up attention computation. (2) Step distillation: TurboDiffusion adopts rCM for efficient step distillation. (3) W8A8 quantization: TurboDiffusion quantizes model parameters and activations to 8 bits to accelerate linear layers and compress the model. In addition, TurboDiffusion incorporates several other engineering optimizations. We conduct experiments on the Wan2.2-I2V-14B-720P, Wan2.1-T2V-1.3B-480P, Wan2.1-T2V-14B-720P, and Wan2.1-T2V-14B-480P models. Experimental results show that TurboDiffusion achieves 100-200x speedup for video generation even on a single RTX 5090 GPU, while maintaining comparable video quality. The GitHub repository, which includes model checkpoints and easy-to-use code, is available at https://github.com/thu-ml/TurboDiffusion.
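
To see why step distillation contributes so much of the overall speedup, recall that a diffusion sampler calls the network once per step, so wall-clock time is roughly proportional to the step count. Below is a hedged sketch of a generic Euler-style sampling loop, not the paper's rCM procedure; the model interface, the linear time schedule, and the update rule are assumptions chosen for illustration.

import torch

@torch.no_grad()
def sample(model, noise, num_steps):
    # Generic Euler-style denoising loop: one network call per step, so
    # the cost scales almost linearly with num_steps.
    timesteps = torch.linspace(1.0, 0.0, num_steps + 1)
    x = noise
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        v = model(x, t)            # the expensive part: a full forward pass
        x = x + (t_next - t) * v   # move one step along the sampling trajectory
    return x

# A teacher might need sample(teacher, z, 50); a step-distilled student is
# trained so that sample(student, z, 4) lands near the same output, cutting
# network evaluations by roughly an order of magnitude on its own. Combined
# with faster attention and W8A8 kernels, the per-step cost also drops,
# which is how the multiplicative 100-200x end-to-end speedup is reached.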