Faster Video Diffusion with Trainable Sparse Attention

Peiyuan Zhang, Haofeng Huang, Yongqi Chen, Will Lin, Zhengzhong Liu, Ion Stoica, Eric P. Xing, Hao Zhang

2025-05-20

Summary

This paper introduces a method called trainable sparse attention (VSA) that lets AI models generate and process videos much faster with little loss in quality.

What's the problem?

Generating or editing videos with AI demands a lot of compute and time, and that cost grows quickly as models get bigger and videos get longer or higher resolution. This makes it hard to use these models for large projects or run them on ordinary computers.

What's the solution?

To solve this, the researchers train the model to attend only to the most important parts of the video data, instead of comparing every part with every other part. This approach, called trainable sparse attention, makes the model faster and cheaper to run while keeping the output quality close to that of full attention. Because the sparsity pattern is learned during training rather than fixed by hand, the model decides for itself which regions of the video matter most.
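The core idea of attending only to selected regions can be sketched roughly as follows. This is an illustrative simplification, not the paper's actual VSA kernel: the block size, mean-pooling, and top-k block selection here are assumptions chosen to show the coarse-to-fine pattern, and a real implementation would run as a fused GPU kernel with learned selection.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def block_sparse_attention(q, k, v, block=4, topk=2):
    """Coarse-to-fine sparse attention sketch (hypothetical, not VSA itself).

    Coarse stage: mean-pool queries/keys within each block and score
    block pairs cheaply. Fine stage: each query block attends only to
    its top-k key blocks instead of the full sequence.
    """
    n, d = q.shape
    nb = n // block                      # assumes n is divisible by block
    qb = q.reshape(nb, block, d)
    kb = k.reshape(nb, block, d)
    vb = v.reshape(nb, block, d)

    # Coarse stage: score pooled block representations.
    q_pool = qb.mean(axis=1)             # (nb, d)
    k_pool = kb.mean(axis=1)             # (nb, d)
    block_scores = q_pool @ k_pool.T / np.sqrt(d)
    # For each query block, keep only the top-k key blocks.
    keep = np.argsort(-block_scores, axis=1)[:, :topk]   # (nb, topk)

    # Fine stage: exact attention restricted to the selected blocks.
    out = np.empty_like(qb)
    for i in range(nb):
        ks = kb[keep[i]].reshape(-1, d)  # (topk*block, d)
        vs = vb[keep[i]].reshape(-1, d)
        attn = softmax(qb[i] @ ks.T / np.sqrt(d))
        out[i] = attn @ vs
    return out.reshape(n, d)
```

Setting `topk` equal to the number of blocks recovers ordinary dense attention, which is a handy sanity check; the speedup comes from keeping `topk` small relative to the sequence length.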

Why it matters?

This matters because it means AI can be used to make and edit videos more quickly and efficiently, making these tools more practical and accessible for everyone, from filmmakers to everyday users.

Abstract

VSA, a trainable sparse attention mechanism, reduces the computational cost of video diffusion transformers with minimal impact on performance, enabling efficient scaling of these models.