Minute-Long Videos with Dual Parallelisms
Zeqing Wang, Bowen Zheng, Xingyi Yang, Yuecong Xu, Xinchao Wang
2025-05-28
Summary
This paper introduces a new technique called DualParal that helps computers create long, high-quality videos much faster and with less memory by spreading the work across multiple GPUs.
What's the problem?
The problem is that making minute-long videos with advanced AI models, such as diffusion transformers, takes a long time and an enormous amount of memory, which makes it very hard to generate longer videos smoothly.
What's the solution?
To fix this, the researchers came up with DualParal, a method that splits both the video frames and the layers of the AI model so they can be processed at the same time on different GPUs. They also denoise the video block by block and cache important features to avoid repeating work, which makes the whole process much faster and more memory-efficient.
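The core idea, splitting frames into blocks and layers into stages so different blocks occupy different stages at the same time, while a cache carries features between neighboring blocks, can be illustrated with a toy simulation. Everything below (`stage_fn`, `pipeline_pass`, the block and stage counts, and treating frames as plain floats) is an invented stand-in for illustration, not the authors' implementation or API.

```python
NUM_STAGES = 3   # groups of model layers, one group per "GPU"
BLOCK_SIZE = 4   # frames per block

def stage_fn(stage_id, block):
    # Stand-in for one group of transformer layers denoising a block.
    return [f + 0.1 * (stage_id + 1) for f in block]

def pipeline_pass(blocks, cache):
    """One denoising iteration over all blocks, pipelined across stages.

    At step t, stage s works on block t - s, so up to NUM_STAGES blocks
    are in flight at once. `cache` holds a boundary feature from each
    finished block so the next block can reuse it instead of recomputing.
    """
    n = len(blocks)
    partial = {}            # block id -> activations handed between stages
    schedule = []           # (step, stage, block) records showing overlap
    out = [None] * n
    for step in range(n + NUM_STAGES - 1):
        for stage in range(NUM_STAGES):
            b = step - stage
            if not 0 <= b < n:
                continue
            schedule.append((step, stage, b))
            if stage == 0:
                # Reuse the cached boundary feature of the previous block
                # (a stand-in for the paper's feature cache).
                boundary = cache.get(b - 1, 0.0)
                x = [f + boundary for f in blocks[b]]
            else:
                x = partial[b]
            x = stage_fn(stage, x)
            partial[b] = x
            if stage == NUM_STAGES - 1:
                out[b] = x
                cache[b] = x[-1]  # cache a boundary feature for reuse
    return out, schedule

# 12 frames -> 3 blocks, denoised over two pipelined iterations.
frames = [float(i) for i in range(12)]
blocks = [frames[i:i + BLOCK_SIZE] for i in range(0, len(frames), BLOCK_SIZE)]
cache = {}
for _ in range(2):
    blocks, schedule = pipeline_pass(blocks, cache)
```

After the pipeline fills, the schedule shows all three stages busy on three different blocks in the same step, which is where the speedup over processing the whole video on one device comes from; the cache means the second iteration conditions each block on its neighbor's stored feature instead of recomputing it.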
Why it matters?
This is important because it lets creators and developers make longer, better-looking videos with AI without needing super expensive computers, opening up new possibilities for movies, games, and online content.
Abstract
A distributed inference strategy, DualParal, is proposed to address the high processing latency and memory cost of diffusion-transformer-based video diffusion models by parallelizing frames and layers across GPUs, combined with a block-wise denoising scheme and a feature cache.