
HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming

Haonan Qiu, Shikun Liu, Zijian Zhou, Zhaochong An, Weiming Ren, Zhiheng Liu, Jonas Schult, Sen He, Shoufa Chen, Yuren Cong, Tao Xiang, Ziwei Liu, Juan-Manuel Perez-Rua

2025-12-25

Summary

This paper introduces a new method called HiStream for creating high-resolution videos more quickly and efficiently, addressing a major slowdown in current video generation technology.

What's the problem?

Generating high-quality videos is extremely demanding for computers because the standard techniques, called diffusion models, become dramatically slower as resolution increases. The processing cost grows quadratically with the number of pixels being attended to, making it impractical to create long or detailed videos: even a short high-definition clip can take a very long time to generate.

What's the solution?

HiStream tackles this problem by eliminating redundant computation along three axes. First, spatial compression: it generates a low-resolution version of the video and then refines it at high resolution, reusing cached features from the low-resolution pass. Second, temporal compression: it generates the video chunk by chunk, keeping a fixed-size 'anchor cache' of features from previous chunks so inference speed stays stable as the video grows. Third, timestep compression: later chunks, which are already conditioned on the cache, need fewer denoising steps than the first chunk. Together these techniques dramatically speed up generation with negligible loss in visual quality.
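The three compressions can be pictured as a single generation loop. The sketch below is purely illustrative and not the authors' code: the functions `denoise` and `refine_high_res`, the step counts, and the cache size are hypothetical stand-ins for the paper's components.

```python
from collections import deque

def denoise(latent, steps, cond=None):
    # Stand-in for the low-resolution diffusion denoiser; a real
    # model would iterate `steps` denoising steps, conditioned on
    # `cond` (cached features from earlier chunks) when present.
    return {"latent": latent, "steps": steps, "conditioned": cond is not None}

def refine_high_res(low_res_out, steps, cond=None):
    # Stand-in for high-resolution refinement that reuses features
    # cached from the low-resolution pass (spatial compression).
    return {"latent": low_res_out["latent"] + "_hr", "steps": steps,
            "conditioned": cond is not None}

def histream_generate(chunks, full_steps=50, reduced_steps=20, cache_size=2):
    # ii) Temporal compression: a fixed-size anchor cache; deque with
    #     maxlen evicts the oldest entry, so cost per chunk stays flat.
    cache = deque(maxlen=cache_size)
    videos = []
    for i, chunk in enumerate(chunks):
        cond = list(cache) or None
        # iii) Timestep compression: later, cache-conditioned chunks
        #      get fewer denoising steps than the first chunk.
        steps = full_steps if i == 0 else reduced_steps
        # i) Spatial compression: low-res pass, then high-res refinement.
        low = denoise(chunk, steps, cond)
        high = refine_high_res(low, steps, cond)
        videos.append(high)
        cache.append(high["latent"])  # store features for future chunks
    return videos
```

Note how the deque's `maxlen` is what keeps inference speed stable: no matter how long the video gets, each new chunk only ever conditions on a bounded amount of cached context.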

Why it matters?

This research matters because it makes high-resolution video generation feasible for more people and applications. By making the process much faster, up to 107.5 times faster with its fastest variant, HiStream opens the door to creating detailed videos for film, digital art, and other media without needing extremely powerful (and expensive) computers. It makes scalable, high-quality video creation practical.

Abstract

High-resolution video generation, while crucial for digital media and film, is computationally bottlenecked by the quadratic complexity of diffusion models, making practical inference infeasible. To address this, we introduce HiStream, an efficient autoregressive framework that systematically reduces redundancy across three axes: i) Spatial Compression: denoising at low resolution before refining at high resolution with cached features; ii) Temporal Compression: a chunk-by-chunk strategy with a fixed-size anchor cache, ensuring stable inference speed; and iii) Timestep Compression: applying fewer denoising steps to subsequent, cache-conditioned chunks. On 1080p benchmarks, our primary HiStream model (i+ii) achieves state-of-the-art visual quality while demonstrating up to 76.2x faster denoising compared to the Wan2.1 baseline and negligible quality loss. Our faster variant, HiStream+, applies all three optimizations (i+ii+iii), achieving a 107.5x acceleration over the baseline, offering a compelling trade-off between speed and quality, thereby making high-resolution video generation both practical and scalable.