Key Features

Real-time streaming video generation
Infinite-length video generation
Block-wise autoregressive processing
20 FPS on 5 H800 GPUs with 4-step sampling
Distribution Matching Distillation
Timestep-forcing Pipeline Parallelism
Rolling RoPE for mitigating inference drift
Adaptive Attention Sink for eliminating distribution drift

The Live Avatar framework achieves real-time streaming performance through Distribution Matching Distillation and Timestep-forcing Pipeline Parallelism. These techniques let the model generate frames faster than playback speed and extend the stream without bound, conditioning each new block on the preceding frames. The result is an 84× FPS improvement over the baseline, enabling live video generation at over 20 FPS without quantization.
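
To make the pipelining idea concrete, here is a rough, illustrative Python sketch (not Live Avatar's actual code; `denoise_step`, the block representation, and the scheduling loop are placeholders) of timestep-forcing pipeline parallelism: each of the four distilled sampling steps is pinned to its own pipeline stage, so once the pipeline is full a finished latent block streams out every tick instead of every four steps.

```python
"""Illustrative sketch of timestep-forcing pipeline parallelism (hypothetical names)."""

NUM_STEPS = 4  # 4-step sampling after Distribution Matching Distillation


def denoise_step(block, step):
    """Placeholder for one distilled denoising step; a real stage runs the model on its own GPU."""
    return f"{block}@step{step}"


def streaming_pipeline(noisy_blocks):
    """Yield fully denoised blocks while keeping every pipeline stage busy."""
    in_flight = []          # (block, next_step) pairs, one per occupied stage
    source = iter(noisy_blocks)
    exhausted = False
    while in_flight or not exhausted:
        # Admit a new noisy block into stage 0 if one is available.
        if not exhausted:
            try:
                in_flight.append((next(source), 0))
            except StopIteration:
                exhausted = True
        # One "tick": every occupied stage applies its own timestep (in parallel on real hardware).
        advanced = []
        for block, step in in_flight:
            block = denoise_step(block, step)
            if step + 1 == NUM_STEPS:
                yield block               # block is fully denoised, stream it out
            else:
                advanced.append((block, step + 1))
        in_flight = advanced


if __name__ == "__main__":
    for finished in streaming_pipeline(f"block{i}" for i in range(6)):
        print(finished)
```

In the real system each stage would presumably run the diffusion model on a separate GPU; the single-process loop above only mimics that scheduling to show how one block per tick emerges once the pipeline is full.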


Live Avatar also addresses degradation during long autoregressive generation, which can manifest as identity drift and color shifts. Strategies such as Rolling RoPE, Adaptive Attention Sink, and History Corrupt mitigate these issues and enable infinite-length streaming for over 10,000 seconds without quality degradation or identity drift, making the framework suitable for applications such as interactive dialogue agents and virtual avatars.
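
Two of these ideas can be pictured as cache management around the attention key/value entries. The sketch below is a loose illustration, not Live Avatar's implementation: the window sizes, tensor shapes, and function names are invented, the sink here is fixed rather than adaptive, and History Corrupt is not shown. It only conveys the general pattern of keeping a few early "sink" entries, evicting the middle of the cache, and re-indexing positions over what remains (the Rolling RoPE idea) so rotary angles stay bounded no matter how long the stream runs.

```python
"""Toy sliding KV cache with sink entries and re-indexed positions (illustrative only)."""
import torch

SINK = 4     # always-kept attention-sink entries (value chosen for the example)
WINDOW = 12  # most recent entries retained in the rolling window


def evict(cache: torch.Tensor) -> torch.Tensor:
    """Keep the sink entries plus the newest WINDOW entries; drop the middle."""
    if cache.shape[0] <= SINK + WINDOW:
        return cache
    return torch.cat([cache[:SINK], cache[-WINDOW:]], dim=0)


def rolling_positions(cache_len: int) -> torch.Tensor:
    """Re-index positions 0..cache_len-1 over the retained cache, so RoPE angles never grow."""
    return torch.arange(cache_len)


# Toy streaming loop: append each new block's cache entries, evict, re-index.
kv = torch.empty(0, 64)
for step in range(1000):
    new_entries = torch.randn(2, 64)                 # stand-in for a new frame block's keys
    kv = evict(torch.cat([kv, new_entries], dim=0))
    positions = rolling_positions(kv.shape[0])       # fed to RoPE instead of absolute indices

assert kv.shape[0] == SINK + WINDOW                  # cache size stays bounded after 1000 blocks
print(kv.shape, int(positions.max()))
```

Keeping the small set of sink entries is what stabilizes the attention distribution as old context is evicted, which is the distribution drift the Adaptive Attention Sink feature is described as eliminating.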
