The Live Avatar framework achieves real-time streaming performance by combining Distribution Matching Distillation with Timestep-forcing Pipeline Parallelism. Together, these techniques let the model generate frames faster than playback speed and extend the stream without a fixed length limit, with each new segment conditioned on the preceding frames. The result is an 84× FPS improvement over the baseline, enabling live video generation at more than 20 FPS without quantization.
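To put the throughput figures above in concrete terms, the following minimal sketch works out what they imply. It treats 20 FPS as the achieved rate, although the text only gives that as a lower bound, so the derived baseline rate is an estimate rather than a reported number. The function name `realtime_budget`, its parameters, and the 10,000-second stream length used in the example are illustrative choices, not part of the framework.

```python
def realtime_budget(fps_achieved: float, speedup: float, stream_seconds: float) -> dict:
    """Derive simple throughput figures from a frame rate and a speedup factor."""
    # Baseline rate implied by the speedup (an estimate, not a reported value).
    baseline_fps = fps_achieved / speedup
    return {
        "baseline_fps": baseline_fps,
        # Time available to produce one frame while keeping up with playback.
        "frame_budget_ms": 1000.0 / fps_achieved,
        # Frames that must be generated for a stream of the given length.
        "total_frames": int(round(fps_achieved * stream_seconds)),
    }


# Assumed inputs: 20 FPS achieved, 84x speedup, a 10,000-second stream.
stats = realtime_budget(fps_achieved=20.0, speedup=84.0, stream_seconds=10_000)
for name, value in stats.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

Under these assumptions the baseline would run at roughly 0.24 FPS, each frame has a 50 ms budget, and a 10,000-second stream amounts to 200,000 frames.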
Live Avatar also addresses the quality degradation that accumulates over long autoregressive generation, which can manifest as identity drift and color shifts. To mitigate it, the framework uses Rolling RoPE, Adaptive Attention Sink, and History Corrupt, which enable streaming of effectively unlimited length, sustained for over 10,000 seconds without quality degradation or identity drift. This makes it suitable for applications such as interactive dialogue agents and virtual avatars.
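The text names these strategies without describing their internals, so the sketch below is only a generic illustration of the underlying idea, not the framework's implementation. It assumes a fixed set of "sink" entries kept from the start of the stream, a sliding window of recent entries, and position indices reassigned relative to the current context so they stay bounded however long the stream runs. The class `RollingSinkCache` and its parameters are hypothetical, and the adaptive aspect of the sink is not modeled.

```python
from collections import deque


class RollingSinkCache:
    """Toy context cache: fixed sink entries plus a sliding window (illustrative only)."""

    def __init__(self, sink_size: int, window_size: int):
        self.sink_size = sink_size
        self.sink = []                           # earliest entries, never evicted
        self.window = deque(maxlen=window_size)  # most recent entries

    def append(self, frame_id: int) -> None:
        if len(self.sink) < self.sink_size:
            self.sink.append(frame_id)
        else:
            # A deque with maxlen drops its oldest item automatically when full.
            self.window.append(frame_id)

    def context(self) -> list:
        # Positions are assigned relative to the current context, so the
        # largest index is bounded by sink_size + window_size - 1.
        entries = self.sink + list(self.window)
        return list(enumerate(entries))


cache = RollingSinkCache(sink_size=1, window_size=3)
for frame_id in range(10):
    cache.append(frame_id)
print(cache.context())  # [(0, 0), (1, 7), (2, 8), (3, 9)]
```

In this toy version the first frame stays available as a stable reference while older intermediate frames are dropped, and the position indices never exceed 3 regardless of how many frames have been streamed.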

