FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction
Shuyuan Tu, Yueming Pan, Yinming Huang, Xintong Han, Zhen Xing, Qi Dai, Kai Qiu, Chong Luo, Zuxuan Wu
2025-12-19
Summary
This paper introduces FlashPortrait, a method for quickly generating realistic facial animations, with a focus on long portrait videos such as those you might see on social media.
What's the problem?
Generating high-quality, long-form facial animations is slow, and existing methods for speeding it up often struggle to keep the person's identity consistent: the face can subtly drift and look 'off' over time, especially in longer animations.
What's the solution?
FlashPortrait tackles this by first extracting identity-agnostic facial expression features with an off-the-shelf extractor. It then normalizes these features using means and variances so that they align with the diffusion model's latents, which helps keep the person's identity stable during animation. During generation, it predicts the latents at future denoising timesteps from how the latents are currently changing, which lets it skip several denoising steps and run up to six times faster. For long videos, it processes overlapping sliding windows and blends the overlapping regions with weights to avoid jarring transitions.
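The step-skipping idea can be illustrated with a small sketch. It is not the paper's predictor: it assumes latents are recorded at equally spaced denoising steps, estimates derivatives with backward finite differences, and extrapolates with a truncated Taylor expansion. The function names, the scalar latents, and the fixed `order` are all hypothetical choices for illustration, and the adaptive choice of how many steps to skip is not modeled.

```python
from math import factorial


def finite_differences(history, order):
    """Return backward differences of orders 0..order at the latest step.

    `history` holds latents (plain floats in this toy example) recorded at
    equally spaced denoising steps, oldest first.
    """
    if len(history) < order + 1:
        raise ValueError("need at least order + 1 recorded latents")
    diffs = [history[-1]]                      # order 0: the latest latent
    level = list(history[-(order + 1):])
    for _ in range(order):
        # Each pass differences neighbouring entries, raising the order by 1.
        level = [b - a for a, b in zip(level, level[1:])]
        diffs.append(level[-1])
    return diffs


def predict_latent(history, steps_ahead, order=2):
    """Extrapolate the latent `steps_ahead` steps past the latest one.

    Assumption: the finite differences stand in for derivatives with a unit
    step size, so the prediction is sum_k diff_k * steps_ahead**k / k!.
    """
    diffs = finite_differences(history, order)
    return sum(d * steps_ahead ** k / factorial(k) for k, d in enumerate(diffs))


# A latent that changes linearly is extrapolated exactly.
print(predict_latent([1.0, 2.0, 3.0], steps_ahead=2, order=1))  # 5.0
```

In a real sampler the entries of `history` would be latent tensors rather than floats, and each predicted latent would replace one or more full denoising passes. For latents that do not change linearly, the result is only an approximation.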
Why does it matter?
This work is important because it allows much faster creation of high-quality facial animations without sacrificing realism or the consistency of the person's identity. Potential applications include creating personalized videos, improving video conferencing, and making virtual avatars more believable.
Abstract
Current diffusion-based acceleration methods for long-portrait animation struggle to ensure identity (ID) consistency. This paper presents FlashPortrait, an end-to-end video diffusion transformer capable of synthesizing ID-preserving, infinite-length videos while achieving up to 6x acceleration in inference speed. In particular, FlashPortrait begins by computing the identity-agnostic facial expression features with an off-the-shelf extractor. It then introduces a Normalized Facial Expression Block to align facial features with diffusion latents by normalizing them with their respective means and variances, thereby improving identity stability in facial modeling. During inference, FlashPortrait adopts a dynamic sliding-window scheme with weighted blending in overlapping areas, ensuring smooth transitions and ID consistency in long animations. In each context window, based on the latent variation rate at particular timesteps and the derivative magnitude ratio among diffusion layers, FlashPortrait utilizes higher-order latent derivatives at the current timestep to directly predict latents at future timesteps, thereby skipping several denoising steps and achieving 6x speed acceleration. Experiments on benchmarks show the effectiveness of FlashPortrait both qualitatively and quantitatively.
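Two of the mechanisms named in the abstract can be sketched in a few lines. The abstract does not give the exact formulas, so both functions are assumptions. `align_statistics` reads "normalizing with their respective means and variances" as standardizing the features and rescaling them to the latents' statistics. `blend_windows` uses a linear cross-fade as one possible weighting for overlapping windows. The names and the flat lists of floats are hypothetical stand-ins for real tensors.

```python
from statistics import fmean, pstdev


def align_statistics(features, latents, eps=1e-6):
    """Rescale `features` so their mean and spread match those of `latents`.

    Assumption: features are standardized with their own mean and standard
    deviation, then mapped onto the latents' mean and standard deviation.
    """
    f_mean, f_std = fmean(features), pstdev(features)
    l_mean, l_std = fmean(latents), pstdev(latents)
    return [(f - f_mean) / (f_std + eps) * l_std + l_mean for f in features]


def blend_windows(windows, overlap):
    """Stitch windows of per-frame values into one sequence.

    Consecutive windows share `overlap` frames. In the shared region the
    weight of the newer window ramps up linearly (an assumed weighting).
    """
    if not windows:
        return []
    out = list(windows[0])
    for win in windows[1:]:
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)        # weight of the newer window
            idx = len(out) - overlap + i       # matching frame already in out
            out[idx] = (1 - w) * out[idx] + w * win[i]
        out.extend(win[overlap:])              # frames only the new window has
    return out


aligned = align_statistics([0.0, 2.0, 4.0], [10.0, 10.5, 11.0])
stitched = blend_windows([[0.0] * 4, [1.0] * 4], overlap=2)
print(len(stitched))  # 6 frames: 4 + 4 minus the 2 shared frames
```

With the linear ramp, the shared frames move gradually from the older window's values to the newer window's values, which is the property that avoids a visible seam. Other weightings, such as a cosine ramp, would serve the same purpose.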