The key innovation lies in its implicit-keypoint-based framework, which diverges from mainstream diffusion-based methods to enhance generalization, controllability, and efficiency for practical applications.
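To make the data flow of such an implicit-keypoint pipeline concrete, here is a minimal PyTorch-style sketch: an appearance extractor, a motion extractor that regresses implicit keypoints, a keypoint-driven warp, and a decoder. All module names, shapes, the Gaussian-weighted flow, and the keypoint count are illustrative assumptions, not LivePortrait's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AppearanceExtractor(nn.Module):
    """Toy stand-in: extracts an appearance feature map from the source image."""
    def __init__(self, ch=32):
        super().__init__()
        self.conv = nn.Conv2d(3, ch, 3, padding=1)

    def forward(self, x):
        return self.conv(x)

class MotionExtractor(nn.Module):
    """Toy stand-in: regresses implicit 2D keypoints in [-1, 1]."""
    def __init__(self, num_kp=21, size=64):
        super().__init__()
        self.fc = nn.Linear(3 * size * size, num_kp * 2)
        self.num_kp = num_kp

    def forward(self, x):
        return torch.tanh(self.fc(x.flatten(1))).view(-1, self.num_kp, 2)

def keypoint_flow(kp_src, kp_drv, size, sigma=0.1):
    """Dense backward flow from keypoint displacements, Gaussian-weighted per pixel."""
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, size),
                            torch.linspace(-1, 1, size), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1)                               # (H, W, 2)
    disp = kp_src - kp_drv                                             # (B, K, 2)
    d2 = ((grid[None, None] - kp_drv[:, :, None, None]) ** 2).sum(-1)  # (B, K, H, W)
    w = torch.softmax(-d2 / (2 * sigma ** 2), dim=1)                   # keypoint weights
    flow = (w[..., None] * disp[:, :, None, None]).sum(1)              # (B, H, W, 2)
    return grid[None] + flow                                           # sampling grid

class Decoder(nn.Module):
    """Toy stand-in: decodes the warped feature map back to an image."""
    def __init__(self, ch=32):
        super().__init__()
        self.conv = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, f):
        return torch.sigmoid(self.conv(f))

def animate(source, driving, appearance, motion, decoder, size=64):
    feat = appearance(source)                     # what the face looks like
    kp_s, kp_d = motion(source), motion(driving)  # how it should move
    grid = keypoint_flow(kp_s, kp_d, size)
    warped = F.grid_sample(feat, grid, align_corners=True)
    return decoder(warped)

src, drv = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
out = animate(src, drv, AppearanceExtractor(), MotionExtractor(), Decoder())
print(out.shape)  # torch.Size([1, 3, 64, 64])
```

Because motion is compressed into a handful of keypoints rather than generated pixel by pixel, a forward pass is a single warp-and-decode, which is where the efficiency advantage over diffusion-based generation comes from.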
The framework is trained in two stages: base model training, followed by training of the stitching and retargeting modules. In the first stage, the appearance and motion extractors, the warping module, and the decoder are optimized from scratch. In the second stage, the stitching and retargeting modules are fine-tuned while the previously trained components are kept frozen. This staged design lets LivePortrait generate high-quality video at exceptional speed, reaching 12.8 ms per frame on an RTX 4090 GPU. The project also draws on a dataset of roughly 69 million high-quality frames and employs a mixed image-video training strategy to further improve generation quality and generalization.
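The second stage can be pictured as a standard freeze-and-fine-tune setup. The sketch below uses toy placeholder modules and hypothetical head sizes; only the freezing pattern reflects the description above.

```python
import itertools
import torch
import torch.nn as nn

# Placeholders for the stage-1 components (in LivePortrait these are the
# trained appearance/motion extractors, warping module, and decoder).
appearance = nn.Conv2d(3, 32, 3, padding=1)
motion = nn.Linear(3 * 64 * 64, 21 * 2)
decoder = nn.Conv2d(32, 3, 3, padding=1)

# Stage 2: freeze everything trained in stage 1 ...
for module in (appearance, motion, decoder):
    module.requires_grad_(False)
    module.eval()

# ... and optimize only the new heads. These small MLPs and their
# input/output sizes are assumptions for illustration.
stitching = nn.Sequential(nn.Linear(21 * 4, 128), nn.ReLU(), nn.Linear(128, 21 * 2))
eye_head = nn.Sequential(nn.Linear(21 * 2 + 1, 128), nn.ReLU(), nn.Linear(128, 21 * 2))
lip_head = nn.Sequential(nn.Linear(21 * 2 + 1, 128), nn.ReLU(), nn.Linear(128, 21 * 2))

optimizer = torch.optim.Adam(
    itertools.chain(stitching.parameters(),
                    eye_head.parameters(),
                    lip_head.parameters()),
    lr=1e-4)

# Sanity check: no gradient flows into the frozen base.
assert all(not p.requires_grad for p in decoder.parameters())
```

Keeping the base frozen means the small heads can be trained cheaply without risking the generation quality established in stage one.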
Key Features
- Balances computational efficiency and controllability, moving away from mainstream diffusion-based methods.
- Uses approximately 69 million high-quality frames for training.
- Employs a mixed image-video training strategy, integrating additional still-image data to improve generation quality and generalization.
- Controls specific facial regions such as the eyes and lips for more precise animations (see the sketch after this list).
- Supports various portrait styles including realistic, oil painting, sculpture, and 3D rendering.
- Capable of animating animal portraits by fine-tuning on animal datasets.
- Achieves a generation speed of 12.8 ms per frame on an RTX 4090 GPU.
- The inference code and models are available on GitHub.
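As a concrete picture of the eye/lip control mentioned above, the sketch below damps or exaggerates the motion of a region of implicit keypoints with a single ratio. The keypoint indices and the centroid-scaling rule are assumptions for illustration; LivePortrait trains small retargeting modules rather than applying a fixed scale.

```python
import torch

def retarget_region(kp_driving, region_idx, ratio):
    """Scale a region's keypoint offsets about its centroid (illustrative only)."""
    kp = kp_driving.clone()
    center = kp[:, region_idx].mean(dim=1, keepdim=True)      # region centroid
    kp[:, region_idx] = center + ratio * (kp[:, region_idx] - center)
    return kp

kp = torch.rand(1, 21, 2) * 2 - 1        # 21 implicit keypoints in [-1, 1]
lip_idx = [17, 18, 19, 20]               # hypothetical lip keypoint indices
eye_idx = [11, 12, 13, 14]               # hypothetical eye keypoint indices

kp = retarget_region(kp, lip_idx, ratio=0.5)   # halve lip motion
kp = retarget_region(kp, eye_idx, ratio=1.2)   # exaggerate eye motion
```

Because the control operates on a handful of keypoints rather than on pixels, adjustments like these are cheap and composable, which is what keeps the pipeline both fast and controllable.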