At the heart of AccVideo is its trajectory-based few-step guidance strategy. By extracting key data points from the denoising trajectories generated by a pre-trained video diffusion model, AccVideo enables a 'student' model to closely mimic the denoising process of its 'teacher' with far fewer steps. This accelerates video generation roughly 8.5x compared to the teacher model while maintaining high fidelity and visual consistency in the output. The synthetic dataset captures the data distribution at each diffusion timestep, ensuring that the distilled model learns the most relevant aspects of the denoising process, which is critical for generating complex scenes and dynamic content.
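The core idea above can be sketched in a few lines of PyTorch. This is a hedged, toy illustration, not AccVideo's actual training code: `TinyDenoiser`, `teacher_trajectory`, and `distill_step` are hypothetical names, the teacher trajectory is replaced by a simple linear path, and the "few-step" behavior is modeled by training the student to jump between a handful of key trajectory points rather than taking every denoising step.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Stand-in for a video diffusion backbone (latent -> latent)."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim)
        )

    def forward(self, x, t):
        # Condition on the scalar timestep by concatenation.
        t_feat = t.expand(x.shape[0], 1)
        return self.net(torch.cat([x, t_feat], dim=-1))

def teacher_trajectory(x_T, steps=50):
    """Toy stand-in for a teacher's denoising trajectory: a straight
    path from noise x_T toward a clean target. In practice this comes
    from running the full pre-trained sampler."""
    target = torch.zeros_like(x_T)
    return [x_T + (target - x_T) * (i / steps) for i in range(steps + 1)]

def distill_step(student, opt, x_T, key_steps=(0, 25, 50)):
    """One few-step distillation update: the student learns to map
    trajectory point x_{t_k} directly to the next key point x_{t_{k+1}},
    skipping the intermediate teacher steps."""
    traj = teacher_trajectory(x_T)
    loss = 0.0
    for a, b in zip(key_steps[:-1], key_steps[1:]):
        t = torch.tensor([[a / 50.0]])
        pred = student(traj[a], t)
        loss = loss + nn.functional.mse_loss(pred, traj[b])
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

With three key points, sampling collapses from 50 teacher steps to 2 student steps; the distillation loss keeps each student jump anchored to the teacher's distribution at that timestep.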
To further enhance video quality, AccVideo incorporates an adversarial training strategy that aligns the output distribution of the student model with the high-quality synthetic dataset. This ensures that the accelerated model does not compromise on visual detail or resolution. AccVideo can generate 5-second videos at 720x1280 resolution and 24 frames per second, rivaling the quality of much slower diffusion models. Its open-source implementation supports multi-GPU inference and integration with popular frameworks, making it accessible for researchers, developers, and creative professionals seeking efficient, scalable video generation solutions.
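The adversarial alignment described above can be illustrated with a standard GAN-style objective. Again a hedged sketch under stated assumptions, not AccVideo's implementation: `G` stands in for the student's output head, `D` is a hypothetical discriminator, and the "high-quality synthetic dataset" is reduced to a batch of vectors. The pattern shown, a discriminator trained to separate student outputs from dataset samples while the student is trained to fool it, is the generic technique the text names.

```python
import torch
import torch.nn as nn

dim = 16
G = nn.Sequential(nn.Linear(dim, 32), nn.SiLU(), nn.Linear(32, dim))  # student head (illustrative)
D = nn.Sequential(nn.Linear(dim, 32), nn.SiLU(), nn.Linear(32, 1))    # discriminator (illustrative)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def adversarial_step(real_batch, noise_batch):
    """One update aligning the student's output distribution with the
    'real' (synthetic-dataset) distribution."""
    fake = G(noise_batch)
    # Discriminator: label dataset samples 1, student outputs 0.
    d_loss = bce(D(real_batch), torch.ones(len(real_batch), 1)) + \
             bce(D(fake.detach()), torch.zeros(len(fake), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Student: fool the discriminator into predicting 1 on its outputs.
    g_loss = bce(D(fake), torch.ones(len(fake), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

The `detach()` on the discriminator pass is the key design choice: it stops the discriminator's loss from pushing gradients back into the student, so each network only optimizes its own objective.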
Key features include:
- 8.5x faster video generation compared to conventional diffusion models
- Trajectory-based few-step guidance for efficient distillation
- Synthetic dataset of teacher denoising trajectories for data-efficient distillation
- Adversarial training for enhanced video quality and consistency
- Generates 5-second, 720x1280 resolution, 24fps videos
- Open-source with support for multi-GPU inference and integration