Key Features

Vectorized timestep adaptation (VTA) for fine-grained temporal control
Preserves foundation model's generative priors while injecting temporal dynamics
Unprecedented efficiency in image-to-video (I2V) generation
Zero-shot multi-task capabilities for start-end frames and video extension
Text-to-video generation capabilities
Scalable, efficient, and versatile design
Suitable for research and industry applications
Democratizes high-fidelity video generation

PUSA V1.0 sets a new standard for image-to-video (I2V) generation, achieving a VBench-I2V total score of 87.32%. It also supports zero-shot tasks such as start-end frame conditioning and video extension, with no task-specific training, and it can perform text-to-video generation. This combination of efficiency and range makes PUSA attractive for both research and industry and lowers the barrier to high-fidelity video generation.
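One way to picture how a vectorized (per-frame) timestep could cover these tasks without task-specific training is to treat each task as a different choice of which frames are held clean. The sketch below is illustrative only and is not PUSA's code: the function names, the [0, 1] timestep range, and the convention that a timestep of 0 marks a clean conditioning frame are all assumptions made for this example.

```python
from typing import List, Sequence


def frame_timesteps(num_frames: int, clean_frames: Sequence[int], t: float) -> List[float]:
    """Return one timestep per frame (hypothetical helper).

    Assumed convention: 0.0 means the frame is given (clean),
    t means the frame is still being denoised at noise level t.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    clean = set(clean_frames)
    if any(i < 0 or i >= num_frames for i in clean):
        raise ValueError("clean frame index out of range")
    return [0.0 if i in clean else t for i in range(num_frames)]


# Hypothetical task presets: each task differs only in which frames are clean.
def image_to_video(num_frames: int, t: float) -> List[float]:
    # Only the first frame is provided.
    return frame_timesteps(num_frames, [0], t)


def start_end_frames(num_frames: int, t: float) -> List[float]:
    # The first and last frames are provided.
    return frame_timesteps(num_frames, [0, num_frames - 1], t)


def video_extension(num_frames: int, num_given: int, t: float) -> List[float]:
    # The first num_given frames are provided; the rest are generated.
    return frame_timesteps(num_frames, range(num_given), t)


if __name__ == "__main__":
    print(image_to_video(4, 0.8))
    print(start_end_frames(4, 0.5))
    print(video_extension(5, 2, 1.0))
```

Under these assumptions, the three tasks share one sampling routine and differ only in the timestep vector passed to it, which is one plausible reading of why no task-specific training would be needed.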


The PUSA V1.0 model is designed to be scalable, efficient, and versatile. Because it preserves the foundation model's generative priors while injecting temporal dynamics, it can generate high-quality videos with reduced computational resources. Potential applications include video editing and long video generation.
