Wan2.2 is trained on a significantly larger dataset than its predecessor, Wan2.1, with 65.6% more images and 83.2% more videos. This expansion improves the model's generalization across multiple dimensions, such as motion, semantics, and aesthetics, achieving top performance among open-source and closed-source models.
Wan2.2 open-sources a 5B model built with the advanced Wan2.2-VAE, which achieves a compression ratio of 16×16×4. This model supports both text-to-video and image-to-video generation at 720P resolution and 24fps, and it can run on consumer-grade graphics cards such as the RTX 4090. It is one of the fastest 720P@24fps models currently available, making it suitable for both industrial and academic use. The model is also compatible with various frameworks and tools, including PyTorch, Hugging Face, and ModelScope.
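
To give a feel for what a 16×16×4 compression ratio means in practice, here is a minimal arithmetic sketch. It assumes the ratio reads as 16× along height, 16× along width, and 4× along time, and that a 720P frame is 1280×720 pixels; both are assumptions for illustration, not details stated above. The helper names `latent_shape` and `position_reduction` are hypothetical and are not part of any Wan2.2 API. Real video VAEs may also treat the first frame specially, which this sketch ignores.

```python
import math

# Assumed strides: 4x temporal, 16x height, 16x width (one reading of "16x16x4").
T_STRIDE, H_STRIDE, W_STRIDE = 4, 16, 16


def latent_shape(frames, height, width,
                 t_stride=T_STRIDE, h_stride=H_STRIDE, w_stride=W_STRIDE):
    """Return the (frames, height, width) of the latent grid.

    Uses plain ceiling division along each axis; any special handling
    of the first frame in a causal VAE is deliberately left out.
    """
    return (
        math.ceil(frames / t_stride),
        math.ceil(height / h_stride),
        math.ceil(width / w_stride),
    )


def position_reduction(t_stride=T_STRIDE, h_stride=H_STRIDE, w_stride=W_STRIDE):
    """Factor by which the number of spatio-temporal positions shrinks."""
    return t_stride * h_stride * w_stride


if __name__ == "__main__":
    # One second of video at 24fps, assuming a 1280x720 frame for "720P".
    frames, height, width = 24, 720, 1280
    lt, lh, lw = latent_shape(frames, height, width)
    print("latent grid:", (lt, lh, lw))
    print("pixel positions:", frames * height * width)
    print("latent positions:", lt * lh * lw)
    print("reduction factor:", position_reduction())
```

Under these assumptions, the downstream generator works on a grid with roughly a thousand times fewer positions than the raw video, which is consistent with the model fitting on a consumer-grade GPU.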