PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation
Yaofang Liu, Yumeng Ren, Aitor Artola, Yuxuan Hu, Xiaodong Cun, Xiaotong Zhao, Alan Zhao, Raymond H. Chan, Suiyun Zhang, Rui Liu, Dandan Tu, Jean-Michel Morel
2025-07-24
Summary
This paper introduces Pusa, a method that makes video diffusion models cheaper to adapt and more versatile through vectorized timestep adaptation: instead of all frames sharing one scalar timestep, each frame gets its own, giving the model fine-grained control over time.
What's the problem?
Adapting video diffusion models to new tasks usually costs a lot of money and requires huge datasets, partly because conventional models apply a single timestep (noise level) to every frame at once, which limits temporal control and forces expensive task-specific fine-tuning.
What's the solution?
The researchers developed Pusa, which fine-tunes a pretrained text-to-video model to accept a separate timestep for each frame. This vectorized timestep adaptation cuts training cost to about $500, surpasses the baseline Wan-I2V model, and lets the same model handle different temporal tasks zero-shot, without extra task-specific training.
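To make the idea concrete, here is a minimal, hypothetical sketch of how a per-frame (vectorized) timestep could be fed to a video denoiser, in contrast to the usual single shared timestep. The `VideoDenoiser` class and its interface are illustrative assumptions for exposition, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class VideoDenoiser(nn.Module):
    """Illustrative denoiser: embeds one timestep PER FRAME instead of a single
    scalar timestep shared by all frames (the core idea of vectorized timesteps).
    This is a hypothetical sketch, not Pusa's actual implementation."""

    def __init__(self, channels: int = 64, time_dim: int = 128):
        super().__init__()
        self.time_embed = nn.Sequential(
            nn.Linear(1, time_dim), nn.SiLU(), nn.Linear(time_dim, channels)
        )
        self.backbone = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # x: noisy latent video, shape (batch, channels, frames, height, width)
        # t: vectorized timesteps, shape (batch, frames) -- one noise level per frame
        emb = self.time_embed(t.unsqueeze(-1))        # (batch, frames, channels)
        emb = emb.permute(0, 2, 1)[..., None, None]   # (batch, channels, frames, 1, 1)
        return self.backbone(x + emb)                 # frame-wise conditioning

batch, channels, frames, h, w = 2, 64, 8, 16, 16
x = torch.randn(batch, channels, frames, h, w)
t_scalar = torch.full((batch, frames), 0.7)   # conventional: one shared noise level
t_vector = torch.rand(batch, frames)          # vectorized: each frame has its own level

model = VideoDenoiser()
out = model(x, t_vector)
print(out.shape)  # torch.Size([2, 64, 8, 16, 16])
```

The only structural change from a scalar-timestep model is that the timestep embedding is computed and applied per frame rather than broadcast from a single value, which is why the adaptation can reuse most of the pretrained weights.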
Why does it matter?
It makes building and adapting video AI models far more affordable and efficient, opening the door to practical applications such as video generation, editing, and multi-task use without retraining for each task.
Abstract
Pusa, a vectorized timestep adaptation framework, enhances video diffusion models by enabling efficient and versatile temporal control, significantly reducing training costs and dataset sizes while improving performance and unlocking zero-shot multi-task capabilities.
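Because each frame carries its own timestep, different tasks reduce to choosing a suitable timestep vector at inference time rather than retraining the model. The sketch below is an illustrative assumption of how such vectors might look, using a continuous [0, 1] noise schedule where timestep 0 marks a clean, conditioning frame; the exact schedule and task set are assumptions for illustration.

```python
import torch

frames = 8
T = 1.0  # maximum noise level in an assumed continuous [0, 1] timestep schedule

# Text-to-video: every frame is fully noised and denoised together.
t_t2v = torch.full((frames,), T)

# Image-to-video (zero-shot): keep the first frame clean (t = 0) as conditioning,
# generate the remaining frames from full noise.
t_i2v = torch.cat([torch.zeros(1), torch.full((frames - 1,), T)])

# Start-end frame generation: clamp both endpoints, generate the frames in between.
t_start_end = torch.cat([torch.zeros(1), torch.full((frames - 2,), T), torch.zeros(1)])

# Video extension: the first k frames come from an existing clip, the rest are generated.
k = 4
t_extend = torch.cat([torch.zeros(k), torch.full((frames - k,), T)])

print(t_i2v)        # tensor([0., 1., 1., 1., 1., 1., 1., 1.])
print(t_start_end)  # tensor([0., 1., 1., 1., 1., 1., 1., 0.])
```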