Diffusion Adversarial Post-Training for One-Step Video Generation

Shanchuan Lin, Xin Xia, Yuxi Ren, Ceyuan Yang, Xuefeng Xiao, Lu Jiang

2025-01-15

Summary

This paper introduces a new way to make AI generate videos much faster, using a method called Adversarial Post-Training (APT). The researchers created a model that can make high-quality videos in just one step, which is much quicker than current methods.

What's the problem?

Current AI models that create videos, called diffusion models, are really good at making high-quality videos, but they're super slow. They have to go through many steps to make a video, which takes a lot of time and computer power. Some people have tried to make this faster for creating images, but when they do, the quality of the images gets much worse.
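To see why one-step generation is such a big speedup, here is a toy illustration (not the paper's code) comparing the cost of iterative diffusion sampling against a one-step generator, counted in model forward passes:

```python
# Toy illustration: the cost gap between iterative diffusion
# sampling and one-step generation, measured in forward passes
# through the (expensive) video model.

def diffusion_sample(num_steps):
    """Iterative sampling: one network evaluation per denoising step."""
    forward_passes = 0
    for _ in range(num_steps):
        forward_passes += 1  # each step runs the full video model once
    return forward_passes

def one_step_sample():
    """A distilled or post-trained one-step generator: one evaluation."""
    return 1

print(diffusion_sample(50))  # typical samplers use dozens of steps -> 50
print(one_step_sample())     # APT aims for exactly one -> 1
```

For a model where each forward pass takes seconds of GPU time on high-definition video, collapsing dozens of passes into one is what makes real-time generation possible.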

What's the solution?

The researchers came up with a new trick called Adversarial Post-Training (APT). They take an AI that's already trained to make videos and teach it to do the job in just one step. They made some clever changes to how the AI is built and trained, including something called 'R1 regularization', to make sure the videos still look good even when made quickly. Their new model, called Seaweed-APT, can make a 2-second, high-definition video in real-time with just one step. It can also make large, detailed images in one step that look as good as those made by the best current methods.
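The 'R1 regularization' mentioned above normally penalizes the squared gradient norm of the discriminator at real samples, which requires expensive higher-order gradients; the paper instead uses an approximated version. A hedged sketch of that idea, using a toy linear discriminator (all names and numbers here are illustrative, not from the paper):

```python
import numpy as np

# Sketch of an "approximated R1" penalty: rather than computing the
# exact R1 term (squared gradient norm of the discriminator D at real
# samples), perturb each real sample with small Gaussian noise and
# penalize how much D's output changes. For a smooth D, this finite
# difference tracks the gradient norm without second-order autodiff.

rng = np.random.default_rng(0)
w = rng.normal(size=16)      # weights of a toy linear "discriminator"

def discriminator(x):
    return x @ w             # D(x): one realism score per sample

def approx_r1_penalty(real_batch, sigma=0.01):
    noise = rng.normal(scale=sigma, size=real_batch.shape)
    d_real = discriminator(real_batch)
    d_perturbed = discriminator(real_batch + noise)
    # Penalize output changes under tiny input perturbations,
    # keeping the discriminator smooth and training stable.
    return np.mean((d_real - d_perturbed) ** 2)

real_batch = rng.normal(size=(8, 16))
print(approx_r1_penalty(real_batch) >= 0.0)  # non-negative scalar -> True
```

In actual adversarial post-training, this penalty would be added to the discriminator's loss each step; keeping the discriminator smooth in this way is one of the stability tricks that lets the one-step generator keep its quality.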

Why it matters?

This matters because it could make creating AI-generated videos much faster and cheaper. Imagine being able to create a movie-quality video clip instantly just by describing what you want. This could be huge for things like video games, virtual reality, or even making movies. It could also help researchers and artists experiment with new ideas much more quickly. Plus, since it uses less computer power, it's better for the environment and could make this technology available to more people who don't have super powerful computers.

Abstract

Diffusion models are widely used for image and video generation, but their iterative generation process is slow and expensive. While existing distillation approaches have demonstrated the potential for one-step generation in the image domain, they still suffer from significant quality degradation. In this work, we propose Adversarial Post-Training (APT) against real data following diffusion pre-training for one-step video generation. To improve the training stability and quality, we introduce several improvements to the model architecture and training procedures, along with an approximated R1 regularization objective. Empirically, our experiments show that our adversarial post-trained model, Seaweed-APT, can generate 2-second, 1280x720, 24fps videos in real time using a single forward evaluation step. Additionally, our model is capable of generating 1024px images in a single step, achieving quality comparable to state-of-the-art methods.