
Improving Video Generation with Human Feedback

Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zheng, Xiele Wu, Qiulin Wang, Wenyu Qin, Menghan Xia, Xintao Wang, Xiaohong Liu, Fei Yang, Pengfei Wan, Di Zhang, Kun Gai, Yujiu Yang, Wanli Ouyang

2025-01-24


Summary

This paper talks about a new way to make AI-generated videos better by using human feedback. The researchers built a reward model called VideoReward that learns from people's ratings and then used it to teach video-generating AI to produce results people actually prefer.

What's the problem?

AI can now make videos, but they often have problems like jerky movements or not matching what people ask for in the text descriptions. It's like asking an artist to paint a scene, but the painting comes out blurry or with the wrong colors.

What's the solution?

The researchers did three main things. First, they made a big collection of AI-generated videos with ratings from real people. Then, they created VideoReward, which acts like a smart judge that can predict how people would rate a video. Finally, they used this judge to improve the video-making AI itself, with two training methods called Flow-DPO and Flow-RWR and an inference-time method called Flow-NRG that steers the AI while it is generating a video. Together, these help the AI learn what people like and don't like in videos.
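To give a flavor of how the "smart judge" part is usually built: reward models trained on pairwise human preferences commonly use a Bradley-Terry-style loss, which pushes the model to score the video people preferred higher than the one they rejected. The PyTorch sketch below is a hypothetical illustration of that general recipe; the embedding size, scoring head, and placeholder data are assumptions, not the paper's actual VideoReward architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PairwiseRewardModel(nn.Module):
    # Toy judge: maps a video embedding to one score per quality dimension
    # (e.g. motion smoothness, text alignment, visual quality). Hypothetical sketch only.
    def __init__(self, embed_dim: int = 512, num_dims: int = 3):
        super().__init__()
        self.head = nn.Linear(embed_dim, num_dims)

    def forward(self, video_embedding: torch.Tensor) -> torch.Tensor:
        return self.head(video_embedding)  # shape: (batch, num_dims)

def pairwise_preference_loss(model, emb_chosen, emb_rejected):
    # Bradley-Terry-style objective: the human-preferred video should score higher
    # than the rejected one, in every annotated dimension.
    score_chosen = model(emb_chosen)
    score_rejected = model(emb_rejected)
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Random embeddings stand in for features a real video encoder would produce.
model = PairwiseRewardModel()
emb_chosen, emb_rejected = torch.randn(8, 512), torch.randn(8, 512)
loss = pairwise_preference_loss(model, emb_chosen, emb_rejected)
loss.backward()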

Why it matters?

This matters because it could make AI-generated videos much better and more useful. Imagine being able to describe any video you want and having an AI create it exactly how you pictured it, with smooth motion and matching your description perfectly. This could be huge for things like making movies, creating educational content, or even helping people visualize their ideas. It's a big step towards making AI creativity more aligned with what humans actually want and enjoy.

Abstract

Video generation has achieved significant advances through rectified flow techniques, but issues like unsmooth motion and misalignment between videos and prompts persist. In this work, we develop a systematic pipeline that harnesses human feedback to mitigate these problems and refine the video generation model. Specifically, we begin by constructing a large-scale human preference dataset focused on modern video generation models, incorporating pairwise annotations across multiple dimensions. We then introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices impact its rewarding efficacy. From a unified reinforcement learning perspective aimed at maximizing reward with KL regularization, we introduce three alignment algorithms for flow-based models by extending those from diffusion models. These include two training-time strategies: direct preference optimization for flow (Flow-DPO) and reward weighted regression for flow (Flow-RWR), and an inference-time technique, Flow-NRG, which applies reward guidance directly to noisy videos. Experimental results indicate that VideoReward significantly outperforms existing reward models, and Flow-DPO demonstrates superior performance compared to both Flow-RWR and standard supervised fine-tuning methods. Additionally, Flow-NRG lets users assign custom weights to multiple objectives during inference, meeting personalized video quality needs. Project page: https://gongyeliu.github.io/videoalign.
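As a rough illustration of the last point, Flow-NRG's "custom weights to multiple objectives" can be pictured as combining per-dimension reward scores into a single guidance signal with user-chosen weights, then nudging the noisy sample toward higher reward during sampling. The sketch below is an assumed, simplified version of that idea in PyTorch (toy reward model, flattened latent, single gradient step), not the paper's exact Flow-NRG procedure.

import torch
import torch.nn as nn

def reward_guidance_step(noisy_latent, reward_model, weights, guidance_scale=0.1):
    # One hypothetical reward-guidance step: move the noisy sample in the direction
    # that raises a user-weighted combination of reward dimensions.
    noisy_latent = noisy_latent.detach().requires_grad_(True)
    per_dim = reward_model(noisy_latent)           # (batch, num_dims) reward scores
    w = weights / weights.sum()                    # normalize the user's objective weights
    total_reward = (per_dim * w).sum()             # weighted scalar reward
    grad = torch.autograd.grad(total_reward, noisy_latent)[0]
    return (noisy_latent + guidance_scale * grad).detach()

# Toy stand-in reward model over a flattened latent; a real model would score actual videos.
toy_reward = nn.Linear(64, 3)                      # 3 dimensions, e.g. motion / alignment / quality
latent = torch.randn(2, 64)
weights = torch.tensor([0.6, 0.2, 0.2])            # this user cares most about smooth motion
latent = reward_guidance_step(latent, toy_reward, weights)

In this picture, changing the weight vector at inference time changes which qualities the guidance emphasizes, which is how personalized video quality preferences could be served without retraining the generator.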