
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

Ryan Burgert, Yuancheng Xu, Wenqi Xian, Oliver Pilarski, Pascal Clausen, Mingming He, Li Ma, Yitong Deng, Lingxiao Li, Mohsen Mousavi, Michael Ryoo, Paul Debevec, Ning Yu

2025-01-22


Summary

This paper presents a new way to make the motion in AI-generated videos more realistic and controllable. The researchers created a method called 'Go-with-the-Flow' that lets AI systems generate videos in which the movement of objects and of the camera can be easily controlled.

What's the problem?

Current AI systems that generate videos often struggle to create realistic and controllable motion. It's hard for these systems to make objects move naturally or to control how the camera moves through the scene. This makes it difficult for people to use AI to create the specific video effects or movements they want.

What's the solution?

The researchers came up with a clever trick: instead of changing how the AI model works, they changed the data it learns from. They developed a fast algorithm that takes ordinary videos, tracks how things move in them (their optical flow), and uses that motion to 'warp' random noise into structured noise that follows the same movement. When a video diffusion model is fine-tuned on this structured noise, it becomes better at creating videos with controlled motion. The algorithm runs in real time and can be used with different video generation models without needing to change how those models are built.
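To make the idea more concrete, here is a minimal sketch of how per-frame Gaussian noise could be carried along optical flow to produce temporally correlated "structured noise." It is only an illustration of the general concept, not the paper's real-time noise warping algorithm: the function name, the use of torch.nn.functional.grid_sample, and the mixing weight are assumptions made for this sketch.

```python
import torch
import torch.nn.functional as F

def warp_noise_with_flow(noise_frames, flows, keep=0.9):
    """Toy noise warping: carry Gaussian noise along optical flow over time.

    noise_frames: (T, C, H, W) i.i.d. Gaussian noise, one slice per video frame.
    flows:        (T-1, 2, H, W) forward optical flow (dx, dy) in pixels,
                  from frame t to frame t+1.
    keep:         fraction of motion-correlated noise retained at each step.
    """
    T, C, H, W = noise_frames.shape
    device = noise_frames.device

    # Base sampling grid in the normalized [-1, 1] coordinates grid_sample expects.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H, device=device),
        torch.linspace(-1, 1, W, device=device),
        indexing="ij",
    )
    base_grid = torch.stack([xs, ys], dim=-1)  # (H, W, 2), x before y

    warped = [noise_frames[0]]
    for t in range(1, T):
        # Approximate backward warping by sampling frame t-1 at (p - forward flow).
        flow = flows[t - 1]
        offset = torch.stack(
            [flow[0] * 2.0 / (W - 1), flow[1] * 2.0 / (H - 1)], dim=-1
        )
        grid = (base_grid - offset).unsqueeze(0)

        prev = warped[-1].unsqueeze(0)
        carried = F.grid_sample(
            prev, grid, align_corners=True, padding_mode="border"
        )[0]

        # Blend carried noise with fresh noise, then renormalize the frame so its
        # statistics stay close to a unit Gaussian (a crude stand-in for the
        # paper's exact preservation of spatial Gaussianity).
        mixed = keep * carried + (1.0 - keep**2) ** 0.5 * noise_frames[t]
        warped.append((mixed - mixed.mean()) / (mixed.std() + 1e-6))

    return torch.stack(warped)
```

Training data would then pair the original video clips with the warped noise derived from their optical flow; the base diffusion model's architecture and training pipeline are left untouched, which is what makes the approach model-agnostic.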

Why it matters?

This matters because it could make AI-generated videos much more useful and realistic. Filmmakers, game designers, or anyone creating digital content could use this to easily control how things move in AI-generated videos. For example, they could make a character walk in a specific way or create a smooth camera pan across a scene. It's also important because it works with existing AI systems, making the technique easy to adopt and use. This could lead to more creative and high-quality AI-generated videos in movies, video games, and other digital media.

Abstract

Generative modeling aims to transform random noise into structured outputs. In this work, we enhance video diffusion models by allowing motion control via structured latent noise sampling. This is achieved by just a change in data: we pre-process training videos to yield structured noise. Consequently, our method is agnostic to diffusion model design, requiring no changes to model architectures or training pipelines. Specifically, we propose a novel noise warping algorithm, fast enough to run in real time, that replaces random temporal Gaussianity with correlated warped noise derived from optical flow fields, while preserving the spatial Gaussianity. The efficiency of our algorithm enables us to fine-tune modern video diffusion base models using warped noise with minimal overhead, and provide a one-stop solution for a wide range of user-friendly motion control: local object motion control, global camera movement control, and motion transfer. The harmonization between temporal coherence and spatial Gaussianity in our warped noise leads to effective motion control while maintaining per-frame pixel quality. Extensive experiments and user studies demonstrate the advantages of our method, making it a robust and scalable approach for controlling motion in video diffusion models. Video results are available on our webpage: https://vgenai-netflix-eyeline-research.github.io/Go-with-the-Flow. Source code and model checkpoints are available on GitHub: https://github.com/VGenAI-Netflix-Eyeline-Research/Go-with-the-Flow.
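As a quick sanity check on the abstract's claim that temporal coherence and per-frame (spatial) Gaussianity can coexist, the hedged sketch below builds warped noise for a pure-translation "flow" and measures both properties. The shift size, mixing weight, and use of torch.roll are illustrative assumptions, not the paper's algorithm.

```python
import torch

torch.manual_seed(0)
T, C, H, W = 16, 4, 64, 64
fresh = torch.randn(T, C, H, W)

# Pure-translation "flow": each frame's noise is the previous frame's noise
# shifted 2 pixels to the right, mixed with a little fresh noise and renormalized.
keep = 0.95
warped = [fresh[0]]
for t in range(1, T):
    carried = torch.roll(warped[-1], shifts=2, dims=-1)
    mixed = keep * carried + (1.0 - keep**2) ** 0.5 * fresh[t]
    warped.append((mixed - mixed.mean()) / mixed.std())
warped = torch.stack(warped)

# Spatial Gaussianity: every frame should still look like unit-variance noise.
print("per-frame mean:", warped.mean(dim=(1, 2, 3))[:4].tolist())
print("per-frame std: ", warped.std(dim=(1, 2, 3))[:4].tolist())

# Temporal coherence: consecutive frames are highly correlated along the motion.
aligned = torch.roll(warped[1:], shifts=-2, dims=-1)
corr = torch.corrcoef(torch.stack([aligned.flatten(), warped[:-1].flatten()]))[0, 1]
print("flow-aligned correlation between consecutive frames:", round(corr.item(), 3))
```

Under these assumptions, each frame stays close to zero mean and unit variance while consecutive frames remain strongly correlated along the motion, which is the behavior the abstract attributes to its warped noise.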