AnimateAnything: Consistent and Controllable Animation for Video Generation
Guojun Lei, Chi Wang, Hong Li, Rong Zhang, Yikai Wang, Weiwei Xu
2024-11-19

Summary
This paper presents AnimateAnything, a method for controllable video generation that produces precise, consistent animations from various input conditions, such as camera trajectories, text prompts, and user motion annotations.
What's the problem?
Creating consistent, high-quality video animations is challenging when several aspects must be controlled at once, such as how the camera moves and what actions occur in the scene. Many existing methods struggle to maintain smooth transitions and avoid flickering, particularly during large-scale motion.
What's the solution?
AnimateAnything addresses these issues with a multi-scale control feature fusion network that combines the different control signals into a unified motion representation. It converts all control information into frame-by-frame optical flow, which then guides the video generation process as a motion prior. To further improve video quality and reduce flickering, the method adds a frequency-based stabilization module that enforces consistency in the video's frequency domain over time. This two-stage approach handles diverse conditions while producing high-quality, temporally coherent animations.
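To make the two-stage design more concrete, here is a minimal PyTorch sketch. It is an illustration of the idea described above, not the authors' implementation: the module names, tensor shapes, and layer choices are assumptions.

```python
# Illustrative sketch only (hypothetical modules, not the paper's released code):
# (1) fuse heterogeneous control signals into per-frame optical flow,
# (2) use that flow as a motion prior when generating frames.
import torch
import torch.nn as nn


class ControlToFlow(nn.Module):
    """Fuse camera, text, and user-motion features into per-frame optical flow."""

    def __init__(self, ctrl_dim: int = 256, frames: int = 16):
        super().__init__()
        self.frames = frames
        self.fuse = nn.Sequential(
            nn.Linear(3 * ctrl_dim, 512), nn.SiLU(), nn.Linear(512, 512)
        )
        # Predict a coarse (frames, 2, 16, 16) flow field; a real model would
        # refine this at multiple scales.
        self.to_flow = nn.Linear(512, frames * 2 * 16 * 16)

    def forward(self, cam, txt, motion):
        fused = self.fuse(torch.cat([cam, txt, motion], dim=-1))
        flow = self.to_flow(fused).view(-1, self.frames, 2, 16, 16)
        return flow  # frame-by-frame optical flow used as the motion prior


class FlowGuidedGenerator(nn.Module):
    """Toy stand-in for a video backbone conditioned on the flow prior."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv3d(3 + 2, 3, kernel_size=3, padding=1)

    def forward(self, noisy_video, flow):
        # noisy_video: (B, 3, T, H, W); upsample the flow to match and
        # concatenate it channel-wise so motion guides the generation.
        flow = nn.functional.interpolate(
            flow.flatten(0, 1), size=noisy_video.shape[-2:], mode="bilinear"
        ).view(flow.shape[0], flow.shape[1], 2, *noisy_video.shape[-2:])
        flow = flow.permute(0, 2, 1, 3, 4)  # (B, 2, T, H, W)
        return self.backbone(torch.cat([noisy_video, flow], dim=1))
```

In this sketch the flow is simply concatenated as extra input channels; the actual method may inject the motion prior differently, but the split between "controls to flow" and "flow-guided generation" mirrors the two steps described above.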
Why it matters?
This research is significant because it enhances the ability to create detailed and stable animations in videos, which can be useful for filmmakers, game developers, and content creators. By improving how animations are generated and controlled, AnimateAnything opens up new possibilities for creative expression in digital media.
Abstract
We present AnimateAnything, a unified controllable video generation approach that facilitates precise and consistent video manipulation across various conditions, including camera trajectories, text prompts, and user motion annotations. Specifically, we carefully design a multi-scale control feature fusion network to construct a common motion representation for the different conditions; it explicitly converts all control information into frame-by-frame optical flow, which we then incorporate as a motion prior to guide final video generation. In addition, to reduce the flickering caused by large-scale motion, we propose a frequency-based stabilization module that enhances temporal coherence by ensuring consistency in the video's frequency domain. Experiments demonstrate that our method outperforms state-of-the-art approaches. For more details and videos, please refer to the webpage: https://yu-shaonian.github.io/Animate_Anything/.
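The frequency-domain stabilization mentioned in the abstract can be pictured with a short sketch: flicker shows up as spurious high-frequency energy along the time axis, so one way to measure it is to compare temporal spectra. The function below is a hedged example of that idea; the exact formulation and weighting used in the paper may differ, and the names here are assumptions.

```python
# Hedged sketch (not the paper's exact module): measure temporal flicker by
# comparing per-pixel spectra after an FFT along the time dimension.
import torch


def temporal_frequency_loss(pred: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    """pred, ref: (B, C, T, H, W) videos; penalize mismatched temporal spectra."""
    # Real FFT over the time dimension gives per-pixel temporal spectra.
    pred_spec = torch.fft.rfft(pred, dim=2)
    ref_spec = torch.fft.rfft(ref, dim=2)
    # Matching magnitudes keeps low-frequency content (smooth motion) aligned
    # and discourages spurious high-frequency energy, i.e. flicker.
    return (pred_spec.abs() - ref_spec.abs()).abs().mean()


if __name__ == "__main__":
    pred = torch.rand(1, 3, 16, 32, 32)
    ref = torch.rand(1, 3, 16, 32, 32)
    print(temporal_frequency_loss(pred, ref).item())
```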