Wan-Animate: Unified Character Animation and Replacement with Holistic Replication
Gang Cheng, Xin Gao, Li Hu, Siqi Hu, Mingyang Huang, Chaonan Ji, Ju Li, Dechao Meng, Jinwei Qi, Pengchong Qiao, Zhen Shen, Yafei Song, Ke Sun, Linrui Tian, Feng Wang, Guangyuan Wang, Qi Wang, Zhongjian Wang, Jiayu Xiao, Sheng Xu, Bang Zhang, Peng Zhang
2025-09-18
Summary
This paper introduces Wan-Animate, a system that can either animate a character from a single image or swap a character into an existing video, while keeping the result realistic.
What's the problem?
Creating realistic character animation and seamlessly integrating animated characters into videos is really hard. Existing methods often struggle either to accurately copy movements and expressions, or to make the character look like it actually belongs under the scene's lighting and color.
What's the solution?
Wan-Animate builds on a pre-existing video generation model called Wan. It takes a picture of a character and a reference video as input. In animation mode, it copies the body movements and facial expressions from the video onto the character. In replacement mode, it puts the new character into the video and adjusts the lighting and colors so it blends in naturally. Skeleton data drives the body movements, implicit facial features drive the expressions, and a dedicated module called Relighting LoRA handles getting the lighting right during character replacement.
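The two modes described above can be thought of as one pipeline with different conditioning inputs. The sketch below illustrates that idea only; `build_conditioning`, the mode names, and every field name are hypothetical, since the actual Wan-Animate API is not described in this summary.

```python
# Hypothetical sketch of Wan-Animate's dual-mode conditioning.
# All names here are illustrative, not the real interface.

def build_conditioning(mode, character_image, skeleton_frames, face_features,
                       reference_video=None):
    """Assemble the conditions the generator would consume.

    mode: "animate" -> generate a new video of the character;
          "replace" -> composite the character into the reference video.
    """
    if mode not in ("animate", "replace"):
        raise ValueError(f"unknown mode: {mode}")

    cond = {
        "reference": character_image,      # appearance condition
        "body_motion": skeleton_frames,    # spatially aligned skeleton signals
        "expression": face_features,       # implicit facial features
    }
    if mode == "replace":
        if reference_video is None:
            raise ValueError("replacement mode needs the source video")
        cond["environment"] = reference_video   # scene to blend into
        cond["use_relighting_lora"] = True      # match scene lighting/color
    else:
        cond["use_relighting_lora"] = False
    return cond
```

The point of the shared structure is that both tasks reuse the same motion and expression controls; only the environmental conditions differ.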
Why does it matter?
This work is important because it provides a single, effective way to both animate characters and realistically place them into videos. This could be useful for creating special effects in movies, personalized videos, or even just for fun, and the researchers are making the technology available for others to use.
Abstract
We introduce Wan-Animate, a unified framework for character animation and replacement. Given a character image and a reference video, Wan-Animate can animate the character by precisely replicating the expressions and movements of the character in the video to generate high-fidelity character videos. Alternatively, it can integrate the animated character into the reference video to replace the original character, replicating the scene's lighting and color tone to achieve seamless environmental integration. Wan-Animate is built upon the Wan model. To adapt it for character animation tasks, we employ a modified input paradigm to differentiate between reference conditions and regions for generation. This design unifies multiple tasks into a common symbolic representation. We use spatially-aligned skeleton signals to replicate body motion and implicit facial features extracted from source images to reenact expressions, enabling the generation of character videos with high controllability and expressiveness. Furthermore, to enhance environmental integration during character replacement, we develop an auxiliary Relighting LoRA. This module preserves the character's appearance consistency while applying the appropriate environmental lighting and color tone. Experimental results demonstrate that Wan-Animate achieves state-of-the-art performance. We are committed to open-sourcing the model weights and its source code.
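LoRA itself is a general fine-tuning technique: it adds a trainable low-rank update BA to a frozen weight matrix W, so a module like the Relighting LoRA can adapt lighting and color tone without retraining the base model. A minimal numerical sketch of that mechanism (the dimensions, rank, and scale are illustrative, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, rank = 8, 8, 2          # toy sizes; real layers use thousands
W = rng.normal(size=(d_out, d_in))   # frozen base weight
A = rng.normal(size=(rank, d_in))    # trainable down-projection
B = np.zeros((d_out, rank))          # trainable up-projection, zero-initialized
scale = 1.0

def forward(x, lora_enabled):
    """Base projection, plus an optional low-rank LoRA delta."""
    y = W @ x
    if lora_enabled:
        y = y + scale * (B @ (A @ x))
    return y

x = rng.normal(size=d_in)
# Because B starts at zero, the untrained LoRA is a no-op:
assert np.allclose(forward(x, True), forward(x, False))
```

Zero-initializing B makes the module an identity at the start of training, and only the small A and B matrices are updated afterwards, which is consistent with the paper's goal of preserving the character's appearance while adjusting environmental lighting.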