The core innovation of Animate-X lies in its enhanced motion representation. The framework introduces a component called the Pose Indicator, which captures motion patterns from the driving video in two complementary ways, one implicit and one explicit. The implicit approach uses CLIP visual features extracted from the driving video to capture the gist of the motion, including the overall movement pattern and the temporal relationships between motions. The explicit approach strengthens the generalization of the Latent Diffusion Model (LDM) by simulating in advance the kinds of inputs that may arise during inference.
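To make the explicit idea concrete, here is a minimal sketch of one plausible way to simulate such inputs: stretching or shrinking individual bones of a driving skeleton so that its proportions no longer match the reference character. The joint names, the `BONES` list, the scale range, and the helpers `rescale_pose` and `random_scales` are all hypothetical and chosen for illustration; they are not taken from the Animate-X code.

```python
import random

# Hypothetical bone list as (parent, child) pairs.
# Ordered so that every parent is placed before its children.
BONES = [
    ("hip", "neck"),
    ("neck", "head"),
    ("neck", "l_hand"),
    ("neck", "r_hand"),
    ("hip", "l_foot"),
    ("hip", "r_foot"),
]

def rescale_pose(pose, scales, root="hip"):
    """Rebuild the skeleton from the root, stretching each bone by its factor.

    pose   : dict mapping joint name -> (x, y) in normalized image coordinates
    scales : dict mapping (parent, child) -> scale factor (missing bones use 1.0)
    """
    out = {root: pose[root]}  # the root joint stays where it is
    for parent, child in BONES:
        s = scales.get((parent, child), 1.0)
        dx = pose[child][0] - pose[parent][0]
        dy = pose[child][1] - pose[parent][1]
        # Attach the scaled bone to the already-moved parent joint.
        out[child] = (out[parent][0] + s * dx, out[parent][1] + s * dy)
    return out

def random_scales(rng, low=0.6, high=1.6):
    """Draw one scale factor per bone (the range is an arbitrary assumption)."""
    return {bone: rng.uniform(low, high) for bone in BONES}

# A toy driving pose in normalized coordinates (y grows downward).
reference_pose = {
    "hip": (0.5, 0.6), "neck": (0.5, 0.4), "head": (0.5, 0.3),
    "l_hand": (0.3, 0.5), "r_hand": (0.7, 0.5),
    "l_foot": (0.4, 0.9), "r_foot": (0.6, 0.9),
}

rng = random.Random(0)
augmented_pose = rescale_pose(reference_pose, random_scales(rng))
```

Training on pairs such as `(reference image, augmented_pose)` would expose the model to skeletons whose proportions differ from the character it must animate, which is the situation that arises when a human driving pose is applied to an anthropomorphic character.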
Animate-X's architecture is built upon the LDM, allowing it to handle various character types, collectively referred to as "X". This versatility enables the framework to animate not only human figures but also anthropomorphic characters, significantly expanding its potential applications in creative industries.
To evaluate the performance of Animate-X, the researchers introduced a new Animated Anthropomorphic Benchmark (A²Bench). This benchmark consists of 500 anthropomorphic characters along with corresponding dance videos, providing a comprehensive dataset for assessing the framework's capabilities in animating diverse character types.
Key features of Animate-X include:
- Universal Character Animation: Capable of animating both human and anthropomorphic characters from a single reference image.
- Enhanced Motion Representation: Utilizes a Pose Indicator with both implicit and explicit features to capture comprehensive motion patterns.
- Strong Generalization: Demonstrates robust performance across various character types, even when trained solely on human datasets.
- Identity Preservation: Excels in maintaining the appearance and identity of the reference character throughout the animation.
- Motion Consistency: Produces animations with high temporal continuity and precise, vivid movements.
- Pose Robustness: Handles challenging poses, including turning movements and transitions from sitting to standing.
- Long Video Generation: Capable of producing extended animation sequences while maintaining consistency.
- Compatibility with Various Character Sources: Successfully animates characters from popular games, cartoons, and even real-world figures.
- Exaggerated Motion Support: Able to generate expressive and exaggerated figure motions while preserving the character's original appearance.
- CLIP Integration: Leverages CLIP visual features for improved motion understanding and representation.
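
As an illustration of the CLIP Integration point, the sketch below shows single-head cross-attention in which pose-derived queries attend over per-patch visual features of a driving frame. It is a simplified sketch under stated assumptions, not the Animate-X implementation: the vectors are small hand-written stand-ins for real CLIP features, and `cross_attention` is a hypothetical helper written in plain Python for readability.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention over plain Python lists.

    queries : list of query vectors (here: stand-ins for pose-derived queries)
    keys    : list of key vectors   (here: stand-ins for CLIP patch features)
    values  : list of value vectors, one per key
    Returns (outputs, weights), with one row per query.
    """
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(w)
    return outputs, all_weights

# Toy stand-ins: two pose queries and three visual feature vectors.
pose_queries = [[1.0, 0.0], [0.0, 1.0]]
visual_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

motion_tokens, attention = cross_attention(pose_queries, visual_features, visual_features)
```

Each row of `attention` sums to 1, so every output token is a convex combination of the visual features, weighted by how well they match the corresponding pose query.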