The Rebind module in CoDance provides dual guidance, leveraging semantic features from text prompts and spatial features from subject masks to direct the learned motion to the intended characters. This ensures precise control and subject association, enabling the animation of complex scenes with multiple subjects and varied character types. The model is trained on a combination of animation data and a diverse text-to-video dataset, alternating between the two to bolster its semantic comprehension.
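The dual-guidance idea can be illustrated with a minimal sketch: per-subject motion is gated semantically (how well a subject's embedding matches the prompt) and routed spatially (only inside that subject's mask). Note this is an illustrative toy, not the CoDance implementation; the function name `rebind_guidance` and all tensor shapes here are assumptions for demonstration.

```python
import numpy as np

def rebind_guidance(motion_feat, text_emb, subject_masks, subject_embs):
    """Toy sketch of dual guidance (hypothetical, not the CoDance code).

    motion_feat:   (H, W, C) spatial feature map of the learned motion
    text_emb:      (C,) semantic embedding of the text prompt
    subject_masks: (N, H, W) binary masks, one per subject
    subject_embs:  (N, C) per-subject semantic embeddings
    """
    out = np.zeros_like(motion_feat)
    for mask, emb in zip(subject_masks, subject_embs):
        # Semantic guidance: gate by cosine similarity between the
        # subject embedding and the prompt embedding.
        gate = float(
            emb @ text_emb
            / (np.linalg.norm(emb) * np.linalg.norm(text_emb) + 1e-8)
        )
        # Spatial guidance: apply the gated motion only inside the
        # subject's mask, binding the motion to that character's region.
        out += mask[..., None] * gate * motion_feat
    return out
```

Outside every subject mask the output stays zero, so motion cannot leak onto unintended characters; the gate scales each subject's motion by its semantic relevance to the prompt.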
CoDance achieves state-of-the-art performance on multi-subject animation tasks, exhibiting strong generalization across diverse subjects and spatial layouts. The model can animate characters from games, cartoons, and other domains, handling both single and multiple subjects as well as long videos set to music. The code and weights for CoDance will be open-sourced, making them accessible to researchers and developers for further improvement and application.


