SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation

Jiaming Zhang, Shengming Cao, Rui Li, Xiaotong Zhao, Yutao Cui, Xinglin Hou, Gangshan Wu, Haolan Chen, Yu Xu, Limin Wang, Kai Ma

2025-11-26

Summary

This paper introduces SteadyDancer, a new method for creating realistic animations of a person from a single image and a set of desired movements, while keeping the first frame identical to the original picture.

What's the problem?

Animating a person from a single image and a set of target motions is tricky because the motions rarely line up perfectly with the original picture. Existing methods often let the person's appearance drift over time, so they start to look like someone else, or they produce distorted, unnatural movements. The dominant approach struggles to keep the person looking consistent throughout the animation while also following the intended motion accurately.

What's the solution?

SteadyDancer tackles this by treating the task as image-to-video generation, which guarantees the animation starts from the original picture. It uses a 'Condition-Reconciliation Mechanism' to balance two competing demands: following the target motion precisely and preserving the original image's appearance (a toy sketch of this balancing idea appears below). It also builds an adaptive 'pose representation' that conforms to the original image, so the movements look natural on that particular person. Finally, it trains the system in stages, first for accurate motion, then for visual quality, and finally for smooth transitions between frames.
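To make the "balance two conditions" idea concrete, here is a minimal, hypothetical PyTorch sketch that reconciles a pose-control embedding with a reference-appearance embedding through a learned gate. The module name, tensor shapes, and gating design are illustrative assumptions; the paper does not publish this code, and its actual mechanism operates inside a video generation model.

```python
import torch
import torch.nn as nn

class ConditionReconciliation(nn.Module):
    """Toy sketch: blend a pose-control feature with a reference-appearance
    feature using a learned per-channel gate, so neither condition fully
    overrides the other. Illustrative only; not the paper's actual module."""

    def __init__(self, dim: int):
        super().__init__()
        # The gate predicts, per channel, how much to trust the pose condition.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, pose_feat: torch.Tensor, ref_feat: torch.Tensor) -> torch.Tensor:
        # pose_feat, ref_feat: (batch, dim) condition embeddings.
        w = self.gate(torch.cat([pose_feat, ref_feat], dim=-1))
        # A convex combination keeps the output between the two conditions.
        return w * pose_feat + (1.0 - w) * ref_feat


# Usage: reconcile two 512-dimensional condition embeddings.
module = ConditionReconciliation(dim=512)
pose = torch.randn(2, 512)
ref = torch.randn(2, 512)
blended = module(pose, ref)  # shape (2, 512)
```

The convex combination is one simple way to keep motion control from fully overwriting appearance information; the real mechanism in the paper may differ substantially.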

Why it matters?

This research matters because it significantly improves the quality of animated human figures. SteadyDancer produces more realistic and consistent animations while requiring less computing power and training data than other state-of-the-art methods, making high-quality human animation more accessible and efficient.

Abstract

Preserving first-frame identity while ensuring precise motion control is a fundamental challenge in human image animation. The Image-to-Motion Binding process of the dominant Reference-to-Video (R2V) paradigm overlooks critical spatio-temporal misalignments common in real-world applications, leading to failures such as identity drift and visual artifacts. We introduce SteadyDancer, an Image-to-Video (I2V) paradigm-based framework that achieves harmonized and coherent animation and is the first to ensure first-frame preservation robustly. Firstly, we propose a Condition-Reconciliation Mechanism to harmonize the two conflicting conditions, enabling precise control without sacrificing fidelity. Secondly, we design Synergistic Pose Modulation Modules to generate an adaptive and coherent pose representation that is highly compatible with the reference image. Finally, we employ a Staged Decoupled-Objective Training Pipeline that hierarchically optimizes the model for motion fidelity, visual quality, and temporal coherence. Experiments demonstrate that SteadyDancer achieves state-of-the-art performance in both appearance fidelity and motion control, while requiring significantly fewer training resources than comparable methods.
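As a rough illustration of what a staged, decoupled-objective schedule could look like, here is a hypothetical Python sketch that re-weights three loss terms across training stages. The stage names, epoch counts, and weight values are invented placeholders, not numbers from the paper.

```python
from typing import Dict

# Hypothetical stage schedule: each stage re-weights the per-objective losses
# so the model is tuned first for motion fidelity, then visual quality, then
# temporal coherence. All names and numbers here are invented placeholders.
STAGES = [
    {"name": "motion fidelity",    "epochs": 10, "w": {"motion": 1.0, "visual": 0.1, "temporal": 0.0}},
    {"name": "visual quality",     "epochs": 10, "w": {"motion": 0.3, "visual": 1.0, "temporal": 0.1}},
    {"name": "temporal coherence", "epochs": 10, "w": {"motion": 0.3, "visual": 0.3, "temporal": 1.0}},
]

def weighted_loss(losses: Dict[str, float], w: Dict[str, float]) -> float:
    """Combine per-objective losses using the current stage's weights."""
    return sum(w[k] * losses[k] for k in w)

# Example: identical raw losses produce different training signals per stage.
raw = {"motion": 0.8, "visual": 0.5, "temporal": 0.3}
for stage in STAGES:
    print(stage["name"], "->", round(weighted_loss(raw, stage["w"]), 3))
```

The point of such a hierarchy is that each stage optimizes one objective without undoing the previous stage's progress; how SteadyDancer actually decouples its objectives is detailed in the paper itself.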