SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations

Wenhao Yan, Sheng Ye, Zhuoyi Yang, Jiayan Teng, ZhenHui Dong, Kairui Wen, Xiaotao Gu, Yong-Jin Liu, Jie Tang

2025-12-08


Summary

This paper introduces a new system called SCAIL that aims to create much more realistic and reliable character animations, bringing them closer to the quality you'd see in professional studios.

What's the problem?

Making a character in a reference image convincingly follow the motion in a driving video is still hard. Existing methods often struggle when the motion is complex or fast-paced, or when the character has a different build or style from the person in the driving video. The resulting animations can look distorted, jerky, or unnatural, lacking both structural accuracy and consistency over time.

What's the solution?

The researchers tackled this with two main ideas. First, they developed a new 3D pose representation that gives the model a more robust and flexible motion signal. Second, they introduced a full-context pose injection mechanism inside a diffusion-transformer architecture, which lets the model reason over the entire motion sequence in both space and time. They also built a curated data pipeline aimed at both diversity and quality, and established a benchmark for systematic evaluation.

Why does it matter?

This work matters because it moves automated character animation closer to studio-grade reliability and realism. More dependable results could save animators time and effort, and make it easier to create immersive, believable characters in movies, video games, and other media.

Abstract

Achieving character animation that meets studio-grade production standards remains challenging despite recent progress. Existing approaches can transfer motion from a driving video to a reference image, but often fail to preserve structural fidelity and temporal consistency in wild scenarios involving complex motion and cross-identity animations. In this work, we present SCAIL (Studio-grade Character Animation via In-context Learning), a framework designed to address these challenges through two key innovations. First, we propose a novel 3D pose representation, providing a more robust and flexible motion signal. Second, we introduce a full-context pose injection mechanism within a diffusion-transformer architecture, enabling effective spatio-temporal reasoning over full motion sequences. To align with studio-level requirements, we develop a curated data pipeline ensuring both diversity and quality, and establish a comprehensive benchmark for systematic evaluation. Experiments show that SCAIL achieves state-of-the-art performance and advances character animation toward studio-grade reliability and realism.
