MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation

Yanbo Ding, Xirui Hu, Zhizhi Guo, Yali Wang

2025-05-20


Summary

This paper presents MTVCrafter, a framework that animates human images more realistically and flexibly by modeling detailed 3D motion data directly.

What's the problem?

Creating lifelike animations of people from a still image is hard. It requires capturing not only how a person looks but also how they move in three dimensions over time, and most current methods do not handle this well.

What's the solution?

To address this, the researchers built a framework that encodes how a person moves in 3D over time as discrete motion tokens, then passes those tokens to a motion-aware diffusion transformer (DiT) that generates the animated video. This lets the system produce more natural and accurate animations from ordinary images.
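To make the idea of "motion tokens" concrete, here is a minimal sketch of vector quantization. It assumes motion is stored as a frames × joints × 3 array of joint coordinates and that a codebook of reference vectors already exists. The function names, the toy codebook, and the choice to quantize raw per-joint coordinates are illustrative assumptions, not MTVCrafter's actual tokenizer; a real tokenizer would learn its codebook (and typically an encoder in front of it) from motion data.

```python
import numpy as np


def tokenize_motion(motion: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each joint position in each frame to the index of its nearest code vector.

    motion:   (T, J, 3) array of T frames x J joints x (x, y, z) coordinates.
    codebook: (K, 3) array of K code vectors (hypothetical; a real one is learned).
    Returns a (T, J) array of integer token ids, keeping the time and joint layout.
    """
    # Broadcast to (T, J, K, 3), then sum squared differences -> (T, J, K) distances.
    diff = motion[:, :, None, :] - codebook[None, None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    # Nearest code vector per joint per frame.
    return np.argmin(dist, axis=-1)


def detokenize_motion(tokens: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Look token ids back up in the codebook, giving a (T, J, 3) approximation."""
    return codebook[tokens]


if __name__ == "__main__":
    # Toy codebook with three reference positions (assumption for illustration).
    codebook = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    # Two frames, two joints each.
    motion = np.array([
        [[0.1, 0.0, 0.0], [0.9, 0.1, 0.0]],
        [[0.0, 0.8, 0.0], [0.0, 0.0, 0.2]],
    ])
    tokens = tokenize_motion(motion, codebook)
    print(tokens)  # [[0 1]
                   #  [2 0]]
    print(detokenize_motion(tokens, codebook).shape)  # (2, 2, 3)
```

Keeping the tokens in a frames × joints grid, rather than flattening them, preserves where and when each token sits. A downstream generator can then attend to motion with that spatio-temporal layout in mind.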

Why does it matter?

This work opens up new possibilities for filmmaking, video games, virtual reality, and other applications that need realistic human animation, because it helps digital characters look and move more like real people.

Abstract

MTVCrafter, a framework using 4D motion tokens and a motion-aware video DiT, significantly improves human image animation by modeling raw 3D motion sequences.
