HumanMM: Global Human Motion Recovery from Multi-shot Videos

Yuhong Zhang, Guanlin Wu, Ling-Hao Chen, Zhuokai Zhao, Jing Lin, Xiaoke Jiang, Jiamin Wu, Zhuoheng Li, Hao Frank Yang, Haoqian Wang, Lei Zhang

2025-03-11

Summary

This paper introduces HumanMM, a method that reconstructs 3D human motion from videos containing multiple shots or camera angles, keeping the motion smooth and realistic even when the video cuts between different shots.

What's the problem?

Existing methods struggle to recover accurate 3D motion from videos with sudden camera cuts or cluttered backgrounds, often producing glitches such as 'foot sliding' (unnatural foot movement) or broken motion between shots.

What's the solution?

HumanMM detects camera cuts automatically, aligns the 3D poses and body orientation across different shots with a robust alignment module, and corrects foot sliding with a custom motion integrator so the recovered movement stays natural and consistent.
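To make the three-stage idea concrete, here is a minimal toy sketch of that kind of pipeline: detect cuts, align per-shot trajectories, then smooth. The functions below are illustrative stand-ins, not the authors' code; HumanMM's actual shot detector, alignment module, and motion integrator are learned components, whereas this sketch uses simple frame differencing, a rigid translation offset, and a moving average.

```python
import numpy as np

def detect_shot_cuts(frames, threshold=50.0):
    """Flag a cut wherever the mean per-pixel change between frames is large.

    A crude stand-in for HumanMM's shot transition detector."""
    diffs = np.abs(np.diff(frames.astype(float), axis=0)).mean(axis=(1, 2))
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]

def align_shots(trajectories):
    """Translate each shot's root trajectory so it starts where the previous
    shot ended (a toy version of cross-shot pose/orientation alignment)."""
    aligned = [trajectories[0]]
    for traj in trajectories[1:]:
        offset = aligned[-1][-1] - traj[0]  # match first pose to previous last
        aligned.append(traj + offset)
    return np.concatenate(aligned, axis=0)

def smooth(traj, window=3):
    """Moving-average smoothing as a crude stand-in for the motion
    integrator that suppresses jitter and foot sliding."""
    kernel = np.ones(window) / window
    return np.stack(
        [np.convolve(traj[:, d], kernel, mode="same") for d in range(traj.shape[1])],
        axis=1,
    )
```

In the real system each stage is far more sophisticated (e.g. orientation as well as position is aligned, and the integrator enforces foot contact), but the data flow, per-shot recovery followed by alignment followed by temporal smoothing, is the same.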

Why it matters?

This helps create better animations for movies, games, or virtual reality by turning real-world videos into accurate 3D motion without glitches, even when using footage from multiple cameras or angles.

Abstract

In this paper, we present a novel framework designed to reconstruct long-sequence 3D human motion in world coordinates from in-the-wild videos with multiple shot transitions. Such long-sequence in-the-wild motions are highly valuable to applications such as motion generation and motion understanding, but are challenging to recover due to the abrupt shot transitions, partial occlusions, and dynamic backgrounds present in such videos. Existing methods primarily focus on single-shot videos, where continuity is maintained within a single camera view, or simplify multi-shot alignment to camera space only. In this work, we tackle these challenges by integrating enhanced camera pose estimation with Human Motion Recovery (HMR), incorporating a shot transition detector and a robust alignment module for accurate pose and orientation continuity across shots. By leveraging a custom motion integrator, we effectively mitigate foot sliding and ensure temporal consistency in human pose. Extensive evaluations on our multi-shot dataset, created from public 3D human datasets, demonstrate the robustness of our method in reconstructing realistic human motion in world coordinates.