Key Features

Trains 3D human motion generation from monocular 2D pose supervision.
Uses a pretrained 2D-to-3D lifter as a noisy teacher during training.
Applies diffusion denoising in 3D while supervising with 2D reprojection.
Uses depth-weighted reprojection losses with theoretical connection to 3D supervision.
Adapts velocity and representation-alignment regularizers for 2D-supervised training.
Reports strong results on HumanML3D, Fit3D, and NBA-style real video data.
Provides paper, code, supplementary material, and project examples.
Includes a direct hero demo video hosted on the project page.
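The "theoretical connection to 3D supervision" in the list above can be illustrated with a standard pinhole-camera argument. This sketch makes assumptions the page does not state: a pinhole camera with focal length f, joints in camera-space coordinates, and a squared-depth weight. VideoMDM's actual weighting may differ.

```latex
% Pinhole projection of a camera-space joint P = (X, Y, Z), Z > 0
\pi(P) = \frac{f}{Z}\begin{pmatrix} X \\ Y \end{pmatrix}

% Prediction with a 3D error parallel to the image plane: \hat{P} = P + (\Delta X, \Delta Y, 0)
\pi(\hat{P}) - \pi(P) = \frac{f}{Z}\begin{pmatrix} \Delta X \\ \Delta Y \end{pmatrix}

% Weighting the squared 2D residual by (Z/f)^2 recovers the squared 3D error
\left(\frac{Z}{f}\right)^{2} \bigl\lVert \pi(\hat{P}) - \pi(P) \bigr\rVert^{2}
  = \Delta X^{2} + \Delta Y^{2} = \lVert \hat{P} - P \rVert^{2}
```

Without the weight, the same 3D error on a joint twice as far from the camera contributes only one quarter of the 2D loss, so distant joints are under-penalized. Displacements along the camera ray through a joint leave its projection unchanged, so no reprojection loss can penalize them. This argument covers only errors parallel to the image plane.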

The method uses a pretrained 2D-to-3D lifter as a noisy teacher. It adds diffusion noise to the lifted 3D estimates and supervises the denoised motion through 2D reprojection losses against the observed keypoints. The project page describes depth-weighted reprojection, velocity consistency, and representation alignment as adaptations of standard 3D motion regularizers to this 2D-only setting.
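The training signal described above can be sketched in NumPy as follows. The following details are assumptions and do not come from the page: a pinhole camera with known focal length, a DDPM-style linear noise schedule, a denoiser that predicts clean motion (x0), a squared-depth weight on the reprojection error, and a plain L2 velocity term on 2D frame-to-frame displacements. `training_loss`, `toy_lifter`, and the denoiser stubs are hypothetical illustrations, not the project's code. Representation alignment is omitted because it depends on model internals.

```python
import numpy as np


def project(joints_3d, focal=1.0):
    # Pinhole projection of (T, J, 3) camera-space joints to (T, J, 2); assumes Z > 0.
    return focal * joints_3d[..., :2] / joints_3d[..., 2:3]


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # DDPM-style linear beta schedule: alpha_bar[t] = prod_{s<=t} (1 - beta_s).
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def training_loss(lifter, denoiser, keypoints_2d, alpha_bar, rng, focal=1.0, w_vel=1.0):
    """Hypothetical 2D-supervised diffusion loss for one clip.

    keypoints_2d: (T, J, 2) monocular 2D pose track (the only supervision).
    lifter:       hypothetical pretrained 2D-to-3D lifter, (T, J, 2) -> (T, J, 3).
    denoiser:     hypothetical model that predicts clean 3D motion from (x_t, t).
    """
    # 1. Noisy teacher: the lifter's 3D estimate plays the role of x0.
    x0_teacher = lifter(keypoints_2d)

    # 2. Forward diffusion of the lifted estimate at a random timestep.
    t = int(rng.integers(len(alpha_bar)))
    eps = rng.standard_normal(x0_teacher.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0_teacher + np.sqrt(1.0 - alpha_bar[t]) * eps

    # 3. Denoise, then compare the reprojection with the observed 2D keypoints,
    #    not with the teacher's 3D output. Squared-depth weighting is assumed.
    x0_pred = denoiser(x_t, t)
    pred_2d = project(x0_pred, focal)
    depth_w = (x0_pred[..., 2] / focal) ** 2
    reproj = np.mean(depth_w * np.sum((pred_2d - keypoints_2d) ** 2, axis=-1))

    # 4. Velocity consistency on frame-to-frame 2D displacements.
    vel_err = np.diff(pred_2d, axis=0) - np.diff(keypoints_2d, axis=0)
    vel = np.mean(np.sum(vel_err ** 2, axis=-1))

    return float(reproj + w_vel * vel)


def toy_lifter(kp2d, depth=5.0, focal=1.0):
    # Crude stand-in teacher: back-project every joint to a constant depth.
    z = np.full(kp2d.shape[:-1] + (1,), depth)
    return np.concatenate([kp2d * z / focal, z], axis=-1)


rng = np.random.default_rng(0)
kp = rng.uniform(-0.5, 0.5, size=(16, 22, 2))  # toy data: 16 frames, 22 joints
teacher = toy_lifter(kp)
alpha_bar = linear_alpha_bar()

# A denoiser that recovers x0 exactly gives (numerically) zero loss.
loss_exact = training_loss(toy_lifter, lambda x_t, t: teacher, kp, alpha_bar, rng)
# A 0.1-unit lateral shift at depth 5 costs about 0.1**2 = 0.01; the velocity term stays ~0.
shifted = teacher + np.array([0.1, 0.0, 0.0])
loss_shift = training_loss(toy_lifter, lambda x_t, t: shifted, kp, alpha_bar, rng)
print(loss_exact, loss_shift)
```

In this sketch, the lifter's 3D output only seeds the diffusion process. The loss itself never compares against the teacher's depth. A constant lateral offset is therefore penalized by the depth-weighted reprojection term but cancels out of the velocity term.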


VideoMDM is useful for motion-generation researchers because monocular video is much easier to collect than clean 3D motion capture. The project provides a paper, a code repository, supplementary material, and a hero video showing generated motion examples.
