The method uses a pretrained 2D-to-3D lifter as a noisy teacher, diffuses the lifted estimates, and supervises denoising through 2D reprojection losses. The page describes depth-weighted reprojection, velocity consistency, and representation alignment as adaptations of standard 3D motion regularizers to a 2D-only setting.
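
To make the 2D-only supervision concrete, here is a minimal NumPy sketch of a depth-weighted reprojection loss and a velocity consistency loss. It is not taken from the VideoMDM code. The function names, the pinhole camera with a fixed `focal` length, and the choice to weight each joint by its normalized depth are all assumptions made for illustration, and the project's actual formulation may differ.

```python
import numpy as np


def project(joints_3d, focal=1000.0):
    """Pinhole projection of camera-space joints (T, J, 3) to 2D points (T, J, 2).

    The fixed focal length and zero principal point are illustrative assumptions.
    """
    z = np.clip(joints_3d[..., 2:3], 1e-6, None)  # guard against division by zero
    return focal * joints_3d[..., :2] / z


def depth_weighted_reprojection_loss(pred_3d, target_2d, focal=1000.0):
    """Mean squared 2D error, with each joint weighted by its normalized depth.

    Assumed weighting: pixel error shrinks as 1/z, so multiplying by z / mean(z)
    keeps distant joints from being under-penalized.
    """
    z = np.clip(pred_3d[..., 2], 1e-6, None)
    weights = z / z.mean()
    sq_err = np.sum((project(pred_3d, focal) - target_2d) ** 2, axis=-1)
    return float(np.mean(weights * sq_err))


def velocity_consistency_loss(pred_3d, target_2d, focal=1000.0):
    """Mean squared difference between projected and observed 2D frame-to-frame velocities."""
    pred_vel = np.diff(project(pred_3d, focal), axis=0)
    target_vel = np.diff(target_2d, axis=0)
    return float(np.mean(np.sum((pred_vel - target_vel) ** 2, axis=-1)))


# Toy usage: 8 frames, 17 joints, placed roughly 5 units in front of the camera.
rng = np.random.default_rng(0)
motion = rng.normal(size=(8, 17, 3)) + np.array([0.0, 0.0, 5.0])
keypoints_2d = project(motion)  # stand-in for 2D keypoints detected in video
denoised = motion + 0.05 * rng.normal(size=motion.shape)  # stand-in for a denoiser output
total = (depth_weighted_reprojection_loss(denoised, keypoints_2d)
         + velocity_consistency_loss(denoised, keypoints_2d))
```

Both terms compare the denoised 3D motion with the video only after projection, so the sketch never needs ground-truth 3D data. In this sketch the lifted estimates would enter only as the noisy input to the diffusion process, which is not modeled here.
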
VideoMDM is relevant to motion-generation researchers because monocular video is much easier to collect than clean 3D motion capture. The project page links to a paper, a code repository, supplementary material, and a hero video of generated motion examples.


