Key Features

Joint 3D scene and human reconstruction
Feed-forward framework
Monocular video input
Estimation of scene geometry, camera parameters, and SMPL parameters
Refinement of human surface details
Optimization of geometric correspondence
Coherent scene and human alignment
State-of-the-art performance on human-centric scene reconstruction

The network architecture of UniSH consists of a Reconstruction Branch and a Human Body Branch. The Reconstruction Branch predicts per-frame camera extrinsics, confidence maps, and pointmaps, while the Human Body Branch estimates global SMPL shape parameters and per-frame pose parameters. Features from both branches are processed by AlignNet to predict the global scene scale and per-frame SMPL translations for coherent scene and human alignment.
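
The alignment step described above can be illustrated with a small NumPy sketch. This is not UniSH's actual code: the function name `align_human_to_scene`, the argument layout, and the conventions (pointmaps in a world frame at arbitrary scale, a 4x4 camera-to-world extrinsic at that same scale, SMPL vertices expressed in metric camera coordinates) are all assumptions made for illustration. It only shows how a single global scale and a per-frame translation could be enough to place the body and the scene in one metric frame.

```python
import numpy as np

def align_human_to_scene(pointmap, smpl_vertices, scale, translation, cam_to_world):
    """Hypothetical alignment of one frame (not the UniSH implementation).

    pointmap:      (H, W, 3) scene points, world frame, arbitrary scale (assumed)
    smpl_vertices: (V, 3) root-relative SMPL vertices, metric, camera axes (assumed)
    scale:         float, predicted global scene scale
    translation:   (3,) predicted per-frame SMPL translation in the camera frame
    cam_to_world:  (4, 4) predicted camera-to-world extrinsic, arbitrary scale (assumed)
    """
    # Bring the scene geometry to metric scale with the single global factor.
    scene_metric = scale * pointmap

    # Place the body in the camera frame using the per-frame translation.
    verts_cam = smpl_vertices + translation

    # Move the body into the world frame; the camera position is scaled
    # by the same global factor so it stays consistent with the scene.
    rotation = cam_to_world[:3, :3]
    cam_position = scale * cam_to_world[:3, 3]
    verts_world = verts_cam @ rotation.T + cam_position

    return scene_metric, verts_world
```

Because the scale is shared across all frames while the translation varies per frame, the scene stays rigid over time and only the body moves within it, which is one way to read "coherent scene and human alignment".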


UniSH achieves state-of-the-art performance on human-centric scene reconstruction and delivers highly competitive results on global human motion estimation. It jointly recovers high-fidelity scene geometry, human point clouds, camera parameters, and coherent, metric-scale SMPL bodies in a single forward pass. The framework handles challenging dynamic scenes while maintaining strong spatio-temporal coherence.
