The network architecture of UniSH consists of a Reconstruction Branch and a Human Body Branch. The Reconstruction Branch predicts per-frame camera extrinsics, confidence maps, and pointmaps, while the Human Body Branch estimates global SMPL shape parameters and per-frame pose parameters. Features from both branches are processed by AlignNet to predict the global scene scale and per-frame SMPL translations for coherent scene and human alignment.
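To make the data flow above concrete, here is a minimal sketch of how the predicted quantities might be stored and combined. It is an illustration under stated assumptions, not the authors' implementation. The container `UniSHOutputs`, the function `to_metric_scene`, the array shapes, and the confidence threshold are all hypothetical. The sketch assumes that the pointmaps and camera translations are predicted up to a global scale in a shared world frame, that the extrinsics are camera-to-world matrices, and that the SMPL translations are already metric. The 10-dimensional shape and 72-dimensional pose vectors follow the common SMPL convention.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class UniSHOutputs:
    """Hypothetical container for the predictions named in the text."""
    # Reconstruction Branch (per frame)
    extrinsics: np.ndarray   # (T, 4, 4), assumed camera-to-world, up to scale
    confidence: np.ndarray   # (T, H, W), per-pixel confidence
    pointmaps: np.ndarray    # (T, H, W, 3), assumed world frame, up to scale
    # Human Body Branch
    betas: np.ndarray        # (10,), one global SMPL shape for the sequence
    poses: np.ndarray        # (T, 72), per-frame SMPL pose (axis-angle)
    # AlignNet
    scene_scale: float       # one global scale for the whole scene
    smpl_trans: np.ndarray   # (T, 3), per-frame SMPL translation, assumed metric


def to_metric_scene(out: UniSHOutputs, conf_thresh: float = 0.5):
    """Rescale the scene so it shares metric units with the SMPL bodies.

    Returns metric pointmaps, metric camera poses, and a boolean mask of
    pixels whose confidence exceeds `conf_thresh` (an assumed threshold).
    """
    # Scaling the points and the camera centres by the same global factor
    # keeps the reconstruction self-consistent; rotations are unaffected.
    points = out.pointmaps * out.scene_scale
    cams = out.extrinsics.copy()
    cams[:, :3, 3] *= out.scene_scale
    mask = out.confidence > conf_thresh
    return points, cams, mask
```

Under these assumptions, the metric points selected by `mask` and SMPL meshes posed with `betas`, `poses`, and `smpl_trans` would live in the same coordinate system, which is the alignment that the predicted scale and translations are meant to provide.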
UniSH achieves state-of-the-art performance on human-centric scene reconstruction and delivers highly competitive results on global human motion estimation. It jointly recovers high-fidelity scene geometry, human point clouds, camera parameters, and coherent, metric-scale SMPL bodies in a single forward pass. The framework handles challenging dynamic scenes with strong spatio-temporal coherence.


