The method adapts a single-view 3D reconstruction model to generate temporally consistent per-frame 3D predictions through causal latent conditioning. These predictions initialize a deformable 3D Gaussian Splatting representation, which is then refined with occlusion-aware appearance optimization and a view-conditioned diffusion prior.
Lift4D targets 4D reconstruction research, dynamic object capture, and monocular video-to-asset workflows. By combining details observed in the input video with learned completion priors, it improves over prior baselines on challenging sequences with occlusion and non-rigid motion.
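
To make the idea of occlusion-aware appearance optimization more concrete, here is a minimal sketch of one way such a term could look: a photometric L1 loss that only counts pixels flagged as visible, so occluded regions do not pull the rendered appearance toward the occluder. This is an illustration under stated assumptions, not the Lift4D implementation. The function name `masked_l1_loss`, the flat per-pixel lists, and the boolean visibility mask are all hypothetical; the source does not specify the loss form or how visibility is estimated.

```python
from typing import Sequence


def masked_l1_loss(
    rendered: Sequence[float],
    observed: Sequence[float],
    visible: Sequence[bool],
) -> float:
    """Mean absolute error over visible pixels only (hypothetical helper).

    rendered: pixel intensities rendered from the 3D representation.
    observed: pixel intensities from the input video frame.
    visible:  True where the object is assumed unoccluded in this frame.
    """
    if not (len(rendered) == len(observed) == len(visible)):
        raise ValueError("rendered, observed, and visible must have equal length")

    total = 0.0
    count = 0
    for r, o, v in zip(rendered, observed, visible):
        if v:  # occluded pixels contribute nothing to the loss
            total += abs(r - o)
            count += 1

    # With no visible pixels there is no photometric evidence; return zero.
    return total / count if count else 0.0


if __name__ == "__main__":
    rendered = [0.2, 0.8, 0.5]
    observed = [0.1, 0.0, 0.5]
    visible = [True, False, True]  # the middle pixel is treated as occluded
    print(masked_l1_loss(rendered, observed, visible))
```

In this sketch the large mismatch at the occluded middle pixel is ignored, so only the two visible pixels drive the loss. In a pipeline like the one described, regions excluded this way would have to be constrained by something else, which is the role the text assigns to the view-conditioned diffusion prior.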


