To tame video diffusion models for generating high-fidelity panoramic videos, HoloTime introduces the 360World dataset, which it describes as the first comprehensive collection of panoramic videos suitable for downstream 4D scene reconstruction tasks. Building on this curated dataset, HoloTime proposes Panoramic Animator, a two-stage image-to-video diffusion model that converts panoramic images into high-quality panoramic videos. Finally, HoloTime presents Panoramic Space-Time Reconstruction, which uses a space-time depth estimation method to lift the generated panoramic videos into 4D point clouds.
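To make the last step more concrete, the sketch below shows one generic way a per-frame panoramic depth map could be lifted into a time-indexed point cloud. It is not HoloTime's implementation: it assumes equirectangular frames, depth stored as distance along each viewing ray, and a y-up, z-forward axis convention. The function names `equirect_depth_to_points` and `video_to_4d_points` are hypothetical and chosen only for illustration.

```python
import numpy as np


def equirect_depth_to_points(depth):
    """Unproject one equirectangular depth map (H, W) into (H*W, 3) points.

    Assumption: depth is the distance along each viewing ray, not z-depth.
    """
    h, w = depth.shape
    # Normalized pixel centers in (0, 1).
    u = (np.arange(w) + 0.5) / w
    v = (np.arange(h) + 0.5) / h
    # Longitude spans [-pi, pi]; latitude runs from +pi/2 (top) to -pi/2 (bottom).
    lon = (u - 0.5) * 2.0 * np.pi
    lat = (0.5 - v) * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both have shape (H, W)
    # Unit ray directions (assumed convention: y up, z forward).
    dirs = np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )
    # Scale each unit ray by its depth and flatten to a point list.
    return (dirs * depth[..., None]).reshape(-1, 3)


def video_to_4d_points(depth_video):
    """Lift a depth video (T, H, W) into per-frame point clouds (T, H*W, 3)."""
    return np.stack([equirect_depth_to_points(d) for d in depth_video])
```

With a constant depth, every point lands on a sphere of that radius, which is a quick sanity check for the projection. A real pipeline would also need depth that is consistent across frames, which is the role the space-time depth estimation method plays in the description above.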
HoloTime has been validated through a comparative analysis with existing approaches, in which it is reported to outperform them in both panoramic video generation and 4D scene reconstruction. By producing high-quality panoramic videos and 4D scenes, the framework can support more engaging and realistic immersive environments for users of VR and AR applications.