Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos
Yuheng Jiang, Zhehao Shen, Yu Hong, Chengcheng Guo, Yize Wu, Yingliang Zhang, Jingyi Yu, Lan Xu
2024-09-16

Summary
This paper introduces Robust Dual Gaussian Splatting (DualGS), a method for creating immersive volumetric videos that let users experience 3D human performances in real time.
What's the problem?
Volumetric video technology has great potential for creating realistic virtual experiences, but existing workflows often require extensive manual work to stabilize mesh sequences and produce very large assets that are difficult to store and stream. This makes the technology hard to use in everyday applications.
What's the solution?
The authors propose a new approach called DualGS, which uses Gaussian splatting to represent a performer's motion and appearance separately: a small set of joint Gaussians carries the motion, while skin Gaussians anchored to them carry the appearance. This explicit separation reduces motion redundancy and improves temporal coherence. A coarse-to-fine training strategy then models the performance frame by frame, first predicting the overall motion and then refining details for robust tracking and high-fidelity rendering. To make the videos easier to store and use, the motion and appearance data are compressed efficiently, achieving a compression ratio of up to 120 times, so that each frame requires only about 350KB of storage.
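To make the motion/appearance split more concrete, here is a minimal sketch of the dual-Gaussian idea in PyTorch: a small set of joint Gaussians carries per-frame motion, and each skin Gaussian is anchored to its nearest joints at the first frame and follows them afterwards. The class name, the K-nearest-neighbor anchoring, and the softmax blending weights are illustrative assumptions, not the authors' exact formulation.

```python
import torch

class DualGS:
    def __init__(self, skin_xyz, joint_xyz, K=4):
        # Canonical (first-frame) positions: skin Gaussians hold appearance,
        # joint Gaussians drive motion.
        self.skin_xyz0 = skin_xyz          # (N, 3)
        self.joint_xyz0 = joint_xyz        # (M, 3)
        # Anchor each skin Gaussian to its K nearest joint Gaussians.
        d = torch.cdist(skin_xyz, joint_xyz)            # (N, M) pairwise distances
        self.w, self.idx = torch.topk(-d, K, dim=1)     # K closest joints per skin Gaussian
        self.w = torch.softmax(self.w, dim=1)           # distance-based blending weights

    def deform(self, joint_xyz_t):
        """Move skin Gaussians by blending the translations of their anchor joints."""
        delta = joint_xyz_t - self.joint_xyz0           # (M, 3) per-joint motion
        skin_delta = (self.w.unsqueeze(-1) * delta[self.idx]).sum(dim=1)
        return self.skin_xyz0 + skin_delta              # (N, 3) deformed skin positions
```

In this sketch, only the joint positions need to be updated (and later compressed) per frame, while the appearance attached to the skin Gaussians stays largely fixed, which is the intuition behind the reported compression gains.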
Why it matters?
This research is significant because it makes interactive 3D video easier to create and to view, letting users watch performances from any viewpoint in a far more engaging way. With this technology, users can feel like they are part of the action, which is valuable for entertainment, education, and virtual reality experiences.
Abstract
Volumetric video represents a transformative advancement in visual media, enabling users to freely navigate immersive virtual experiences and narrowing the gap between digital and real worlds. However, the need for extensive manual intervention to stabilize mesh sequences and the generation of excessively large assets in existing workflows impedes broader adoption. In this paper, we present a novel Gaussian-based approach, dubbed DualGS, for real-time and high-fidelity playback of complex human performance with excellent compression ratios. Our key idea in DualGS is to separately represent motion and appearance using the corresponding skin and joint Gaussians. Such an explicit disentanglement can significantly reduce motion redundancy and enhance temporal coherence. We begin by initializing the DualGS and anchoring skin Gaussians to joint Gaussians at the first frame. Subsequently, we employ a coarse-to-fine training strategy for frame-by-frame human performance modeling. It includes a coarse alignment phase for overall motion prediction as well as a fine-grained optimization for robust tracking and high-fidelity rendering. To integrate volumetric video seamlessly into VR environments, we efficiently compress motion using entropy encoding and appearance using codec compression coupled with a persistent codebook. Our approach achieves a compression ratio of up to 120 times, only requiring approximately 350KB of storage per frame. We demonstrate the efficacy of our representation through photo-realistic, free-view experiences on VR headsets, enabling users to immersively watch musicians in performance and feel the rhythm of the notes at the performers' fingertips.
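The abstract mentions compressing motion with entropy encoding and appearance with a persistent codebook. The snippet below is a minimal sketch of that kind of per-frame pipeline, assuming quantized motion residuals compressed with zlib and appearance features matched against a codebook shared across frames; the quantization step, codebook handling, and choice of zlib are assumptions for illustration, not the paper's actual codec.

```python
import numpy as np
import zlib

def compress_motion(joint_xyz_t, joint_xyz_prev, step=1e-3):
    """Quantize per-joint motion residuals and entropy-code the result."""
    residual = joint_xyz_t - joint_xyz_prev            # (M, 3) frame-to-frame motion
    q = np.round(residual / step).astype(np.int16)     # uniform quantization
    return zlib.compress(q.tobytes())                  # entropy-coded motion payload

def compress_appearance(features, codebook):
    """Replace each per-Gaussian appearance feature with its nearest codebook index."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)   # (N, C)
    idx = d.argmin(axis=1).astype(np.uint16)           # (N,) codebook indices
    # The codebook itself persists across the sequence, so each frame only
    # stores the compressed index stream.
    return zlib.compress(idx.tobytes())
```

Storing compact per-frame payloads like these, rather than full mesh or point assets, is what allows the sequence to be streamed and played back inside a VR headset.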