Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction
Jixuan Fan, Wanhua Li, Yifei Han, Yansong Tang
2024-12-09
Summary
This paper introduces Momentum-GS, a method that improves the quality of large-scale 3D scene reconstruction through momentum-based Gaussian self-distillation, reducing memory consumption while maintaining high accuracy.
What's the problem?
Reconstructing large 3D scenes is memory-intensive and requires substantial storage. A common workaround is to divide a scene into blocks and train each block independently, but this reduces the diversity of training data each block sees, which degrades reconstruction accuracy. Additionally, when the blocks are trained in parallel across multiple GPUs, the number of blocks is capped by the number of available GPUs, further limiting the quality of the reconstruction.
What's the solution?
The authors propose Momentum-GS, which improves both training efficiency and accuracy. It employs a momentum-based self-distillation technique that lets the different blocks of the scene share knowledge without the number of blocks being tied to the number of GPUs. A 'teacher' Gaussian decoder, updated with momentum, provides stable global guidance to every block, promoting consistency across the sections being reconstructed. The authors also introduce block weighting, which dynamically adjusts how much emphasis each block receives based on its reconstruction accuracy, so that weaker blocks get more attention during training.
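The momentum update of the teacher is, in essence, an exponential moving average of the student's parameters. Below is a minimal sketch of that idea; the flat parameter lists and the momentum coefficient `m=0.9` are illustrative stand-ins, not the paper's actual Gaussian decoder weights or hyperparameters.

```python
def momentum_update(teacher_params, student_params, m=0.9):
    """Momentum (EMA) update: teacher <- m * teacher + (1 - m) * student.
    A momentum m close to 1 changes the teacher slowly, giving each
    block a stable reference to distill from during training."""
    return [m * t + (1.0 - m) * s
            for t, s in zip(teacher_params, student_params)]

# Toy example: the teacher drifts slowly toward the student.
teacher = [1.0, 0.0]
student = [0.0, 1.0]
teacher = momentum_update(teacher, student, m=0.9)
print(teacher)  # [0.9, 0.1]
```

Because the teacher aggregates the students' progress slowly, every block can distill from the same shared reference regardless of how many blocks exist or how many GPUs are available.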
Why it matters?
This research is important because it enhances the ability to create accurate digital representations of large environments, which is useful in fields like virtual reality, gaming, and urban planning. By reducing memory requirements and improving reconstruction quality, Momentum-GS makes it easier and more efficient to work with complex 3D scenes, paving the way for advances in technologies that rely on detailed visual representations.
Abstract
3D Gaussian Splatting has demonstrated notable success in large-scale scene reconstruction, but challenges persist due to high training memory consumption and storage overhead. Hybrid representations that integrate implicit and explicit features offer a way to mitigate these limitations. However, when applied in parallelized block-wise training, two critical issues arise since reconstruction accuracy deteriorates due to reduced data diversity when training each block independently, and parallel training restricts the number of divided blocks to the available number of GPUs. To address these issues, we propose Momentum-GS, a novel approach that leverages momentum-based self-distillation to promote consistency and accuracy across the blocks while decoupling the number of blocks from the physical GPU count. Our method maintains a teacher Gaussian decoder updated with momentum, ensuring a stable reference during training. This teacher provides each block with global guidance in a self-distillation manner, promoting spatial consistency in reconstruction. To further ensure consistency across the blocks, we incorporate block weighting, dynamically adjusting each block's weight according to its reconstruction accuracy. Extensive experiments on large-scale scenes show that our method consistently outperforms existing techniques, achieving a 12.8% improvement in LPIPS over CityGaussian with much fewer divided blocks and establishing a new state of the art. Project page: https://jixuan-fan.github.io/Momentum-GS_Page/
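The block weighting described in the abstract, which dynamically adjusts each block's weight according to its reconstruction accuracy, could be sketched as a softmax over a per-block error signal so that weaker blocks receive larger weights. The PSNR-based scoring and the temperature below are assumptions for illustration, not the paper's exact formulation.

```python
import math

def block_weights(psnr_per_block, temperature=10.0):
    """Give lower-accuracy blocks (lower PSNR) larger training weights
    via a softmax over negative PSNR. `temperature` controls how sharply
    the weighting favors weak blocks (an illustrative choice)."""
    scores = [math.exp(-p / temperature) for p in psnr_per_block]
    total = sum(scores)
    return [s / total for s in scores]

# Toy example: the block with lower PSNR (worse reconstruction)
# receives the larger weight, so it gets more attention in training.
w = block_weights([20.0, 30.0])
print(w[0] > w[1])  # True
```

The weights sum to one, so reweighting redistributes training emphasis across blocks rather than changing the overall loss scale.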