Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space
Hyunjee Lee, Youngsik Yun, Jeongmin Bae, Seoha Kim, Youngjung Uh
2024-08-15

Summary
This paper proposes a better way to understand 3D scenes modeled by radiance fields: instead of segmenting rendered 2D views, it attaches open-vocabulary language embeddings directly to points in 3D space and segments the scene there.
What's the problem?
Previous methods for understanding the semantics of scenes represented by NeRFs (Neural Radiance Fields) and 3DGS (3D Gaussian Splatting) offer only an incomplete 3D understanding: their segmentation results are 2D masks, and their supervision is anchored at 2D pixels. This limits how well they capture the actual 3D structure of the environment.
What's the solution?
The authors supervise the language embedding field directly at 3D points in the scene rather than at 2D pixels, which achieves state-of-the-art accuracy without relying on multi-scale language embeddings. They then transfer the pre-trained language field to 3DGS, reaching real-time rendering speeds without sacrificing training time or accuracy. Additionally, they introduce a 3D querying and evaluation protocol that assesses the reconstructed geometry and semantics together, allowing for better analysis and manipulation of 3D data.
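To make the idea of supervising the field "directly at 3D points" concrete, here is a minimal sketch of the kind of loss such training could use: a cosine-similarity loss between the embeddings the field predicts at sampled 3D points and CLIP-style target language embeddings associated with those points. The function names and the exact loss form are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def cosine_embedding_loss(pred, target):
    """Loss anchoring supervision at 3D points, not 2D pixels.

    pred:   (N, D) embeddings predicted by the language field
            at N sampled 3D points (hypothetical field output).
    target: (N, D) target language embeddings for those points
            (e.g. lifted from a CLIP-style encoder; assumed given).
    Returns 1 - mean cosine similarity, so 0 means a perfect match.
    """
    pred_n = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    target_n = target / np.linalg.norm(target, axis=1, keepdims=True)
    return 1.0 - np.mean(np.sum(pred_n * target_n, axis=1))
```

The key contrast with pixel-anchored methods is what `pred` and `target` index: each row corresponds to a 3D point in the scene, so the gradient flows into the field at that point rather than through a rendered 2D mask.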
Why it matters?
This research is important because it enhances our ability to analyze and understand complex 3D environments, which has applications in robotics, virtual reality, and gaming. By improving how we segment and interpret these scenes, we can create more realistic simulations and better interactive experiences.
Abstract
Understanding the 3D semantics of a scene is a fundamental problem for various scenarios such as embodied agents. While NeRFs and 3DGS excel at novel-view synthesis, previous methods for understanding their semantics have been limited to incomplete 3D understanding: their segmentation results are 2D masks and their supervision is anchored at 2D pixels. This paper revisits the problem set to pursue a better 3D understanding of a scene modeled by NeRFs and 3DGS as follows. 1) We directly supervise the 3D points to train the language embedding field. It achieves state-of-the-art accuracy without relying on multi-scale language embeddings. 2) We transfer the pre-trained language field to 3DGS, achieving the first real-time rendering speed without sacrificing training time or accuracy. 3) We introduce a 3D querying and evaluation protocol for assessing the reconstructed geometry and semantics together. Code, checkpoints, and annotations will be available online. Project page: https://hyunji12.github.io/Open3DRF
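The abstract's "3D querying" can be illustrated with a small sketch: given language embeddings stored at 3D points (or Gaussians) and embeddings of open-vocabulary text queries, each point is assigned the query it is most similar to. The function name and the argmax-over-cosine-similarity rule are assumptions for illustration; the paper's actual protocol may differ.

```python
import numpy as np

def query_points_3d(point_embeds, text_embeds):
    """Assign each 3D point the open-vocabulary query it best matches.

    point_embeds: (N, D) language embeddings stored at 3D points
                  (hypothetical output of the trained field).
    text_embeds:  (Q, D) embeddings of Q text queries.
    Returns an (N,) array of query indices, a 3D segmentation
    of the points rather than a 2D mask.
    """
    p = point_embeds / np.linalg.norm(point_embeds, axis=1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    sims = p @ t.T              # (N, Q) cosine similarities
    return sims.argmax(axis=1)  # per-point query label
```

Because the labels live on 3D points, geometry and semantics can be evaluated together, which is what the paper's proposed evaluation protocol targets.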