Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting
Yoonwoo Jeong, Cheng Sun, Frank Wang, Minsu Cho, Jaesung Choe
2025-12-30
Summary
This paper introduces a new way to do open-vocabulary segmentation in 3D scenes, building on recent advances using 3D Gaussian Splatting. Essentially, it's about teaching computers to 'understand' what's in a 3D scene and identify objects based on text prompts, like 'find the chair'.
What's the problem?
Current methods for this kind of 3D scene understanding struggle to efficiently process the high-dimensional feature information needed to accurately identify objects. They often compress this information (for example, with codebooks) to make rendering faster, but the compression discards detail and degrades results. Imagine trying to draw a detailed portrait using only a few colors – you lose important features.
What's the solution?
The researchers developed a technique called Quantile Rendering (Q-Render). Instead of blending *every* Gaussian a ray passes through, Q-Render samples only the few Gaussians that dominate the ray's final value. It's like quickly scanning a room to find a specific object instead of meticulously examining every single detail. They also created a neural network, GS-Net (Gaussian Splatting Network), that predicts these per-Gaussian features in a generalizable way, so it can be applied to new scenes.
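The paper's exact algorithm isn't given here, but the contrast between dense volume rendering and the sparse, dominant-only sampling that Q-Render describes can be sketched in a few lines. This is a toy illustration under my own assumptions: the function names (`dense_render`, `quantile_render`), the top-k selection rule, and the renormalization step are mine, not the authors' implementation.

```python
import numpy as np

def composite_weights(alphas):
    # Front-to-back alpha compositing: w_i = alpha_i * prod_{j<i}(1 - alpha_j),
    # where alphas are the depth-sorted opacities of Gaussians along one ray.
    transmittance = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    return alphas * transmittance

def dense_render(alphas, feats):
    # Conventional volume rendering: blend the features of ALL Gaussians
    # intersecting the ray (expensive when feats are, e.g., 512-D).
    return composite_weights(alphas) @ feats

def quantile_render(alphas, feats, k=2):
    # Sketch of the Q-Render idea: keep only the k Gaussians with the
    # largest blending weights, renormalize, and skip the rest.
    w = composite_weights(alphas)
    top = np.argsort(w)[-k:]
    w_top = w[top] / w[top].sum()
    return w_top @ feats[top]

# Toy ray: 5 depth-sorted Gaussians, each carrying a feature vector
# (4-D here for brevity; the paper targets 512-D feature maps).
rng = np.random.default_rng(0)
alphas = np.array([0.05, 0.7, 0.1, 0.6, 0.02])
feats = rng.standard_normal((5, 4))

dense = dense_render(alphas, feats)
sparse = quantile_render(alphas, feats, k=2)
```

In this sketch the two high-opacity Gaussians carry almost all of the compositing weight, so blending just those two approximates the full dense blend while touching far fewer high-dimensional features per ray.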
Why does it matter?
This work is important because it makes 3D scene understanding much faster and more accurate. The new method is about 43 times faster than previous approaches while still providing better results. This speedup is crucial for real-time applications like augmented reality or robotics, where computers need to quickly interpret their surroundings.
Abstract
Recent advancements in computer vision have successfully extended open-vocabulary segmentation (OVS) to the 3D domain by leveraging 3D Gaussian Splatting (3D-GS). Despite this progress, efficiently rendering the high-dimensional features required for open-vocabulary queries poses a significant challenge. Existing methods employ codebooks or feature compression, causing information loss and thereby degrading segmentation quality. To address this limitation, we introduce Quantile Rendering (Q-Render), a novel rendering strategy for 3D Gaussians that efficiently handles high-dimensional features while maintaining high fidelity. Unlike conventional volume rendering, which densely samples all 3D Gaussians intersecting each ray, Q-Render sparsely samples only those with dominant influence along the ray. By integrating Q-Render into a generalizable 3D neural network, we also propose the Gaussian Splatting Network (GS-Net), which predicts Gaussian features in a generalizable manner. Extensive experiments on ScanNet and LeRF demonstrate that our framework outperforms state-of-the-art methods while enabling real-time rendering, with a ~43.7x speedup on 512-D feature maps. Code will be made publicly available.