PE3R: Perception-Efficient 3D Reconstruction
Jie Hu, Shizun Wang, Xinchao Wang
2025-03-11
Summary
This paper presents PE3R, a faster and smarter way for computers to turn 2D photos into 3D models by understanding scenes in one go, like instantly building a Lego model from a picture.
What's the problem?
Current methods for making 3D models from photos are slow, struggle with new scenes they haven’t seen before, and sometimes get details wrong.
What's the solution?
PE3R uses a streamlined setup that builds 3D models in a single step instead of multiple passes, allowing it to handle new scenes and objects it wasn’t specifically trained on while keeping details accurate.
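To see why a single feed-forward pass is faster, here is a toy sketch (not PE3R's actual code; the function names, shapes, and "gradient" are illustrative stand-ins) contrasting classic per-scene iterative optimization with one learned forward pass:

```python
import numpy as np

rng = np.random.default_rng(0)

def iterative_reconstruction(image, steps=100, lr=0.1):
    """Classic approach: refine the estimate over many optimization steps."""
    depth = np.zeros_like(image)
    for _ in range(steps):          # many passes per scene
        grad = depth - image        # toy gradient toward a target
        depth -= lr * grad
    return depth

def feed_forward_reconstruction(image, weights):
    """Feed-forward approach: one learned pass, no per-scene optimization."""
    return image * weights          # a single pass (toy stand-in)

image = rng.random((4, 4))
slow = iterative_reconstruction(image)            # 100 passes
fast = feed_forward_reconstruction(image, 1.0)    # 1 pass
print(np.allclose(slow, fast, atol=1e-3))         # → True
```

Both approaches end up near the same answer in this toy setting, but the feed-forward version gets there in a single pass, which is the intuition behind PE3R's reported speedups.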
Why does it matter?
This helps robots, VR apps, and self-driving cars quickly understand their surroundings in 3D, making them safer and more efficient in real-world tasks.
Abstract
Recent advancements in 2D-to-3D perception have significantly improved the understanding of 3D scenes from 2D images. However, existing methods face critical challenges, including limited generalization across scenes, suboptimal perception accuracy, and slow reconstruction speeds. To address these limitations, we propose Perception-Efficient 3D Reconstruction (PE3R), a novel framework designed to enhance both accuracy and efficiency. PE3R employs a feed-forward architecture to enable rapid 3D semantic field reconstruction. The framework demonstrates robust zero-shot generalization across diverse scenes and objects while significantly improving reconstruction speed. Extensive experiments on 2D-to-3D open-vocabulary segmentation and 3D reconstruction validate the effectiveness and versatility of PE3R. The framework achieves a minimum 9-fold speedup in 3D semantic field reconstruction, along with substantial gains in perception accuracy and reconstruction precision, setting new benchmarks in the field. The code is publicly available at: https://github.com/hujiecpp/PE3R.