ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation

Guanghao Li, Kerui Ren, Linning Xu, Zhewen Zheng, Changjian Jiang, Xin Gao, Bo Dai, Jian Pu, Mulin Yu, Jiangmiao Pang

2025-10-10

Summary

This paper introduces ARTDECO, a new system for quickly creating 3D models from a series of regular 2D images, such as the frames of a video. It aims to bridge the gap between fast but less accurate 3D reconstruction and slow but highly accurate methods.

What's the problem?

Currently, building 3D models from images involves a tough trade-off: you can spend a lot of computing time to get a detailed, accurate model, or you can build a model very quickly that is less precise and less reliable. In practice, existing methods are either too slow for real-time applications like augmented reality, or they fail to produce the high-quality results needed for things like realistic simulations.

What's the solution?

ARTDECO addresses this by combining the best parts of both approaches. It uses 3D foundation models to make an initial estimate of the scene's structure and the camera positions, then refines this estimate using a SLAM-style pipeline, similar to how self-driving cars build maps. It represents the scene with 'Gaussians', mathematical shapes that can efficiently capture complex geometry, and organizes them into a hierarchy with multiple levels of detail, so the renderer draws fine Gaussians only where they are needed and coarse ones elsewhere. This hierarchical approach allows for both speed and accuracy.
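To make the level-of-detail idea concrete, here is a minimal, hypothetical sketch of LoD-aware culling over a hierarchy of Gaussians. All names and the distance-based level rule are illustrative assumptions, not ARTDECO's actual implementation: each Gaussian is assigned a level (0 = coarsest), and regions far from the camera are rendered with fewer, coarser Gaussians.

```python
import math

class Gaussian:
    """One primitive in a hypothetical hierarchical Gaussian scene."""
    def __init__(self, center, level):
        self.center = center  # (x, y, z) world position
        self.level = level    # 0 = coarsest level of detail

def target_level(distance, base_distance=1.0, max_level=4):
    """Map camera distance to a level of detail: nearby -> fine, far -> coarse.
    Each doubling of distance beyond base_distance drops one level."""
    if distance <= base_distance:
        return max_level
    return max(0, max_level - int(math.log2(distance / base_distance)))

def lod_cull(gaussians, camera_pos, base_distance=1.0, max_level=4):
    """Keep only Gaussians no finer than the target LoD for their distance,
    so distant regions contribute fewer primitives to rendering."""
    kept = []
    for g in gaussians:
        d = math.dist(camera_pos, g.center)
        if g.level <= target_level(d, base_distance, max_level):
            kept.append(g)
    return kept
```

For example, with the camera at the origin, a level-4 (finest) Gaussian 10 units away is culled while a level-1 Gaussian at the same spot is kept, which is the redundancy reduction the paper's LoD-aware rendering strategy is after.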

Why it matters?

This work is important because it makes it practical to quickly and accurately digitize real-world environments. This has huge implications for applications like creating realistic virtual worlds from real places, improving augmented and virtual reality experiences, and enabling robots to better understand and interact with their surroundings. It offers a way to get high-quality 3D models in real-time, which was previously a major challenge.

Abstract

On-the-fly 3D reconstruction from monocular image sequences is a long-standing challenge in computer vision, critical for applications such as real-to-sim, AR/VR, and robotics. Existing methods face a major tradeoff: per-scene optimization yields high fidelity but is computationally expensive, whereas feed-forward foundation models enable real-time inference but struggle with accuracy and robustness. In this work, we propose ARTDECO, a unified framework that combines the efficiency of feed-forward models with the reliability of SLAM-based pipelines. ARTDECO uses 3D foundation models for pose estimation and point prediction, coupled with a Gaussian decoder that transforms multi-scale features into structured 3D Gaussians. To sustain both fidelity and efficiency at scale, we design a hierarchical Gaussian representation with a LoD-aware rendering strategy, which improves rendering fidelity while reducing redundancy. Experiments on eight diverse indoor and outdoor benchmarks show that ARTDECO delivers interactive performance comparable to SLAM, robustness similar to feed-forward systems, and reconstruction quality close to per-scene optimization, providing a practical path toward on-the-fly digitization of real-world environments with both accurate geometry and high visual fidelity. Explore more demos on our project page: https://city-super.github.io/artdeco/.