Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model

Yaxuan Huang, Xili Dai, Jianan Wang, Xianbiao Qi, Yixing Yuan, Xiangyu Yue

2025-03-04


Summary

This paper presents a new way to reconstruct the layout of a room from a few photos taken from different angles, without knowing in advance where the camera was for each shot. The researchers built a system called Plane-DUSt3R that recovers room structure in fewer processing steps than previous methods and outperforms them on a synthetic benchmark.

What's the problem?

Figuring out room layouts from multiple photos is hard because it usually requires a chain of separate steps: estimating each camera's settings and position, matching points between images, and triangulating those matches into 3D. Errors in one step carry over into the next, which makes the process slow and error-prone.

What's the solution?

The researchers built Plane-DUSt3R on top of DUSt3R, a pretrained 3D foundation model that replaces the traditional multi-step reconstruction pipeline with a single end-to-end step. They fine-tuned it on a room layout dataset (Structured3D) with a modified training objective so that it predicts only the main structural planes of a room, such as walls and floors. Combined with 2D detection results, this output needs just one post-processing step to produce the final layout. The method handles multiple photos taken from different angles and even works on cartoon-style images.

Why does it matter?

It could make creating 3D models of rooms from just a few photos simpler and faster, which is useful for virtual reality, home design apps, and robots that need to understand their surroundings. Because it also works on different image styles, such as cartoons, it could be applied in many different situations.

Abstract

Room layout estimation from multiple-perspective images is poorly investigated due to the complexities that emerge from multi-view geometry, which requires multi-step solutions such as camera intrinsic and extrinsic estimation, image matching, and triangulation. However, in 3D reconstruction, the advancement of recent 3D foundation models such as DUSt3R has shifted the paradigm from the traditional multi-step structure-from-motion process to an end-to-end single-step approach. To this end, we introduce Plane-DUSt3R, a novel method for multi-view room layout estimation leveraging the 3D foundation model DUSt3R. Plane-DUSt3R incorporates the DUSt3R framework and fine-tunes on a room layout dataset (Structured3D) with a modified objective to estimate structural planes. By generating uniform and parsimonious results, Plane-DUSt3R enables room layout estimation with only a single post-processing step and 2D detection results. Unlike previous methods that rely on single-perspective or panorama images, Plane-DUSt3R extends the setting to handle multiple-perspective images. Moreover, it offers a streamlined, end-to-end solution that simplifies the process and reduces error accumulation. Experimental results demonstrate that Plane-DUSt3R not only outperforms state-of-the-art methods on the synthetic dataset but also proves robust and effective on in-the-wild data with different image styles such as cartoon. Our code is available at: https://github.com/justacar/Plane-DUSt3R
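
To make "structural plane" concrete: a plane can be stored as a unit normal n and an offset d, with every point x on it satisfying n·x + d = 0. DUSt3R regresses a per-pixel 3D pointmap, so one generic way to summarise the points that belong to a single wall or floor is a least-squares plane fit via SVD of the centred points. The Python sketch below shows only that generic fit, under the assumption that the points have already been masked to one plane; `fit_plane`, `point_to_plane_distance`, `pointmap`, and `wall_mask` are hypothetical names, and this is not Plane-DUSt3R's modified training objective or its post-processing step.

```python
from __future__ import annotations

import numpy as np


def fit_plane(points: np.ndarray) -> tuple[np.ndarray, float]:
    """Least-squares plane n.x + d = 0 through an (N, 3) array of 3D points.

    Generic illustration only (not the paper's method): the unit normal is the
    right singular vector of the centred points with the smallest singular value.
    """
    pts = np.asarray(points, dtype=float)
    if pts.ndim != 2 or pts.shape[1] != 3 or len(pts) < 3:
        raise ValueError("expected an (N, 3) array with N >= 3")
    centroid = pts.mean(axis=0)
    # Rows of vt are principal directions sorted by decreasing singular value,
    # so the last row is the direction of least spread: the plane normal.
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    d = -float(normal @ centroid)  # the plane passes through the centroid
    return normal, d


def point_to_plane_distance(points: np.ndarray, normal: np.ndarray, d: float) -> np.ndarray:
    """Unsigned distance of each point to the plane (normal is unit length)."""
    return np.abs(np.asarray(points, dtype=float) @ normal + d)


# Hypothetical usage: `pointmap` is an (H, W, 3) array of per-pixel 3D points and
# `wall_mask` an (H, W) boolean mask selecting one wall; neither name is from the paper.
# normal, d = fit_plane(pointmap[wall_mask])
```

The SVD form is used instead of fitting z = ax + by + c because walls are vertical, and a height-as-a-function-of-position model cannot represent them.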
