FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction

Jiale Xu, Shenghua Gao, Ying Shan

2024-12-13

Summary

This paper introduces FreeSplatter, a new method for creating detailed 3D models from just a few images, without needing to know the exact poses of the cameras that took them.

What's the problem?

Most existing methods for 3D reconstruction require precise information about each camera's position and settings (its pose and intrinsics) at the time the photos were taken. This information is difficult to obtain, especially when only a few images are available, which limits how accurate 3D models built from sparse-view images can be.

What's the solution?

FreeSplatter tackles this with a feed-forward framework that generates high-quality 3D representations directly from uncalibrated images. It uses a transformer architecture whose self-attention layers let the model exchange information across multiple views of an object and decode them into pixel-wise Gaussian representations (tiny, fuzzy colored blobs that together make up the scene). Because every Gaussian is predicted in a single shared coordinate frame, the method can also estimate the camera parameters almost instantly, producing detailed 3D models without needing any precise camera information up front.
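
To make this concrete, here is a minimal sketch, in PyTorch, of the kind of pipeline described above: multi-view images are split into patch tokens, mixed by a stack of self-attention blocks, and decoded into per-pixel Gaussian parameters. All names and sizes here are illustrative assumptions, not the official FreeSplatter implementation:

```python
import torch
import torch.nn as nn

class PixelGaussianDecoder(nn.Module):
    """Toy stand-in for a feed-forward multi-view-to-Gaussians model."""
    def __init__(self, dim=256, patch=8, depth=6, heads=8, views=4, res=64):
        super().__init__()
        self.patch = patch
        tokens_per_view = (res // patch) ** 2
        self.embed = nn.Linear(3 * patch * patch, dim)  # image patch -> token
        self.pos = nn.Parameter(torch.zeros(views * tokens_per_view, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)  # self-attention stack
        # Per pixel: 3 (mean) + 3 (scale) + 4 (rotation) + 1 (opacity) + 3 (rgb) = 14
        self.head = nn.Linear(dim, 14 * patch * patch)

    def forward(self, images):                   # images: (B, V, 3, H, W)
        B = images.shape[0]
        p = self.patch
        # Patchify every view into flat tokens.
        x = images.unfold(3, p, p).unfold(4, p, p)       # (B,V,3,H/p,W/p,p,p)
        x = x.permute(0, 1, 3, 4, 2, 5, 6).reshape(B, -1, 3 * p * p)
        x = self.embed(x) + self.pos                     # (B, V*N, dim)
        x = self.blocks(x)                               # cross-view exchange
        g = self.head(x)                                 # (B, V*N, 14*p*p)
        return g.reshape(B, -1, 14)                      # one Gaussian per pixel

model = PixelGaussianDecoder()
views = torch.randn(1, 4, 3, 64, 64)                     # 4 uncalibrated views
gaussians = model(views)
print(gaussians.shape)                                   # (1, 16384, 14)
```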

Why it matters?

This research is important because it makes 3D modeling more accessible and efficient, allowing for high-quality reconstructions from just a few images. This can be useful in various fields, such as virtual reality, gaming, and robotics, where creating realistic environments or objects quickly is essential.

Abstract

Existing sparse-view reconstruction models heavily rely on accurately known camera poses. However, deriving camera extrinsics and intrinsics from sparse-view images presents significant challenges. In this work, we present FreeSplatter, a highly scalable, feed-forward reconstruction framework capable of generating high-quality 3D Gaussians from uncalibrated sparse-view images and recovering their camera parameters in mere seconds. FreeSplatter is built upon a streamlined transformer architecture, comprising sequential self-attention blocks that facilitate information exchange among multi-view image tokens and decode them into pixel-wise 3D Gaussian primitives. The predicted Gaussian primitives are situated in a unified reference frame, allowing for high-fidelity 3D modeling and instant camera parameter estimation using off-the-shelf solvers. To cater to both object-centric and scene-level reconstruction, we train two model variants of FreeSplatter on extensive datasets. In both scenarios, FreeSplatter outperforms state-of-the-art baselines in terms of reconstruction quality and pose estimation accuracy. Furthermore, we showcase FreeSplatter's potential in enhancing the productivity of downstream applications, such as text/image-to-3D content creation.
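
Because the predicted Gaussians sit in one shared reference frame, recovering each view's pose reduces to a standard Perspective-n-Point (PnP) problem between that view's pixels and their predicted 3D positions. The sketch below uses OpenCV's generic solvePnP as the off-the-shelf solver (the paper does not specify this particular one), with synthetic 3D points and intrinsics standing in for real predictions:

```python
import numpy as np
import cv2

# Assumed pinhole intrinsics (illustrative values).
K = np.array([[60.0,  0.0, 32.0],
              [ 0.0, 60.0, 32.0],
              [ 0.0,  0.0,  1.0]])

# Synthetic stand-ins for per-pixel Gaussian centers in the shared frame.
rng = np.random.default_rng(0)
points_3d = rng.uniform(-1.0, 1.0, size=(500, 3)) + np.array([0.0, 0.0, 4.0])

# "Ground-truth" camera: identity rotation, zero translation.
proj = (K @ points_3d.T).T
pixels_2d = proj[:, :2] / proj[:, 2:3]

# Off-the-shelf PnP solver recovers the pose from pixel<->point matches.
ok, rvec, tvec = cv2.solvePnP(points_3d, pixels_2d, K, None)
R, _ = cv2.Rodrigues(rvec)           # rotation vector -> 3x3 matrix

print(ok)                            # True
print(np.round(R, 4))                # ~ identity
print(np.round(tvec.ravel(), 4))     # ~ [0, 0, 0]
```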