
SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting

Gyeongjin Kang, Jisang Yoo, Jihyeon Park, Seungtae Nam, Hyeonsoo Im, Sangheon Shin, Sangpil Kim, Eunbyung Park

2024-11-29


Summary

This paper introduces SelfSplat, a new method for building 3D models from images taken from different angles, without needing known camera poses or any pretrained 3D priors.

What's the problem?

Generating accurate 3D models from images is very challenging when the camera positions are unknown and there is no prior knowledge of the scene's geometry. Most existing methods require precise camera poses or pretrained 3D priors, which are often unavailable in practice, making it hard to produce high-quality 3D reconstructions.

What's the solution?

SelfSplat solves this problem by estimating depth and camera poses directly from the images while predicting an explicit 3D Gaussian representation of the scene. It uses self-supervised depth and pose estimation, so the two tasks improve each other, and adds a matching-aware pose estimation network and a depth refinement module to keep the geometry consistent across different views. This means it can create better and more stable 3D representations without per-scene fine-tuning or known camera poses. A rough sketch of this pipeline is shown below.
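To make that pipeline concrete, here is a minimal, illustrative PyTorch sketch. The module names (DepthNet, PoseNet, unproject_to_gaussians), the toy layers, and the simplified pinhole camera are assumptions for illustration only; the actual SelfSplat networks are much larger, matching-aware, and followed by a Gaussian splatting renderer.

import torch
import torch.nn as nn

class DepthNet(nn.Module):
    # Toy per-pixel depth predictor standing in for the real depth branch.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Softplus(),  # keep depth positive
        )

    def forward(self, img):        # img: (B, 3, H, W)
        return self.net(img)       # depth: (B, 1, H, W)

class PoseNet(nn.Module):
    # Toy relative-pose regressor (the paper's version is matching-aware).
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 6),      # 3 rotation (axis-angle) + 3 translation
        )

    def forward(self, img_a, img_b):
        return self.net(torch.cat([img_a, img_b], dim=1))

def unproject_to_gaussians(img, depth):
    # Lift every pixel to a 3D point (a Gaussian center) along its camera ray,
    # using a simplified pinhole model with normalized image coordinates.
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    rays = torch.stack([xs, ys, torch.ones_like(xs)])         # (3, H, W)
    centers = depth * rays                                    # (B, 3, H, W)
    colors = img                                              # per-Gaussian color
    return (centers.flatten(2).transpose(1, 2),               # (B, H*W, 3)
            colors.flatten(2).transpose(1, 2))                # (B, H*W, 3)

# Usage: two unposed context views in; Gaussian centers, colors, and a
# relative pose out. A real pipeline would then splat-render novel views.
img_a, img_b = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
depth = DepthNet()(img_a)
pose = PoseNet()(img_a, img_b)
centers, colors = unproject_to_gaussians(img_a, depth)
print(centers.shape, pose.shape)   # torch.Size([1, 4096, 3]) torch.Size([1, 6])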

Why it matters?

This research is important because it advances how we can create 3D models from regular photos, making it easier to generate accurate representations for various applications like virtual reality, gaming, and architecture. By improving the ability to reconstruct 3D objects from simple images, SelfSplat opens up new possibilities for technology and creativity.

Abstract

We propose SelfSplat, a novel 3D Gaussian Splatting model designed to perform pose-free and 3D prior-free generalizable 3D reconstruction from unposed multi-view images. These settings are inherently ill-posed due to the lack of ground-truth data and learned geometric information, as well as the need to achieve accurate 3D reconstruction without fine-tuning, making it difficult for conventional methods to achieve high-quality results. Our model addresses these challenges by effectively integrating explicit 3D representations with self-supervised depth and pose estimation techniques, resulting in reciprocal improvements in both pose accuracy and 3D reconstruction quality. Furthermore, we incorporate a matching-aware pose estimation network and a depth refinement module to enhance geometry consistency across views, ensuring more accurate and stable 3D reconstructions. To demonstrate the performance of our method, we evaluated it on large-scale real-world datasets, including RealEstate10K, ACID, and DL3DV. SelfSplat achieves superior results over previous state-of-the-art methods in both appearance and geometry quality, and also demonstrates strong cross-dataset generalization capabilities. Extensive ablation studies and analyses further validate the effectiveness of our proposed methods. Code and pretrained models are available at https://gynjn.github.io/selfsplat/
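As context for the self-supervised depth and pose estimation the abstract refers to, the sketch below shows the standard photometric reprojection objective commonly used in such training: a source view is warped into the target view using the predicted depth and relative pose, and the reprojection error supervises both. The function names and the simplified pinhole warp are assumptions for illustration, not the paper's exact formulation, which also supervises through rendered Gaussians and additional loss terms.

import torch
import torch.nn.functional as F

def warp_source_to_target(src_img, tgt_depth, T_tgt_to_src, K):
    # Backward-warp src_img into the target view: unproject target pixels with
    # the predicted depth, move them into the source frame, and resample.
    b, _, h, w = src_img.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs.flatten(), ys.flatten(),
                       torch.ones(h * w)])                     # (3, H*W)
    cam = torch.linalg.inv(K) @ pix                            # rays, (3, H*W)
    cam = cam.unsqueeze(0) * tgt_depth.reshape(b, 1, -1)       # (B, 3, H*W)
    cam = torch.cat([cam, torch.ones(b, 1, h * w)], dim=1)     # homogeneous
    src = (T_tgt_to_src @ cam)[:, :3]                          # (B, 3, H*W)
    uv = K.unsqueeze(0) @ src                                  # project back
    uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)                # pixel coords
    u = uv[:, 0] / (w - 1) * 2 - 1                             # to [-1, 1]
    v = uv[:, 1] / (h - 1) * 2 - 1
    grid = torch.stack([u, v], dim=-1).view(b, h, w, 2)
    return F.grid_sample(src_img, grid, align_corners=True)

def photometric_loss(tgt_img, warped):
    # Plain L1 error; self-supervised pipelines typically add an SSIM term
    # and per-pixel masking, omitted here for brevity.
    return (tgt_img - warped).abs().mean()

# Sanity check: with an identity relative pose the warp is (numerically) the
# identity, so the loss is ~0. Gradients flow into both depth and pose.
K = torch.tensor([[60.0, 0.0, 32.0], [0.0, 60.0, 32.0], [0.0, 0.0, 1.0]])
src = torch.rand(1, 3, 64, 64)
depth = torch.full((1, 1, 64, 64), 2.0)
T = torch.eye(4).unsqueeze(0)                                  # (1, 4, 4)
print(photometric_loss(src, warp_source_to_target(src, depth, T, K)))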