MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting
Hanzhi Chang, Ruijie Zhu, Wenjie Chang, Mulin Yu, Yanzhe Liang, Jiahao Lu, Zhuoyuan Li, Tianzhu Zhang
2025-08-26
Summary
This paper introduces a new method called MeshSplat for creating 3D models from a small number of images, using a technique called Gaussian Splatting.
What's the problem?
Existing methods for reconstructing 3D models from images struggle when only a few views of the object or scene are available. With such sparse coverage, the reconstructed geometry tends to be inaccurate or incomplete, making it hard to recover the scene's true shape.
What's the solution?
MeshSplat tackles this with a feed-forward network that predicts per-view, pixel-aligned 2D Gaussian Splats (2DGS): flat, surface-aligned primitives that carry color, depth, and orientation. Because these splats can be rendered into novel views, the network can be trained with image supervision alone, without ground-truth 3D data. To make the predictions more accurate, the researchers add a Weighted Chamfer Distance Loss that regularizes the depth maps, especially where the input views overlap, and a normal prediction network that aligns each splat's orientation with surface normals estimated from single images.
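The weighted Chamfer term above can be sketched roughly as follows. This is an illustrative toy version only: the function name, the per-point weighting scheme (e.g. higher weights in view-overlap regions), and the use of squared nearest-neighbour distances are assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def weighted_chamfer_loss(pts_a, pts_b, w_a, w_b):
    """Symmetric Chamfer distance between two point sets with per-point
    weights. pts_a: (N, 3), pts_b: (M, 3); w_a: (N,), w_b: (M,).
    In MeshSplat's setting, the points would come from unprojected depth
    maps of two input views, with larger weights where the views overlap
    (hypothetical weighting; a sketch, not the paper's code)."""
    # Pairwise squared distances, shape (N, M), via broadcasting.
    d2 = ((pts_a[:, None, :] - pts_b[None, :, :]) ** 2).sum(axis=-1)
    a2b = d2.min(axis=1)  # each point in A -> nearest point in B
    b2a = d2.min(axis=0)  # each point in B -> nearest point in A
    # Weighted averages of both directions.
    return float((w_a * a2b).sum() / w_a.sum()
                 + (w_b * b2a).sum() / w_b.sum())
```

Weighting the two directions separately lets overlap regions, where both views genuinely observe the same surface, dominate the depth regularization while poorly covered regions contribute less.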
Why it matters?
This work is important because it enables high-quality 3D reconstruction even with very limited input images, which is useful when capturing many views is difficult or impossible, such as reconstructing objects from drone footage or building 3D models from old photographs. It also sets a new performance standard for generalizable sparse-view 3D reconstruction.
Abstract
Surface reconstruction has been widely studied in computer vision and graphics. However, existing surface reconstruction works struggle to recover accurate scene geometry when the input views are extremely sparse. To address this issue, we propose MeshSplat, a generalizable sparse-view surface reconstruction framework via Gaussian Splatting. Our key idea is to leverage 2DGS as a bridge, which connects novel view synthesis to learned geometric priors and then transfers these priors to achieve surface reconstruction. Specifically, we incorporate a feed-forward network to predict per-view pixel-aligned 2DGS, which enables the network to synthesize novel view images and thus eliminates the need for direct 3D ground-truth supervision. To improve the accuracy of 2DGS position and orientation prediction, we propose a Weighted Chamfer Distance Loss to regularize the depth maps, especially in overlapping areas of input views, as well as a normal prediction network to align the orientation of 2DGS with normal vectors predicted by a monocular normal estimator. Extensive experiments validate the effectiveness of our proposed improvements, demonstrating that our method achieves state-of-the-art performance in generalizable sparse-view mesh reconstruction tasks. Project Page: https://hanzhichang.github.io/meshsplat_web
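The normal-alignment objective mentioned in the abstract can be sketched as a simple cosine-similarity penalty between each predicted splat's normal and the corresponding normal from a monocular estimator. The function name and the exact form of the penalty are assumptions for illustration; the paper may use a different formulation.

```python
import numpy as np

def normal_alignment_loss(splat_normals, mono_normals):
    """Mean (1 - cosine similarity) between per-pixel 2DGS normals and
    normals from a monocular normal estimator. Both arrays have shape
    (N, 3). Hypothetical sketch of the alignment term, not the paper's
    implementation."""
    # Normalize both sets of vectors to unit length.
    a = splat_normals / np.linalg.norm(splat_normals, axis=-1, keepdims=True)
    b = mono_normals / np.linalg.norm(mono_normals, axis=-1, keepdims=True)
    # Loss is 0 when normals agree, 2 when they point in opposite directions.
    return float((1.0 - (a * b).sum(axis=-1)).mean())
```

Driving this loss toward zero rotates each 2D splat so that its plane matches the locally estimated surface orientation, which is what makes the resulting primitives usable for mesh extraction.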