Vista3D: Unravel the 3D Darkside of a Single Image
Qiuhong Shen, Xingyi Yang, Michael Bi Mi, Xinchao Wang
2024-09-19

Summary
This paper presents Vista3D, a framework that generates a 3D object from a single image in about five minutes, including the parts of the object that the image does not show.
What's the problem?
Creating a 3D model from just one image is challenging because a single view shows only part of an object, and most methods need multiple views or extensive data to recover the full shape and its details. Without that extra information, the unseen sides must be inferred, which often leads to incomplete or inaccurate 3D representations.
What's the solution?
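Vista3D addresses this issue with a two-phase approach. In the first phase, the coarse phase, it quickly generates initial geometry from the single image using Gaussian Splatting. In the second phase, the fine phase, it extracts a Signed Distance Function (SDF) directly from the learned Gaussian Splatting and optimizes it with a differentiable isosurface representation. To improve quality, the framework uses a disentangled representation with two independent implicit functions, one for the visible and one for the obscured aspects of the object. It also combines gradients from a 2D diffusion prior with those from 3D-aware diffusion priors through what the authors call angular diffusion prior composition. Together, these choices are meant to balance the consistency and the diversity of the generated objects.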
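For readers unfamiliar with the term, a Signed Distance Function maps a 3D point to its distance from a surface, negative inside the object and positive outside, so the surface itself is the set of points where the value is zero. The sketch below is only a generic illustration of that idea using an analytic sphere. It is not Vista3D's code: the function names, the sphere, and the ray-marching helper are all assumptions introduced here, whereas the paper learns its SDF from Gaussian Splatting output.

```python
import math

def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    return math.dist(point, center) - radius

def sphere_trace(origin, direction, sdf=sphere_sdf, max_steps=64, eps=1e-6, t_max=100.0):
    """March along a ray (direction must be unit length) until the SDF drops below eps.

    Returns the distance travelled along the ray to the zero level set,
    or None if nothing is hit within t_max or max_steps.
    A ray that starts inside the object returns 0.0 immediately.
    """
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t
        # The SDF value is a safe step size: no surface is closer than dist.
        t += dist
        if t > t_max:
            break
    return None

if __name__ == "__main__":
    print(sphere_sdf((0.0, 0.0, 0.0)))                           # centre of the sphere
    print(sphere_trace((-3.0, 0.0, 0.0), (1.0, 0.0, 0.0)))       # ray aimed at the sphere
    print(sphere_trace((-3.0, 2.0, 0.0), (1.0, 0.0, 0.0)))       # ray passing above it
```

The same zero-level-set view is what makes an SDF convenient for extracting a mesh: the surface lies wherever the function changes sign.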
Why does it matter?
This research matters because generating 3D models from single images has applications in fields like virtual reality, gaming, and design. By producing a detailed 3D representation from one image within minutes, Vista3D could make 3D content creation faster and more accessible in these industries.
Abstract
We embark on the age-old quest: unveiling the hidden dimensions of objects from mere glimpses of their visible parts. To address this, we present Vista3D, a framework that realizes swift and consistent 3D generation within a mere 5 minutes. At the heart of Vista3D lies a two-phase approach: the coarse phase and the fine phase. In the coarse phase, we rapidly generate initial geometry with Gaussian Splatting from a single image. In the fine phase, we extract a Signed Distance Function (SDF) directly from learned Gaussian Splatting, optimizing it with a differentiable isosurface representation. Furthermore, it elevates the quality of generation by using a disentangled representation with two independent implicit functions to capture both visible and obscured aspects of objects. Additionally, it harmonizes gradients from 2D diffusion prior with 3D-aware diffusion priors by angular diffusion prior composition. Through extensive evaluation, we demonstrate that Vista3D effectively sustains a balance between the consistency and diversity of the generated 3D objects. Demos and code will be available at https://github.com/florinshen/Vista3D.