Real3D: Scaling Up Large Reconstruction Models with Real-World Images

Hanwen Jiang, Qixing Huang, Georgios Pavlakos

2024-06-14

Summary

This paper introduces Real3D, a system that improves how large reconstruction models (LRMs) build 3D objects from single 2D images. It lets these models learn from ordinary real-world photos instead of relying only on synthetic data or multi-view captures of the same object.

What's the problem?

Traditionally, training LRMs has depended on large datasets of synthetic 3D models or multi-view images. These are hard to scale beyond the datasets that already exist, and they do not necessarily reflect the true variety of shapes in the real world, so the models may perform poorly on real objects they have never seen.

What's the solution?

To solve this problem, the authors developed Real3D, a self-training framework that combines existing synthetic data with single-view real images. They introduce two unsupervised losses, one supervising the model at the pixel level and the other at the semantic level, so training works even on images with no ground-truth 3D or novel views. They also built an automatic data-curation pipeline that collects high-quality real-world images to scale up training. Together, these let the model learn from far more diverse examples.
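To make the self-training idea concrete, here is a minimal sketch of what the two unsupervised losses could look like for a single real image. This is not the paper's code: the `lrm`, `lrm.render`, and `feature_encoder` interfaces, the cycle-consistency form of the pixel-level loss, and the feature-similarity form of the semantic-level loss are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def self_training_losses(lrm, feature_encoder, image, input_pose, novel_pose):
    """Hypothetical sketch of unsupervised losses for a single-view real
    image with no ground-truth 3D or novel views.

    Assumed interfaces: `lrm(image)` returns a 3D representation,
    `lrm.render(rep, pose)` renders it from a camera pose, and
    `feature_encoder` is a pretrained image encoder (e.g., CLIP or DINO).
    """
    # Reconstruct 3D from the single input view and render a novel view.
    rep = lrm(image)
    novel_view = lrm.render(rep, novel_pose)

    # Pixel-level cycle consistency (assumed formulation): feed the
    # rendered novel view back into the model, re-render the original
    # viewpoint, and compare against the input image pixel by pixel.
    rep_cycle = lrm(novel_view)
    input_recon = lrm.render(rep_cycle, input_pose)
    loss_pixel = F.mse_loss(input_recon, image)

    # Semantic-level consistency (assumed formulation): the novel view
    # should depict the same object, so its features should align with
    # the input image's features.
    feat_input = feature_encoder(image)
    feat_novel = feature_encoder(novel_view)
    loss_semantic = 1.0 - F.cosine_similarity(feat_input, feat_novel, dim=-1).mean()

    return loss_pixel + loss_semantic
```

In a sketch like this, neither loss needs labels: the pixel term anchors the 3D prediction to the one view we actually observe, while the semantic term constrains the unobserved views, which is what lets single-view real photos act as training signal.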

Why it matters?

This research is important because it enables better training of AI models that can create accurate 3D representations from everyday images. By using real-world data, Real3D can help improve applications in areas like virtual reality, gaming, and design, making these technologies more realistic and useful.

Abstract

The default strategy for training single-view Large Reconstruction Models (LRMs) follows the fully supervised route using large-scale datasets of synthetic 3D assets or multi-view captures. Although these resources simplify the training procedure, they are hard to scale up beyond the existing datasets and they are not necessarily representative of the real distribution of object shapes. To address these limitations, in this paper, we introduce Real3D, the first LRM system that can be trained using single-view real-world images. Real3D introduces a novel self-training framework that can benefit from both the existing synthetic data and diverse single-view real images. We propose two unsupervised losses that allow us to supervise LRMs at the pixel- and semantic-level, even for training examples without ground-truth 3D or novel views. To further improve performance and scale up the image data, we develop an automatic data curation approach to collect high-quality examples from in-the-wild images. Our experiments show that Real3D consistently outperforms prior work in four diverse evaluation settings that include real and synthetic data, as well as both in-domain and out-of-domain shapes. Code and model can be found here: https://hwjiang1510.github.io/Real3D/