IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
Zhibing Li, Tong Wu, Jing Tan, Mengchen Zhang, Jiaqi Wang, Dahua Lin
2024-12-17

Summary
This paper presents IDArb, a diffusion-based model that extracts geometric information (surface normals) and material properties from any number of input images, enabling better understanding and representation of objects in 3D space.
What's the problem?
Separating an object's true material properties from the effects of lighting is a long-standing challenge in computer vision. Traditional optimization-based methods can take hours to process dense multi-view inputs and still struggle to disentangle lighting from material, while learning-based methods have trouble keeping their predictions consistent across different views. This makes it difficult to create realistic 3D models from images.
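The lighting/material ambiguity mentioned above can be made concrete with the classic Lambertian intrinsic image model (a standard simplification, not the paper's full material model), where a pixel is the product of albedo and shading. A tiny NumPy sketch shows two very different explanations producing identical pixels:

```python
import numpy as np

# Lambertian intrinsic model (simplified): image = albedo * shading.
# Explanation 1: a bright surface under dim light.
albedo_bright = np.full((2, 2), 0.8)
shading_dim = np.full((2, 2), 0.25)

# Explanation 2: a dark surface under bright light.
albedo_dark = np.full((2, 2), 0.2)
shading_bright = np.full((2, 2), 1.0)

img1 = albedo_bright * shading_dim
img2 = albedo_dark * shading_bright
print(np.allclose(img1, img2))  # True: identical pixels, two explanations
```

Because a single image cannot distinguish these cases, methods need priors or multiple observations under varying illumination, which is exactly the setting IDArb targets.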
What's the solution?
IDArb introduces a diffusion-based approach that can handle any number of input images, even when they are taken under different lighting conditions. It combines a cross-view, cross-domain attention module with an illumination-augmented, view-adaptive training strategy to keep the estimated normals and materials consistent across views of the same object. The researchers also built a new dataset, ARB-Objaverse, which provides large-scale multi-view intrinsic data and renderings under diverse lighting conditions for training the model effectively.
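The core idea behind attention-based multi-view consistency can be sketched in a few lines. The toy single-head attention below (a minimal illustration, not the paper's actual architecture; the weight matrices and shapes are hypothetical) flattens the view axis so that every token can attend to tokens from every other view:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(tokens, Wq, Wk, Wv):
    """Toy single-head attention across all views jointly.

    tokens: (V, T, D) array -- V views, T tokens per view, D channels.
    Wq, Wk, Wv: hypothetical (D, D) projection matrices.
    Merging the view axis into the sequence axis lets each token
    attend to tokens from every view, the basic mechanism for
    sharing information across views.
    """
    V, T, D = tokens.shape
    x = tokens.reshape(V * T, D)            # merge views into one sequence
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(D))    # (V*T, V*T) cross-view weights
    out = attn @ v
    return out.reshape(V, T, D)             # restore the per-view layout

rng = np.random.default_rng(0)
V, T, D = 3, 4, 8
tokens = rng.normal(size=(V, T, D))
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
out = cross_view_attention(tokens, Wq, Wk, Wv)
print(out.shape)  # (3, 4, 8)
```

In a real diffusion backbone these layers would operate on denoiser feature maps, and IDArb additionally attends across intrinsic domains (normals, materials), but the flatten-attend-reshape pattern is the gist.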
Why it matters?
This work is important because it advances the ability to create realistic 3D representations from images, which has applications in fields like augmented reality, robotics, and visual effects in movies. By improving how we understand and model physical properties from visual data, IDArb could enhance many technologies that rely on accurate 3D modeling.
Abstract
Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics. Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs, while still struggling with inherent ambiguities between lighting and material. On the other hand, learning-based approaches leverage rich material priors from existing 3D object datasets but face challenges with maintaining multi-view consistency. In this paper, we introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations. Our method achieves accurate and multi-view consistent estimation on surface normals and material properties. This is made possible through a novel cross-view, cross-domain attention module and an illumination-augmented, view-adaptive training strategy. Additionally, we introduce ARB-Objaverse, a new dataset that provides large-scale multi-view intrinsic data and renderings under diverse lighting conditions, supporting robust training. Extensive experiments demonstrate that IDArb outperforms state-of-the-art methods both qualitatively and quantitatively. Moreover, our approach facilitates a range of downstream tasks, including single-image relighting, photometric stereo, and 3D reconstruction, highlighting its broad applications in realistic 3D content creation.