MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors
Yehonathan Litman, Or Patashnik, Kangle Deng, Aviral Agrawal, Rushikesh Zawar, Fernando De la Torre, Shubham Tulsiani
2024-09-24

Summary
This paper tackles inverse rendering: recovering an object's shape, albedo, and material properties from multi-view images so the object can be re-rendered realistically under new lighting. It introduces MaterialFusion, a pipeline that augments conventional 3D inverse rendering with a 2D diffusion prior on texture and materials, along with a new synthetic dataset called BlenderVault.
What's the problem?
Existing inverse rendering methods can recover shape, albedo, and materials from multi-view images, but the recovered components often fail to render accurately under new lighting conditions. The root difficulty is disentangling albedo from material properties: many different combinations of the two can explain the same input appearances, so the recovered components are frequently wrong in ways that only become visible when the lighting changes.
What's the solution?
To address this, the researchers built MaterialFusion, which adds a learned 2D prior to a conventional 3D inverse rendering pipeline. The prior, called StableMaterial, is a diffusion model trained to estimate the most likely albedo and materials from given input appearances; it is trained on albedo, material, and relit image data derived from BlenderVault, a curated dataset of roughly 12K artist-designed synthetic Blender objects. During reconstruction, score distillation sampling (SDS) uses this prior to guide the optimization of albedo and materials, improving relighting performance compared with previous work.
Why it matters?
This research matters because objects reconstructed from images are only broadly useful if they look right under lighting conditions not seen during capture. Evaluated on 4 datasets of synthetic and real objects under diverse illumination, MaterialFusion's diffusion-aided approach significantly improves the appearance of reconstructed objects under novel lighting, and the planned public release of BlenderVault gives the community data to build on this result.
Abstract
Recent works in inverse rendering have shown promise in using multi-view images of an object to recover shape, albedo, and materials. However, the recovered components often fail to render accurately under new lighting conditions due to the intrinsic challenge of disentangling albedo and material properties from input images. To address this challenge, we introduce MaterialFusion, an enhanced conventional 3D inverse rendering pipeline that incorporates a 2D prior on texture and material properties. We present StableMaterial, a 2D diffusion model prior that refines multi-lit data to estimate the most likely albedo and material from given input appearances. This model is trained on albedo, material, and relit image data derived from a curated dataset of approximately 12K artist-designed synthetic Blender objects called BlenderVault. We incorporate this diffusion prior with an inverse rendering framework where we use score distillation sampling (SDS) to guide the optimization of the albedo and materials, improving relighting performance in comparison with previous work. We validate MaterialFusion's relighting performance on 4 datasets of synthetic and real objects under diverse illumination conditions, showing our diffusion-aided approach significantly improves the appearance of reconstructed objects under novel lighting conditions. We intend to publicly release our BlenderVault dataset to support further research in this field.
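The core mechanism the abstract names is score distillation sampling: a frozen diffusion prior scores a noisy version of the current render, and the noise-prediction residual is pushed back as a gradient on the optimized albedo/material maps. Below is a minimal NumPy sketch of that gradient, assuming a generic noise-predictor interface; the actual StableMaterial model, noise schedule, and rendering pipeline are not described in the abstract, so the `denoiser`, `alpha_t`, and `weight` arguments here are illustrative stand-ins.

```python
import numpy as np

def sds_gradient(rendered, denoiser, alpha_t, weight, rng):
    """Illustrative SDS gradient for a rendered albedo/material map.

    `denoiser` stands in for the frozen diffusion prior's noise predictor
    (the role StableMaterial plays in MaterialFusion); `alpha_t` is the
    noise-schedule coefficient at a sampled timestep.
    """
    # Diffuse the current render to the sampled noise level.
    noise = rng.standard_normal(rendered.shape)
    noisy = np.sqrt(alpha_t) * rendered + np.sqrt(1.0 - alpha_t) * noise
    # Ask the frozen prior to predict the injected noise.
    eps_pred = denoiser(noisy)
    # SDS pushes the residual straight onto the rendered maps, skipping
    # the denoiser's Jacobian -- the defining SDS approximation.
    return weight * (eps_pred - noise)

# Toy usage with a trivial denoiser that predicts zero noise; the gradient
# then simply pulls the render toward the diffused sample.
rng = np.random.default_rng(0)
albedo = rng.random((4, 4, 3))
grad = sds_gradient(albedo, denoiser=lambda x: np.zeros_like(x),
                    alpha_t=0.5, weight=1.0, rng=rng)
albedo = albedo - 0.1 * grad  # one gradient step on the albedo map
```

In the full pipeline, this gradient flows through a differentiable renderer into the underlying shape and material parameters rather than being applied to pixels directly.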