PlacidDreamer: Advancing Harmony in Text-to-3D Generation
Shuo Huang, Shikun Sun, Zixuan Wang, Xiaoyu Qin, Yanmin Xiong, Yuan Zhang, Pengfei Wan, Di Zhang, Jia Jia
2024-07-22

Summary
This paper introduces PlacidDreamer, a framework for generating 3D assets from text descriptions. It targets two problems in earlier pipelines: conflicting generation directions among the models they combine, and over-saturation introduced by score distillation.
What's the problem?
Existing text-to-3D methods typically combine three kinds of models: an end-to-end 3D generation model to initialize 3D Gaussians, a multi-view diffusion model to enforce multi-view consistency, and a text-to-image diffusion model to refine details through score distillation. This setup has two limitations. First, the models conflict in their generation directions, because the different models aim to produce different 3D assets. Second, score distillation tends to over-saturate the result, an issue that has not been thoroughly investigated or solved.
What's the solution?
PlacidDreamer tackles these issues by using a single multi-view diffusion model to harmonize initialization, multi-view generation, and text-conditioned generation. It introduces the Latent-Plane module, a training-friendly plug-in extension that lets the multi-view diffusion model provide fast geometry reconstruction for initialization, as well as enhanced multi-view images used to personalize the text-to-image diffusion model. To control saturation, it treats score distillation as a multi-objective optimization problem and introduces the Balanced Score Distillation algorithm, which offers a Pareto optimal solution that achieves both rich details and balanced saturation. The authors report that extensive experiments validate the approach.
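The paper frames score distillation as a multi-objective optimization problem with a Pareto optimal solution. As a rough illustration of that idea only, the sketch below shows a generic two-objective technique (the closed-form min-norm combination of two gradients, as used in multiple-gradient descent). It is not the Balanced Score Distillation algorithm itself, whose details are not given here. The function names and the reading of the two gradients as a "detail" objective and a "saturation" objective are assumptions made for this example.

```python
# Generic two-objective gradient balancing (min-norm combination).
# NOT the paper's Balanced Score Distillation; all names here are hypothetical.

def dot(a, b):
    """Inner product of two equal-length vectors given as lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def min_norm_weight(g1, g2):
    """Return alpha in [0, 1] minimizing ||alpha*g1 + (1-alpha)*g2||^2."""
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        # Identical gradients: every weight gives the same combination.
        return 0.5
    g2_minus_g1 = [y - x for x, y in zip(g1, g2)]
    alpha = dot(g2_minus_g1, g2) / denom
    # Clip to keep the result a convex combination of g1 and g2.
    return min(1.0, max(0.0, alpha))


def balanced_direction(g1, g2):
    """Min-norm convex combination of two objective gradients.

    A small step along the negative of this vector does not increase
    either objective to first order.
    """
    alpha = min_norm_weight(g1, g2)
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(g1, g2)]


if __name__ == "__main__":
    detail_grad = [1.0, 0.0]       # assumed gradient of a "detail" objective
    saturation_grad = [-0.5, 1.0]  # assumed gradient of a "saturation" objective
    d = balanced_direction(detail_grad, saturation_grad)
    print(d, dot(d, detail_grad), dot(d, saturation_grad))
```

When the two gradients partly oppose each other, as in the usage example, the combined vector still has a positive inner product with both, so neither objective is sacrificed for the other. When no such direction exists, the combination shrinks to zero, which marks a Pareto stationary point.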
Why does it matter?
Better text-to-3D generation makes it easier to create realistic models for applications such as video games, virtual reality, and design. By improving the quality and consistency of generated 3D assets, PlacidDreamer can help artists and developers build more immersive experiences.
Abstract
Recently, text-to-3D generation has attracted significant attention, resulting in notable performance enhancements. Previous methods utilize end-to-end 3D generation models to initialize 3D Gaussians, multi-view diffusion models to enforce multi-view consistency, and text-to-image diffusion models to refine details with score distillation algorithms. However, these methods exhibit two limitations. Firstly, they encounter conflicts in generation directions since different models aim to produce diverse 3D assets. Secondly, the issue of over-saturation in score distillation has not been thoroughly investigated and solved. To address these limitations, we propose PlacidDreamer, a text-to-3D framework that harmonizes initialization, multi-view generation, and text-conditioned generation with a single multi-view diffusion model, while simultaneously employing a novel score distillation algorithm to achieve balanced saturation. To unify the generation direction, we introduce the Latent-Plane module, a training-friendly plug-in extension that enables multi-view diffusion models to provide fast geometry reconstruction for initialization and enhanced multi-view images to personalize the text-to-image diffusion model. To address the over-saturation problem, we propose to view score distillation as a multi-objective optimization problem and introduce the Balanced Score Distillation algorithm, which offers a Pareto Optimal solution that achieves both rich details and balanced saturation. Extensive experiments validate the outstanding capabilities of our PlacidDreamer. The code is available at https://github.com/HansenHuang0823/PlacidDreamer.