DreamPolish: Domain Score Distillation With Progressive Geometry Generation

Yean Cheng, Ziqi Cai, Ming Ding, Wendi Zheng, Shiyu Huang, Yuxiao Dong, Jie Tang, Boxin Shi

2024-11-06

Summary

This paper introduces DreamPolish, a model that generates high-quality 3D objects from text descriptions by improving how geometry and textures are created.

What's the problem?

Current methods for turning text into 3D objects often produce low-quality shapes and textures: surfaces show artifacts, and textures fail to look realistic or stay consistent across viewpoints. This limits their usefulness for anyone who needs an accurate 3D representation of what they described.

What's the solution?

The authors developed DreamPolish, which builds a 3D model in two phases. In the geometry phase, it constructs the object's shape using multiple neural representations to keep the synthesis stable, then runs a short surface polishing stage in which a normal estimator cleans up artifacts left by the earlier steps. In the texture phase, it uses a new objective called domain score distillation (DSD) to steer the textures toward the part of a pretrained text-to-image model's output distribution that contains photorealistic and consistent renderings.

Why does it matter?

This research is important because it enhances the ability of AI to create realistic 3D models from simple text prompts. Better 3D generation can be used in video games, movies, virtual reality, and design, making it easier for creators to bring their ideas to life in a visually appealing way.

Abstract

We introduce DreamPolish, a text-to-3D generation model that excels in producing refined geometry and high-quality textures. In the geometry construction phase, our approach leverages multiple neural representations to enhance the stability of the synthesis process. Instead of relying solely on a view-conditioned diffusion prior in the novel sampled views, which often leads to undesired artifacts in the geometric surface, we incorporate an additional normal estimator to polish the geometry details, conditioned on viewpoints with varying field-of-views. We propose to add a surface polishing stage with only a few training steps, which can effectively refine the artifacts attributed to limited guidance from previous stages and produce 3D objects with more desirable geometry. The key topic of texture generation using pretrained text-to-image models is to find a suitable domain in the vast latent distribution of these models that contains photorealistic and consistent renderings. In the texture generation phase, we introduce a novel score distillation objective, namely domain score distillation (DSD), to guide neural representations toward such a domain. We draw inspiration from the classifier-free guidance (CFG) in text-conditioned image generation tasks and show that CFG and variational distribution guidance represent distinct aspects in gradient guidance and are both imperative domains for the enhancement of texture quality. Extensive experiments show our proposed model can produce 3D assets with polished surfaces and photorealistic textures, outperforming existing state-of-the-art methods.
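
For readers unfamiliar with classifier-free guidance, the sketch below shows the standard CFG combination of a diffusion model's unconditional and text-conditioned noise predictions. It is only an illustration of the general technique the abstract refers to, not the paper's DSD objective, which is not specified in this summary. The function name `cfg_noise` and the toy input values are assumptions made for this example.

```python
def cfg_noise(eps_uncond, eps_cond, guidance_scale):
    """Standard classifier-free guidance: move the unconditional noise
    prediction toward the text-conditioned one by a chosen scale."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy stand-ins for a diffusion model's two noise predictions
# (hypothetical values; a real model returns image-sized tensors).
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]

# A scale of 1 reproduces the conditioned prediction; larger scales
# push further along the direction (eps_cond - eps_uncond).
guided = cfg_noise(eps_uncond, eps_cond, guidance_scale=7.5)
print(guided)
```

A scale of 0 ignores the text prompt entirely, while large scales follow the prompt more strongly, typically at some cost in diversity.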