YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals
Sandeep Mishra, Oindrila Saha, Alan C. Bovik
2024-06-26

Summary
This paper introduces YouDream, a method for creating realistic 3D models of animals from text descriptions. It guides a text-to-image diffusion model with 2D views of a 3D pose prior, ensuring that the generated animals are anatomically accurate and visually appealing.
What's the problem?
Creating 3D models of animals is challenging because traditional methods rely on either text prompts or source images, which limits what can be expressed. Previous approaches often produce animals that look wrong, particularly in their anatomy, making it hard to generate high-quality, consistent animal models.
What's the solution?
YouDream addresses these problems by using a text-to-image diffusion model that is guided by 2D views of a 3D pose prior. Instead of relying on words or images alone, the method uses specific poses to create more anatomically accurate 3D representations of animals. Additionally, YouDream includes a fully automated pipeline in which a multi-agent LLM adapts poses from a limited library of animal 3D poses, allowing it to generate commonly found animals without human input. A user study showed that participants preferred the animal models created by YouDream over those produced by previous methods.
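The core idea of pose guidance can be illustrated with a small sketch: project the keypoints of a 3D pose prior into several 2D camera views, which could then be rendered into control images for the diffusion model. This is not the paper's implementation; the function name, camera setup, and parameters below are illustrative assumptions.

```python
import math

def project_pose_views(keypoints3d, n_views=4, focal=1.0, cam_dist=3.0):
    """Project a 3D pose prior into 2D keypoints for several camera views.

    Rough sketch of the conditioning idea: each view's 2D keypoints
    would be drawn into a control image that guides a text-to-image
    diffusion model. Cameras orbit the pose about the y-axis.
    (All names and camera parameters here are hypothetical.)
    """
    views = []
    for v in range(n_views):
        theta = 2.0 * math.pi * v / n_views  # camera azimuth angle
        c, s = math.cos(theta), math.sin(theta)
        pts2d = []
        for (x, y, z) in keypoints3d:
            # rotate the pose about the y-axis (equivalent to orbiting the camera)
            xr = c * x + s * z
            zr = -s * x + c * z
            # simple pinhole projection with the camera at distance cam_dist
            depth = cam_dist - zr
            pts2d.append((focal * xr / depth, focal * y / depth))
        views.append(pts2d)
    return views

# Toy three-keypoint "pose": root, a point above it, and a point to its side.
views = project_pose_views([(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)])
```

In a full pipeline, each list of projected keypoints would be rasterized into a skeleton image per view; here only the geometric projection step is shown.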
Why it matters?
This research is important because it advances the field of 3D generation by providing a way to create anatomically correct and visually consistent animal models. This can be useful in various applications such as video games, animation, and virtual reality, making it easier for creators to produce high-quality digital content.
Abstract
3D generation guided by text-to-image diffusion models enables the creation of visually compelling assets. However, previous methods explore generation based on image or text. The boundaries of creativity are limited by what can be expressed through words or the images that can be sourced. We present YouDream, a method to generate high-quality anatomically controllable animals. YouDream is guided using a text-to-image diffusion model controlled by 2D views of a 3D pose prior. Our method generates 3D animals that are not possible to create using previous text-to-3D generative methods. Additionally, our method is capable of preserving anatomic consistency in the generated animals, an area where prior text-to-3D approaches often struggle. Moreover, we design a fully automated pipeline for generating commonly found animals. To circumvent the need for human intervention to create a 3D pose, we propose a multi-agent LLM that adapts poses from a limited library of animal 3D poses to represent the desired animal. A user study conducted on the outcomes of YouDream demonstrates the preference of the animal models generated by our method over others. Turntable results and code are released at https://youdream3d.github.io/