Disco4D: Disentangled 4D Human Generation and Animation from a Single Image

Hui En Pang, Shuai Liu, Zhongang Cai, Lei Yang, Tianwei Zhang, Ziwei Liu

2024-09-27

Summary

This paper introduces Disco4D, a new framework that generates and animates 4D human figures from just a single image. It separates the clothing from the body, which allows for more detailed and flexible animations.

What's the problem?

Existing methods for generating animated human figures often struggle to accurately represent both the body and clothing details. Many techniques do not effectively separate these elements, which leads to less realistic animations and makes it hard to edit clothing styles. Additionally, they often handle poorly the parts of the body that are hidden in the original image.

What's the solution?

Disco4D addresses these issues using Gaussian Splatting: the human body is represented by Gaussians fitted to the SMPL-X body model, while clothing is modeled as a separate set of Gaussians. This separation allows for finer detail and more flexibility when generating animations. The framework also uses diffusion models to guide the 3D generation process, filling in parts of the person that are occluded in the input image. By learning an identity encoding for each clothing Gaussian, Disco4D can separate and extract individual clothing items, enabling more accurate and customizable animations.
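To make the disentanglement idea concrete, here is a minimal Python sketch of the core data layout it implies: one set of Gaussians anchored to the body and a separate set for clothing, where each clothing Gaussian carries an identity label so a single garment can be pulled out and edited on its own. This is an illustration of the concept only, not the authors' implementation; all class and field names here are hypothetical.

```python
# Illustrative sketch (not the authors' code): disentangled Gaussian sets.
# Body Gaussians follow the SMPL-X body model; clothing Gaussians form a
# separate set, each tagged with an identity so individual garments can be
# extracted independently. All names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Gaussian:
    position: tuple   # 3D center (x, y, z)
    scale: tuple      # per-axis extent
    color: tuple      # RGB
    opacity: float
    identity: int = -1  # clothing-asset ID; -1 means "body"

@dataclass
class DisentangledAvatar:
    body: list = field(default_factory=list)      # SMPL-X-anchored Gaussians
    clothing: list = field(default_factory=list)  # separate clothing Gaussians

    def extract_asset(self, identity):
        """Pull out one garment by its identity encoding."""
        return [g for g in self.clothing if g.identity == identity]

# Usage: build a toy avatar with two garments, then extract garment 0.
avatar = DisentangledAvatar()
avatar.body.append(Gaussian((0, 0, 0), (1, 1, 1), (0.8, 0.7, 0.6), 1.0))
avatar.clothing.append(Gaussian((0, 1, 0), (1, 1, 1), (0.1, 0.1, 0.9), 1.0, identity=0))
avatar.clothing.append(Gaussian((0, -1, 0), (1, 1, 1), (0.2, 0.2, 0.2), 1.0, identity=1))
shirt = avatar.extract_asset(0)
print(len(shirt))  # → 1
```

Because the body and each garment live in separate Gaussian sets, editing or swapping a clothing item never disturbs the underlying body representation, which is what makes the customization described above possible.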

Why it matters?

This research is important because it enhances how we can create realistic animated characters for various applications, such as video games, movies, and virtual reality. By allowing for detailed customization of clothing and improved animation quality, Disco4D opens up new possibilities for creating lifelike digital humans that can move and interact in dynamic ways.

Abstract

We present Disco4D, a novel Gaussian Splatting framework for 4D human generation and animation from a single image. Different from existing methods, Disco4D distinctively disentangles clothings (with Gaussian models) from the human body (with SMPL-X model), significantly enhancing the generation details and flexibility. It has the following technical innovations. 1) Disco4D learns to efficiently fit the clothing Gaussians over the SMPL-X Gaussians. 2) It adopts diffusion models to enhance the 3D generation process, e.g., modeling occluded parts not visible in the input image. 3) It learns an identity encoding for each clothing Gaussian to facilitate the separation and extraction of clothing assets. Furthermore, Disco4D naturally supports 4D human animation with vivid dynamics. Extensive experiments demonstrate the superiority of Disco4D on 4D human generation and animation tasks. Our visualizations can be found in https://disco-4d.github.io/.