Expressive Whole-Body 3D Gaussian Avatar

Gyeongsik Moon, Takaaki Shiratori, Shunsuke Saito

2024-08-01

Summary

This paper introduces ExAvatar, a 3D human avatar that can express emotions through facial expressions and hand movements, created from just a short monocular video. It combines a whole-body parametric mesh model (SMPL-X) with 3D Gaussian Splatting to make the avatar more realistic and expressive.

What's the problem?

Most existing 3D avatars modeled from casually captured video support only body movements and cannot display facial expressions or hand motions. This limits their usefulness in applications like virtual reality and gaming, where conveying emotion is important. Additionally, creating a realistic avatar is challenging when the source video doesn't provide enough variety in expressions or poses.

What's the solution?

To solve these issues, the authors developed ExAvatar, which combines a whole-body parametric mesh model (SMPL-X) with a technique called 3D Gaussian Splatting (3DGS). They designed a hybrid representation that lets the avatar be animated with new facial expressions and poses even if the original video contained only a limited variety of them. By treating each 3D Gaussian as a vertex on the mesh surface, connected to its neighbors by SMPL-X's triangle faces, they reduced artifacts in how the avatar moves and appears when expressing different emotions, as the sketch below illustrates.
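To make the hybrid representation concrete, here is a minimal sketch of the core idea: one 3D Gaussian attached to each mesh vertex, so that deforming the mesh (for example, with SMPL-X expression or pose blendshapes) moves the Gaussians with it. A toy four-vertex mesh stands in for the real SMPL-X template, and all names here (verts, faces, animate) are illustrative, not the authors' actual code.

```python
import numpy as np

# Toy "template mesh" standing in for SMPL-X (~10k vertices in reality):
# vertex positions plus triangle faces (the connectivity information).
verts = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
faces = np.array([[0, 1, 2],
                  [0, 1, 3],
                  [0, 2, 3]])

# One 3D Gaussian per vertex: its mean is tied to the vertex position,
# while scale and color are free parameters optimized from the video.
gaussians = {
    "mean":  verts.copy(),                    # driven by the mesh
    "scale": np.full((len(verts), 3), 0.01),  # per-axis extent
    "color": np.random.rand(len(verts), 3),   # RGB appearance
}

def animate(verts_template, offsets):
    """Deform the mesh (e.g., via pose/expression blendshapes); the
    Gaussian means follow the deformed vertices automatically."""
    return verts_template + offsets

# Example: an expression-like offset deforms the mesh, and the attached
# Gaussians move with it, so novel expressions remain renderable.
expression_offsets = 0.05 * np.random.randn(*verts.shape)
gaussians["mean"] = animate(verts, expression_offsets)
print(gaussians["mean"])
```

Because the Gaussian means are tied to mesh vertices, any expression or pose in the SMPL-X parameter space immediately drives the avatar, even if that expression never appeared in the training video.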

Why it matters?

This research is significant because it enhances the realism and expressiveness of 3D avatars, making them more suitable for various applications such as video games, movies, and virtual interactions. By improving how avatars can convey emotions through facial expressions and gestures, ExAvatar can lead to more engaging and lifelike experiences in digital environments.

Abstract

Facial expressions and hand motions are necessary to express our emotions and interact with the world. Nevertheless, most 3D human avatars modeled from a casually captured video support only body motions, without facial expressions and hand motions. In this work, we present ExAvatar, an expressive whole-body 3D human avatar learned from a short monocular video. We design ExAvatar as a combination of the whole-body parametric mesh model (SMPL-X) and 3D Gaussian Splatting (3DGS). The main challenges are 1) a limited diversity of facial expressions and poses in the video and 2) the absence of 3D observations, such as 3D scans and RGBD images. The limited diversity in the video makes animations with novel facial expressions and poses non-trivial. In addition, the absence of 3D observations could cause significant ambiguity in human parts that are not observed in the video, which can result in noticeable artifacts under novel motions. To address them, we introduce our hybrid representation of the mesh and 3D Gaussians. Our hybrid representation treats each 3D Gaussian as a vertex on the surface with pre-defined connectivity information (i.e., triangle faces) between them, following the mesh topology of SMPL-X. This makes our ExAvatar animatable with novel facial expressions, driven by the facial expression space of SMPL-X. In addition, by using connectivity-based regularizers, we significantly reduce artifacts in novel facial expressions and poses.
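The connectivity-based regularizers mentioned above can be sketched as a smoothness penalty over mesh edges: Gaussians that are neighbors on the SMPL-X topology should deform similarly. The sketch below assumes per-Gaussian offset attributes and uses hypothetical helpers (edge_list, connectivity_regularizer); it illustrates the idea under those assumptions, not the paper's exact loss.

```python
import numpy as np

def edge_list(faces):
    """Extract the unique undirected edges from triangle faces."""
    edges = set()
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (a, c)):
            edges.add((min(i, j), max(i, j)))
    return np.array(sorted(edges))

def connectivity_regularizer(attr, edges):
    """Penalize differences between attributes of Gaussians that are
    neighbors on the mesh, encouraging smooth, coherent deformation."""
    diffs = attr[edges[:, 0]] - attr[edges[:, 1]]
    return np.mean(np.sum(diffs ** 2, axis=1))

# Toy usage with the four-vertex mesh from the earlier sketch.
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3]])
offsets = 0.05 * np.random.randn(4, 3)  # per-Gaussian offsets to regularize
edges = edge_list(faces)
loss = connectivity_regularizer(offsets, edges)
print(f"regularizer loss: {loss:.6f}")
```

In practice such a penalty would be applied during optimization to learned per-Gaussian quantities (offsets, scales, or rotations), keeping body parts that were never observed in the video consistent with their mesh neighbors and thereby reducing artifacts under novel motions.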