VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads

Orest Kupyn, Eugene Khvedchenia, Christian Rupprecht

2024-08-09

Summary

This paper introduces VGGHeads, a large synthetic dataset designed to help improve the detection and 3D modeling of human heads in images.

What's the problem?

Detecting human heads and creating accurate 3D models from images is important for many applications, such as virtual reality and facial recognition. However, traditional datasets used for training models often have issues like bias, privacy concerns, and limited variety because they are collected in controlled environments. This makes it hard for models to perform well in real-world situations.

What's the solution?

To solve these problems, the authors created VGGHeads, which includes over 1 million high-resolution images of human heads generated with diffusion models. Each image is annotated with detailed information, including 3D head meshes, facial landmarks, and bounding boxes. They also developed a new model that detects heads and reconstructs their 3D meshes from a single image in a single step. Their experiments showed that models trained on this synthetic dataset perform well on real images.
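To make the annotation structure concrete, here is a minimal sketch of how a per-head record combining a bounding box, 2D landmarks, and 3D mesh vertices might be represented and queried. The field names and the helper function are illustrative assumptions, not the dataset's actual schema or API:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class HeadAnnotation:
    """Hypothetical per-head record; field names are illustrative,
    not the dataset's actual schema."""
    bbox: Tuple[float, float, float, float]          # (x, y, width, height) in pixels
    landmarks: List[Tuple[float, float]]             # 2D facial keypoints
    mesh_vertices: List[Tuple[float, float, float]]  # 3D head mesh vertices

def heads_in_region(annotations, x0, y0, x1, y1):
    """Return annotations whose bounding-box centre lies inside a region."""
    selected = []
    for ann in annotations:
        x, y, w, h = ann.bbox
        cx, cy = x + w / 2, y + h / 2
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            selected.append(ann)
    return selected
```

A record like this would let a single image carry several heads, each paired with the 2D and 3D ground truth the paper describes.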

Why it matters?

This research is significant because it provides a high-quality resource for training AI systems to understand and model human heads better. By using synthetic data, VGGHeads helps overcome the limitations of traditional datasets, allowing for more accurate applications in fields like gaming, film, and security technology.

Abstract

Human head detection, keypoint estimation, and 3D head model fitting are important tasks with many applications. However, traditional real-world datasets often suffer from bias, privacy, and ethical concerns, and they have been recorded in laboratory environments, which makes it difficult for trained models to generalize. Here, we introduce VGGHeads -- a large-scale synthetic dataset generated with diffusion models for human head detection and 3D mesh estimation. Our dataset comprises over 1 million high-resolution images, each annotated with detailed 3D head meshes, facial landmarks, and bounding boxes. Using this dataset, we introduce a new model architecture capable of simultaneous head detection and head mesh reconstruction from a single image in a single step. Through extensive experimental evaluations, we demonstrate that models trained on our synthetic data achieve strong performance on real images. Furthermore, the versatility of our dataset makes it applicable across a broad spectrum of tasks, offering a general and comprehensive representation of human heads. Additionally, we provide detailed information about the synthetic data generation pipeline, enabling it to be re-used for other tasks and domains.