Key Features

Trained on a small but high-fidelity synthetic dataset
Provides strong guarantees for data provenance, usage rights, and user consent
Explicit control over data diversity to address unfairness
Generalizes across a range of benchmark datasets and in-the-wild data
Delivers high-quality, detailed results with remarkable efficiency
Runs orders of magnitude faster than competing methods
Captures a wide range of human characteristics under diverse lighting conditions
Uses a single model architecture for multiple tasks

The SynthHuman dataset used to train DAViD contains 300K images at a resolution of 384×512, split equally across face, upper-body, and full-body scenarios. The dataset is designed to be diverse in poses, environments, lighting, and appearances, and is not tailored to any specific evaluation set. This allows DAViD to generalize across a range of benchmark datasets as well as to in-the-wild data. Along with the rendered RGB image, each sample includes ground-truth annotations for the soft foreground mask, surface normals, and depth, which are used to train the models.
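To make the per-sample contents concrete, here is a minimal sketch of how one sample could be held in memory and sanity-checked. It is an illustration under stated assumptions, not the dataset's actual file format or loader: the class name `SynthHumanSample`, its field names, the helper `make_dummy_sample`, the value ranges, and the reading of "384×512" as width × height are all hypothetical.

```python
from dataclasses import dataclass

import numpy as np

# Assumption: "384×512" is read as width × height (portrait orientation).
WIDTH, HEIGHT = 384, 512


@dataclass
class SynthHumanSample:
    """Hypothetical in-memory layout for one sample (not the official format)."""

    rgb: np.ndarray      # (H, W, 3) rendered image; floats in [0, 1] assumed
    alpha: np.ndarray    # (H, W) soft foreground mask; floats in [0, 1] assumed
    normals: np.ndarray  # (H, W, 3) surface normals; encoding is an assumption
    depth: np.ndarray    # (H, W) depth; units are not stated in the text

    def validate(self) -> None:
        """Raise ValueError if any array has an unexpected shape or range."""
        expected = {
            "rgb": (HEIGHT, WIDTH, 3),
            "alpha": (HEIGHT, WIDTH),
            "normals": (HEIGHT, WIDTH, 3),
            "depth": (HEIGHT, WIDTH),
        }
        for name, shape in expected.items():
            actual = getattr(self, name).shape
            if actual != shape:
                raise ValueError(f"{name}: expected shape {shape}, got {actual}")
        if self.alpha.min() < 0.0 or self.alpha.max() > 1.0:
            raise ValueError("alpha: soft mask values must lie in [0, 1]")


def make_dummy_sample() -> SynthHumanSample:
    """Build a placeholder sample with the assumed shapes, e.g. for pipeline tests."""
    normals = np.zeros((HEIGHT, WIDTH, 3), dtype=np.float32)
    normals[..., 2] = 1.0  # every normal points along +z
    return SynthHumanSample(
        rgb=np.zeros((HEIGHT, WIDTH, 3), dtype=np.float32),
        alpha=np.ones((HEIGHT, WIDTH), dtype=np.float32),
        normals=normals,
        depth=np.ones((HEIGHT, WIDTH), dtype=np.float32),
    )
```

Calling `make_dummy_sample().validate()` passes silently, while a sample with a mismatched array shape raises `ValueError`. Keeping the three annotations alongside the image in one record mirrors the description above, where each rendered image carries all three ground-truth signals.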


DAViD delivers high-quality, detailed results while running orders of magnitude faster than competing methods. The model reliably captures a wide range of human characteristics under diverse lighting conditions, preserving fine-grained details such as hair strands and subtle facial features, which demonstrates its robustness and accuracy in complex, real-world scenarios. DAViD uses a single model architecture for three dense prediction tasks (soft foreground segmentation, surface normal estimation, and depth estimation), making it a versatile and efficient solution for various computer vision applications.
