InfiniHuman: Infinite 3D Human Creation with Precise Control

Yuxuan Xue, Xianghui Xie, Margaret Kostyrko, Gerard Pons-Moll

2025-10-14

Summary

This paper introduces a new way to create diverse, realistic 3D human avatars, along with a large dataset (over 111,000 annotated identities) to support it. It tackles the difficulty of generating avatars that span a wide range of appearances and body types, and provides tools for precisely controlling how they look.

What's the problem?

Creating realistic 3D human models is hard, especially if you want a lot of variety in ethnicity, age, body shape, and clothing. The biggest issue is that capturing and annotating real-world 3D scans is extremely expensive, so existing datasets stay small and don't cover enough different people and styles. In short, there aren't enough good examples for a generative model to learn from.

What's the solution?

The researchers developed a framework called InfiniHuman that distills existing foundation models, ones that understand language and can generate images, to automatically build a massive annotated dataset of humans. This dataset, called InfiniHumanData, contains over 111,000 identities, each described with multi-granularity text, multi-view RGB images, detailed clothing images, and SMPL body-shape parameters. Building on this data, they trained another model, InfiniHumanGen, a diffusion-based generator that quickly produces new avatars you can control with text descriptions, body-shape preferences, and clothing choices.
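As a rough illustration of the kind of record InfiniHumanData contains, here is a minimal Python sketch of one annotated identity. The field names (identity_id, texts, multiview_rgb, clothing_images, smpl_betas) are assumptions for illustration only and do not reflect the released data format.

```python
# Hypothetical sketch of one InfiniHumanData identity record.
# Field names are illustrative assumptions, not the released schema.
from dataclasses import dataclass, field
from typing import Dict, List

import numpy as np


@dataclass
class IdentityRecord:
    identity_id: str
    # Multi-granularity text: e.g. a short caption plus a detailed description.
    texts: Dict[str, str] = field(default_factory=dict)
    # Paths to multi-view RGB renderings of the same identity.
    multiview_rgb: List[str] = field(default_factory=list)
    # Paths to isolated clothing-asset images (e.g. top, bottom, shoes).
    clothing_images: Dict[str, str] = field(default_factory=dict)
    # SMPL body-shape coefficients (betas); SMPL commonly uses 10.
    smpl_betas: np.ndarray = field(default_factory=lambda: np.zeros(10))


record = IdentityRecord(
    identity_id="id_000042",
    texts={
        "coarse": "elderly woman in a red winter coat",
        "fine": "elderly woman, short grey hair, red wool coat, dark jeans, brown boots",
    },
    multiview_rgb=["id_000042/front.png", "id_000042/back.png", "id_000042/left.png"],
    clothing_images={"top": "id_000042/coat.png", "shoes": "id_000042/boots.png"},
    smpl_betas=np.array([0.3, -1.2, 0.0, 0.5, 0.1, 0.0, 0.0, 0.2, -0.1, 0.0]),
)
```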

Why it matters?

This work matters because it makes creating realistic 3D avatars much easier and cheaper, with big implications for video games, virtual reality, and digital doubles in film. Since the system can generate a nearly unlimited number of avatars while giving you fine-grained control over their appearance, it opens up possibilities for more personalized and immersive experiences.

Abstract

Generating realistic and controllable 3D human avatars is a long-standing challenge, particularly when covering broad attribute ranges such as ethnicity, age, clothing styles, and detailed body shapes. Capturing and annotating large-scale human datasets for training generative models is prohibitively expensive and limited in scale and diversity. The central question we address in this paper is: Can existing foundation models be distilled to generate theoretically unbounded, richly annotated 3D human data? We introduce InfiniHuman, a framework that synergistically distills these models to produce richly annotated human data at minimal cost and with theoretically unlimited scalability. We propose InfiniHumanData, a fully automatic pipeline that leverages vision-language and image generation models to create a large-scale multi-modal dataset. A user study shows our automatically generated identities are indistinguishable from scan renderings. InfiniHumanData contains 111K identities spanning unprecedented diversity. Each identity is annotated with multi-granularity text descriptions, multi-view RGB images, detailed clothing images, and SMPL body-shape parameters. Building on this dataset, we propose InfiniHumanGen, a diffusion-based generative pipeline conditioned on text, body shape, and clothing assets. InfiniHumanGen enables fast, realistic, and precisely controllable avatar generation. Extensive experiments demonstrate significant improvements over state-of-the-art methods in visual quality, generation speed, and controllability. Our approach enables high-quality avatar generation with fine-grained control at effectively unbounded scale through a practical and affordable solution. We will publicly release the automatic data generation pipeline, the comprehensive InfiniHumanData dataset, and the InfiniHumanGen models at https://yuxuan-xue.com/infini-human.
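To make the three control signals in the abstract concrete (text, SMPL body shape, clothing assets), here is a hedged sketch of how they might be bundled before being handed to a diffusion sampler. The function build_conditions and its arguments are hypothetical assumptions for illustration; the released InfiniHumanGen interface may look different.

```python
# Hypothetical sketch of the conditioning interface described in the abstract:
# InfiniHumanGen is conditioned on text, SMPL body shape, and clothing assets.
# `build_conditions` and its argument names are assumptions, not the released API.
from typing import Dict, List, Optional

import numpy as np


def build_conditions(
    text: str,
    smpl_betas: np.ndarray,
    clothing_images: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Bundle the three control signals into one conditioning dict."""
    if smpl_betas.ndim != 1:
        raise ValueError("expected a flat vector of SMPL shape coefficients")
    return {
        "text": text,                       # free-form appearance description
        "body_shape": smpl_betas,           # controls height/weight/proportions
        "clothing": clothing_images or [],  # optional garment reference images
    }


conditions = build_conditions(
    text="young man with curly hair wearing a green hoodie",
    smpl_betas=np.zeros(10),                # neutral body shape
    clothing_images=["assets/green_hoodie.png"],
)
# A diffusion sampler (not shown) would then denoise an avatar representation
# guided by these conditions.
```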