Step1X-3D pairs a hybrid VAE-DiT geometry generator with an SD-XL-based texture synthesis module. The geometry generator outputs a Truncated Signed Distance Function (TSDF) representation, which is then converted to a mesh via marching cubes. The texture synthesis module conditions on the generated geometry and the input images to produce view-consistent textures, which are baked onto the mesh. This integrated design targets three challenges at once: data quality, geometric precision, and texture fidelity.
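To make the TSDF idea concrete, here is a minimal pure-Python sketch. It is not Step1X-3D's code: the function names `sphere_tsdf` and `surface_crossings`, the sphere shape, the grid resolution, and the truncation value are all assumptions chosen for illustration. It samples a truncated signed distance field on a grid and then locates the sign changes along grid edges, which is where a marching-cubes mesher would place its vertices. A full marching-cubes implementation would additionally connect those vertices into triangles.

```python
import math

def sphere_tsdf(resolution=16, radius=0.6, truncation=0.2):
    """Sample a truncated signed distance field of a sphere on [-1, 1]^3.

    Negative values are inside the surface, positive values outside;
    distances are clamped to [-truncation, truncation].
    """
    step = 2.0 / (resolution - 1)
    grid = []
    for i in range(resolution):
        plane = []
        for j in range(resolution):
            row = []
            for k in range(resolution):
                x, y, z = -1.0 + i * step, -1.0 + j * step, -1.0 + k * step
                signed_distance = math.sqrt(x * x + y * y + z * z) - radius
                row.append(max(-truncation, min(truncation, signed_distance)))
            plane.append(row)
        grid.append(plane)
    return grid, step

def surface_crossings(grid, step):
    """Return points where the field changes sign along a grid edge.

    Each crossing is placed by linear interpolation between the two
    grid values, the same rule marching cubes uses for its vertices.
    """
    n = len(grid)
    points = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                a = grid[i][j][k]
                for di, dj, dk in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
                    ni, nj, nk = i + di, j + dj, k + dk
                    if ni >= n or nj >= n or nk >= n:
                        continue
                    b = grid[ni][nj][nk]
                    if (a < 0) != (b < 0):
                        t = a / (a - b)  # fraction of the edge at which the field is zero
                        points.append((
                            -1.0 + (i + t * di) * step,
                            -1.0 + (j + t * dj) * step,
                            -1.0 + (k + t * dk) * step,
                        ))
    return points

grid, step = sphere_tsdf()
crossings = surface_crossings(grid, step)
print(len(crossings), "surface points recovered from the TSDF grid")
```

Because a signed distance field is negative inside and positive outside, the extracted zero level set forms a closed surface, which is what makes this representation a natural fit for watertight meshes.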


Step1X-3D reports state-of-the-art performance in 3D generation, exceeding existing open-source methods and reaching quality competitive with proprietary solutions. The framework bridges 2D and 3D generation paradigms by supporting direct transfer of 2D control techniques to 3D synthesis. By advancing data quality, algorithmic fidelity, and reproducibility together, Step1X-3D aims to establish new standards for open research in controllable 3D asset generation. It has been tested on various datasets, where it generates high-quality 3D assets.

Key Features

Hybrid VAE-DiT geometry generator
SD-XL-based texture synthesis module
Watertight TSDF representations
Perceiver-based latent encoding
Sharp edge sampling for detail preservation
View-consistent texture synthesis
Support for 2D control techniques
Open-source release of models and training code
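The "sharp edge sampling" feature listed above can be illustrated with a short sketch. This is an assumption about the general technique, not the project's actual implementation: the helper names (`face_normal`, `sharp_edges`, `sample_on_edges`), the 30-degree threshold, and the cube test mesh are all hypothetical. The idea shown is to flag mesh edges whose two adjacent faces meet at a large angle, then place extra sample points along those edges so that fine detail is better represented.

```python
import math
from collections import defaultdict

# Unit cube with consistently outward-facing triangles (illustrative test mesh).
CUBE_VERTICES = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
                 (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
CUBE_FACES = [(0, 3, 2), (0, 2, 1), (4, 5, 6), (4, 6, 7),
              (0, 1, 5), (0, 5, 4), (3, 7, 6), (3, 6, 2),
              (0, 4, 7), (0, 7, 3), (1, 2, 6), (1, 6, 5)]

def face_normal(vertices, face):
    """Unit normal of a triangle, from the cross product of two of its edges."""
    a, b, c = (vertices[i] for i in face)
    u = [b[d] - a[d] for d in range(3)]
    v = [c[d] - a[d] for d in range(3)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(x * x for x in n))
    return tuple(x / length for x in n)

def sharp_edges(vertices, faces, angle_threshold_deg=30.0):
    """Edges whose two adjacent face normals differ by more than the threshold."""
    edge_faces = defaultdict(list)
    for face_index, face in enumerate(faces):
        for s in range(3):
            edge = tuple(sorted((face[s], face[(s + 1) % 3])))
            edge_faces[edge].append(face_index)
    normals = [face_normal(vertices, f) for f in faces]
    sharp = []
    for edge, adjacent in edge_faces.items():
        if len(adjacent) != 2:
            continue  # skip boundary or non-manifold edges
        n1, n2 = normals[adjacent[0]], normals[adjacent[1]]
        cos_angle = max(-1.0, min(1.0, sum(p * q for p, q in zip(n1, n2))))
        if math.degrees(math.acos(cos_angle)) > angle_threshold_deg:
            sharp.append(edge)
    return sharp

def sample_on_edges(vertices, edges, samples_per_edge=4):
    """Evenly spaced points along each edge (midpoints of equal sub-segments)."""
    points = []
    for i, j in edges:
        a, b = vertices[i], vertices[j]
        for s in range(samples_per_edge):
            t = (s + 0.5) / samples_per_edge
            points.append(tuple(a[d] + t * (b[d] - a[d]) for d in range(3)))
    return points

edges = sharp_edges(CUBE_VERTICES, CUBE_FACES)
samples = sample_on_edges(CUBE_VERTICES, edges)
print(len(edges), "sharp edges,", len(samples), "sample points")
```

On the cube, the diagonal edges that split each square face join coplanar triangles, so they fall below the threshold and receive no samples; only the cube's true creases are selected.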
