Step1X-3D combines a hybrid VAE-DiT geometry generator with an SD-XL-based texture synthesis module. The geometry generator outputs a Truncated Signed Distance Function (TSDF) representation, which is converted to a mesh with marching cubes. The texture synthesis module conditions on the generated geometry and the input images to produce view-consistent textures, which are then baked onto the mesh. This integrated design targets three challenges at once: data quality, geometric precision, and texture fidelity.
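To make the TSDF-to-mesh step concrete, the following is a minimal sketch of what a TSDF grid looks like and how marching cubes locates the surface in it. It is not the Step1X-3D implementation: an analytic sphere stands in for the generator's output, and the function names (`sphere_tsdf`, `surface_cells`, `edge_crossing`), the grid size, and the truncation distance are all assumptions chosen for illustration. The sketch covers only the first two stages of marching cubes (finding cells that straddle the zero level set and interpolating vertex positions along their edges) and omits the triangle lookup table.

```python
import math


def sphere_tsdf(n=16, radius=0.6, trunc=0.2):
    """Sample a truncated signed distance field of a sphere on an n^3 grid.

    The grid spans [-1, 1]^3. Values are negative inside the sphere,
    positive outside, and clamped to [-trunc, trunc].
    (Hypothetical stand-in for a geometry generator's TSDF output.)
    """
    coords = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    grid = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i, x in enumerate(coords):
        for j, y in enumerate(coords):
            for k, z in enumerate(coords):
                sdf = math.sqrt(x * x + y * y + z * z) - radius
                grid[i][j][k] = max(-trunc, min(trunc, sdf))
    return grid


def surface_cells(grid):
    """Return indices of grid cells whose eight corners straddle zero.

    These are the only cells for which marching cubes emits triangles.
    """
    n = len(grid)
    cells = []
    for i in range(n - 1):
        for j in range(n - 1):
            for k in range(n - 1):
                corners = [
                    grid[i + di][j + dj][k + dk]
                    for di in (0, 1)
                    for dj in (0, 1)
                    for dk in (0, 1)
                ]
                if min(corners) < 0.0 <= max(corners):
                    cells.append((i, j, k))
    return cells


def edge_crossing(p0, p1, v0, v1):
    """Linearly interpolate the zero crossing on an edge from p0 to p1.

    v0 and v1 are the TSDF values at the endpoints and must differ in sign.
    """
    t = v0 / (v0 - v1)
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))


if __name__ == "__main__":
    tsdf = sphere_tsdf()
    cells = surface_cells(tsdf)
    print(f"{len(cells)} of {15 ** 3} cells contain the surface")
```

Truncation matters here because only values near the zero level set influence the extracted surface; cells far from the surface saturate at the truncation distance and are skipped. In practice a library routine such as scikit-image's `measure.marching_cubes` would replace the hand-written loops.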
Step1X-3D reports state-of-the-art performance in 3D generation, exceeding existing open-source methods and achieving quality competitive with proprietary solutions. The framework bridges the 2D and 3D generation paradigms by supporting direct transfer of 2D control techniques to 3D synthesis. By advancing data quality, algorithmic fidelity, and reproducibility together, Step1X-3D aims to establish new standards for open research in controllable 3D asset generation. The framework has been evaluated on multiple datasets, where it generates high-quality 3D assets.