Key Features

Scalable 3D generation framework based on sparse volumes
Spatial Sparse Attention (SSA) mechanism for efficient DiT computations
Variational autoencoder (VAE) that keeps a consistent sparse volumetric format across input, latent, and output stages
Superior output quality with reduced training costs
3.9× speed-up in the forward pass and 9.6× speed-up in the backward pass
Enables training at 1024³ resolution with just 8 GPUs
Suitable for various applications in computer vision, graphics, and robotics
Potential use in real-time applications, such as 3D reconstruction and tracking

The framework includes a variational autoencoder (VAE) that maintains a consistent sparse volumetric format across input, latent, and output stages. This unified design markedly improves training efficiency and stability. Direct3D-S2 is trained on publicly available datasets and surpasses state-of-the-art methods in generation quality and efficiency. It enables training at 1024³ resolution with just 8 GPUs, making gigascale 3D generation both practical and accessible.
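To give a sense of what "gigascale" means here, the short Python sketch below counts the voxels in a dense 1024³ grid and compares that with a sparse volume that stores only occupied voxels. The occupancy fraction and the function names are hypothetical, chosen purely for illustration; they are not figures or APIs from Direct3D-S2.

```python
def dense_voxel_count(resolution: int) -> int:
    """Number of voxels in a dense cubic grid of the given side length."""
    return resolution ** 3


def sparse_voxel_count(resolution: int, occupancy: float) -> int:
    """Approximate number of voxels stored if only a fraction is occupied."""
    return int(dense_voxel_count(resolution) * occupancy)


RESOLUTION = 1024
# Hypothetical assumption: about 1% of voxels lie near the surface.
ASSUMED_OCCUPANCY = 0.01

dense = dense_voxel_count(RESOLUTION)
sparse = sparse_voxel_count(RESOLUTION, ASSUMED_OCCUPANCY)

print(f"dense voxels:  {dense:,}")   # 1,073,741,824
print(f"sparse voxels: {sparse:,}")
```

A dense 1024³ grid holds just over one billion voxels, which is why a representation that stores only the occupied ones matters at this resolution.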


Direct3D-S2 has a wide range of applications in computer vision, graphics, and robotics. Its ability to generate high-resolution 3D shapes using volumetric representations makes it a valuable tool for various industries, such as architecture, product design, and video production. The model's efficiency and scalability also make it suitable for use in real-time applications, such as 3D reconstruction and tracking.
