Key Features

4D consistent human view synthesis
High-fidelity free-viewpoint rendering
Spatio-temporal diffusion model
Skeleton-Plücker conditioning
Sliding iterative denoising
4DGS reconstruction
Real-time novel view rendering
Support for complex clothing and motions

Diffuman4D tackles the challenge of human novel view synthesis from sparse-view videos using a spatio-temporal diffusion model. The model uses skeleton-Plücker conditioning: encoded skeleton latents and Plücker ray coordinates are concatenated with the image latents at input views, or with the noise latents at target views. The samples across all views and timestamps form a sample grid, which the model denoises with a sliding iterative mechanism and then decodes into the target videos.
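The conditioning and sliding-denoising steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: all shapes, the `denoise_window` stand-in, and the window/stride values are assumptions made for the example.

```python
import numpy as np

# Hypothetical shapes: V views, T timestamps, C-channel latents on an HxW grid.
V, T, C, H, W = 4, 6, 8, 16, 16
rng = np.random.default_rng(0)

image_latents = rng.normal(size=(V, T, C, H, W))     # encoded input-view frames
noise_latents = rng.normal(size=(V, T, C, H, W))     # noise at target views
skeleton_latents = rng.normal(size=(V, T, C, H, W))  # encoded skeleton maps
plucker = rng.normal(size=(V, T, 6, H, W))           # per-pixel Plücker ray coords

is_input_view = np.array([True, False, True, False])  # which views are observed

# Skeleton-Plücker conditioning: concatenate skeleton latents and Plücker
# coordinates with the image latents (input views) or noise latents (targets).
base = np.where(is_input_view[:, None, None, None, None],
                image_latents, noise_latents)
conditioned = np.concatenate([base, skeleton_latents, plucker], axis=2)
print(conditioned.shape)  # (4, 6, 22, 16, 16): C + C + 6 channels per sample

# Sliding iterative denoising over the view-time sample grid: denoise
# overlapping windows along the time axis, then along the view axis.
def denoise_window(x):
    return 0.9 * x  # stand-in for one diffusion denoising step

window, stride = 3, 2
grid = conditioned.copy()
for t0 in range(0, T - window + 1, stride):   # slide along time
    grid[:, t0:t0 + window] = denoise_window(grid[:, t0:t0 + window])
for v0 in range(0, V - window + 1, stride):   # slide along views
    grid[v0:v0 + window] = denoise_window(grid[v0:v0 + window])
```

The point of the overlapping windows is that neighboring windows share samples, so denoised content propagates across the whole view-time grid over repeated passes.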


Diffuman4D addresses the sparse-view challenge by generating 4D-consistent multi-view videos conditioned on the input videos. The generated videos enable high-quality 4DGS reconstructions, allowing real-time free-viewpoint rendering of humans in motion. The method has been demonstrated to produce high-fidelity results, even for subjects with complex clothing and motions, and has potential applications in film, video games, and virtual reality.

