Key Features

4D geometric control
Controllable video world model
Flexible control modes
Interactive 4D control interface
Geometry-consistent video generation
Multi-view world dynamics
Robust control across dynamic and static scenes
Sharp, geometrically coherent videos

VerseCrafter pairs a frozen Wan2.1 backbone with a lightweight GeoAdapter. The GeoAdapter encodes the rendered 4D control maps and injects them into selected diffusion blocks. This design enables precise control of camera motion and multi-object motion while keeping the generated videos sharp and geometrically coherent. The model is trained on the VerseControl4D dataset, which contains 35,000 training clips and 1,000 validation/test clips with complete geometric supervision.
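The idea of a frozen backbone plus an adapter that injects control features into selected blocks can be illustrated with a toy sketch. It is not VerseCrafter's implementation: the additive residual injection, the scalar "blocks", and the names `make_frozen_block`, `ToyAdapter`, and `forward` are all assumptions made for illustration, since the description above does not say how the injection is performed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]


def make_frozen_block(scale: float) -> Callable[[Vector], Vector]:
    """Toy stand-in for one pretrained backbone block; its weight is fixed."""
    def block(hidden: Vector) -> Vector:
        return [scale * h for h in hidden]
    return block


@dataclass
class ToyAdapter:
    """Hypothetical stand-in for an adapter: one gain per injected block.

    Only these gains would be trained; the backbone blocks stay untouched.
    """
    gains: Dict[int, float]  # block index -> gain applied to the control map

    def encode(self, control_map: Vector, block_index: int) -> Vector:
        return [self.gains[block_index] * c for c in control_map]


def forward(blocks: List[Callable[[Vector], Vector]],
            adapter: ToyAdapter,
            hidden: Vector,
            control_map: Vector) -> Vector:
    """Run the backbone, adding adapter features after the selected blocks."""
    for index, block in enumerate(blocks):
        hidden = block(hidden)
        if index in adapter.gains:  # assumption: additive residual injection
            residual = adapter.encode(control_map, index)
            hidden = [h + r for h, r in zip(hidden, residual)]
    return hidden


# Three frozen blocks; the adapter injects only after block 1.
blocks = [make_frozen_block(2.0) for _ in range(3)]
adapter = ToyAdapter(gains={1: 0.5})
controlled = forward(blocks, adapter, [1.0, 2.0], [2.0, 4.0])
uncontrolled = forward(blocks, ToyAdapter(gains={}), [1.0, 2.0], [2.0, 4.0])
```

With no injected blocks the output is just the backbone's own output, which is why an adapter of this kind can be trained without disturbing the pretrained model.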


VerseCrafter offers flexible 4D geometric control with three modes: camera-only, object-only, and joint control. It also provides an interactive 4D control interface in which users design custom camera trajectories and 3D Gaussian object trajectories within Blender. The resulting trajectories are exported as control maps and used by VerseCrafter for geometry-consistent, controllable video generation. The model produces consistent multi-view world dynamics with aligned camera and object motions.
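The three control modes can be thought of as a function of which trajectories the user supplies. The sketch below is a hypothetical illustration, not VerseCrafter's API: `ControlMode`, `ControlRequest`, and `infer_mode` are invented names, and trajectories are represented as plain lists of 3D points purely for simplicity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple

Point3D = Tuple[float, float, float]


class ControlMode(Enum):
    CAMERA_ONLY = "camera_only"
    OBJECT_ONLY = "object_only"
    JOINT = "joint"


@dataclass
class ControlRequest:
    """Hypothetical container for user-designed trajectories."""
    camera_trajectory: List[Point3D] = field(default_factory=list)
    # One trajectory (e.g. the centers of a 3D Gaussian) per controlled object.
    object_trajectories: List[List[Point3D]] = field(default_factory=list)


def infer_mode(request: ControlRequest) -> ControlMode:
    """Pick the control mode from whichever trajectories are present."""
    has_camera = len(request.camera_trajectory) > 0
    has_objects = len(request.object_trajectories) > 0
    if has_camera and has_objects:
        return ControlMode.JOINT
    if has_camera:
        return ControlMode.CAMERA_ONLY
    if has_objects:
        return ControlMode.OBJECT_ONLY
    raise ValueError("at least one camera or object trajectory is required")


camera_path = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
object_path = [(1.0, 0.0, 2.0), (1.5, 0.0, 2.0)]

mode_camera = infer_mode(ControlRequest(camera_trajectory=camera_path))
mode_object = infer_mode(ControlRequest(object_trajectories=[object_path]))
mode_joint = infer_mode(ControlRequest(camera_path, [object_path]))
```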
