SANA-WM

Free World Open-Source

Key Features

Generates 720p minute-scale video from one image and a camera trajectory.

Uses a 2.6B-parameter Hybrid Linear Diffusion Transformer architecture.

Combines Gated DeltaNet and softmax attention for memory-efficient long-context modeling.

Supports precise 6-DoF camera control through a dual-branch camera-control design.

Applies a two-stage generation pipeline with long-video refinement.

Trains from public video clips with metric-scale camera-pose supervision.

Targets interactive world modeling, embodied AI, and camera-controlled video generation.

Provides public paper, code, and model resources for research use.
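As a rough illustration of what "one image and a camera trajectory" with 6-DoF control could mean as input data, the sketch below builds a per-frame list of 4x4 camera poses from three translations and three rotations. The function names, the Euler-angle convention, and the use of metres are assumptions made for illustration; they are not taken from the SANA-WM code release.

```python
import math


def pose_matrix(tx, ty, tz, roll, pitch, yaw):
    """Build a 4x4 camera pose matrix from a 6-DoF pose.

    Assumed convention (hypothetical, not from the SANA-WM release):
    angles in radians, rotation R = Rz(yaw) * Ry(pitch) * Rx(roll),
    translation (tx, ty, tz) in metres.
    """
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr, tx],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr, ty],
        [-sp, cp * sr, cp * cr, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def straight_trajectory(num_frames, step_m):
    """One pose per frame, moving along +z by step_m metres each frame, no rotation."""
    return [pose_matrix(0.0, 0.0, i * step_m, 0.0, 0.0, 0.0) for i in range(num_frames)]


# Example: a 3-frame trajectory moving 0.5 m per frame.
trajectory = straight_trajectory(3, 0.5)
```

Metric scale matters here because the translation column is expressed in real-world units, so the same trajectory describes the same physical camera motion regardless of the scene.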

SANA-WM is a 2.6B-parameter open-source world model built on a Hybrid Linear Diffusion Transformer. It combines frame-wise Gated DeltaNet and softmax attention for long-context modeling, uses dual-branch camera control for 6-DoF trajectory adherence, and applies a two-stage pipeline with a long-video refiner. These design choices help SANA-WM maintain temporal consistency and visual quality over longer sequences than typical short-form video generators.
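To make the memory argument concrete, here is a minimal sketch of one recurrent step of the gated delta rule as described in the Gated DeltaNet literature: S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T, with output o_t = S_t q_t. The state S has a fixed size no matter how many tokens have been processed, whereas softmax attention must keep every past key and value. This is a generic illustration with a hypothetical function name; how SANA-WM applies the rule frame-wise and interleaves it with softmax attention is not specified here.

```python
def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrent step of the gated delta rule (generic sketch).

    S: d_v x d_k state matrix as a list of rows; q, k: length d_k; v: length d_v.
    alpha in (0, 1] decays the old state; beta in (0, 1] sets the write strength.
    Keys are assumed to be L2-normalised.
    """
    new_S = []
    for row, v_i in zip(S, v):
        # What the current state would return for key k in this output dimension.
        pred = sum(s * k_j for s, k_j in zip(row, k))
        # Erase the old association along k, decay, then write the new value.
        new_row = [
            alpha * (s - beta * pred * k_j) + beta * v_i * k_j
            for s, k_j in zip(row, k)
        ]
        new_S.append(new_row)
    # Read out with the query: o = S_new q.
    out = [sum(s * q_j for s, q_j in zip(row, q)) for row in new_S]
    return new_S, out


# Example: write a value under a key, then overwrite it under the same key.
state = [[0.0, 0.0], [0.0, 0.0]]
key = [1.0, 0.0]
state, first = gated_delta_step(state, key, key, [3.0, 4.0], alpha=1.0, beta=1.0)
state, second = gated_delta_step(state, key, key, [5.0, 6.0], alpha=1.0, beta=1.0)
```

With alpha = 1 and beta = 1, the second write replaces the first value stored under the same key instead of adding to it, which is the behaviour that distinguishes the delta rule from plain additive linear attention.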

SANA-WM is valuable for researchers and developers building explorable AI worlds, robotics simulators, camera-controlled video tools, or data engines for embodied agents. It is notable for its efficient training and inference profile and for training on public video data with metric-scale pose supervision rather than depending only on massive closed datasets. The release provides paper, code, and model links, so it is listed as a free open-source world-model project.
