Key Features

Clones diverse camera motions from reference videos to animate source images.
Supports one-shot and multi-shot camera-motion transfer.
Uses a camera grid representation derived from rendered camera poses.
Injects camera grids into MMDiT alongside other control signals.
Uses a prompt expansion agent to integrate multimodal control signals.
Handles dynamic motion, scene generalization, and special camera movements.
Demonstrates coherent transitions and shot relationships across multi-shot videos.
Provides a paper, a public code link, and many demo videos.

The method represents camera motion as a camera grid, rendered by replaying the camera poses of the reference video in an empty 3D space. During training, this camera grid is injected into an MMDiT alongside other control signals; at inference, a hierarchical prompt expansion agent integrates the multimodal signals.
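To make the camera grid idea concrete, here is a minimal toy sketch in plain Python. It is not OmniDirector's implementation: the pinhole camera model, the flat 5x5 grid of points, the focal length, and every function name (`yaw_rotation`, `world_to_camera`, `project`, `make_grid`, `render_camera_grid`) are assumptions made for illustration. It only shows how projecting a fixed grid through a sequence of camera poses yields a per-frame pattern that encodes the camera motion and nothing about scene content.

```python
import math

def yaw_rotation(theta):
    """Camera-to-world rotation about the y-axis (a pan), as a 3x3 nested list."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def world_to_camera(point, rotation, position):
    """Express a world point in camera coordinates: R^T (p - t)."""
    d = [point[i] - position[i] for i in range(3)]
    # Multiplying by the transpose inverts the camera-to-world rotation.
    return [sum(rotation[r][c] * d[r] for r in range(3)) for c in range(3)]

def project(point_cam, focal, width, height):
    """Pinhole projection to pixel coordinates; None if behind the camera."""
    x, y, z = point_cam
    if z <= 1e-6:
        return None
    return (focal * x / z + width / 2, focal * y / z + height / 2)

def make_grid(n=5, spacing=1.0, depth=5.0):
    """A flat n x n grid of points floating in otherwise empty space (assumed layout)."""
    half = (n - 1) / 2
    return [((i - half) * spacing, (j - half) * spacing, depth)
            for j in range(n) for i in range(n)]

def render_camera_grid(poses, focal=50.0, width=64, height=64):
    """For each (rotation, position) pose, return the visible projected grid points."""
    grid = make_grid()
    frames = []
    for rotation, position in poses:
        visible = []
        for p in grid:
            uv = project(world_to_camera(p, rotation, position), focal, width, height)
            if uv is not None and 0 <= uv[0] < width and 0 <= uv[1] < height:
                visible.append(uv)
        frames.append(visible)
    return frames

# Two frames of a dolly-in: the camera moves from z=0 to z=2.5 toward the grid.
poses = [(yaw_rotation(0.0), (0.0, 0.0, 0.0)),
         (yaw_rotation(0.0), (0.0, 0.0, 2.5))]
frames = render_camera_grid(poses)
print(len(frames[0]), len(frames[1]))
```

In this toy setup, moving the camera forward spreads the projected points apart so that fewer of them remain in frame, and a pan shifts them sideways. A real system would rasterize such a pattern into an image sequence before feeding it to the generator; that step is omitted here.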


OmniDirector is useful for video generation workflows that need to copy cinematic camera language, not just object motion. It can reproduce aerial fly-throughs, descents, dolly zooms, bullet-time effects, and lens-distortion-like camera behavior while preserving generated content.
