Key Features

Maintains video consistency across time, viewpoints, and edits.
Uses an RGB context memory for appearance-aware observations.
Uses a depth context memory to preserve geometry-only structure.
Updates and retrieves memory with edit-aware logic after local or global edits.
Supports global style or appearance changes while preserving stable geometry.
Supports local object-level edits while retaining unchanged scene structure.
Fuses mixed-modality memory references to guide generation.
Provides the paper, code on GitHub, a dataset on Hugging Face, and demo videos.

The method uses a disentangled multi-modal context memory: an RGB bank stores semantic appearance and a depth bank stores geometric structure. After a local or global edit, edit-aware memory update and retrieval let the generator propagate the new appearance while preserving the unchanged geometry.
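To make the edit-aware update and retrieval idea concrete, here is a minimal, hypothetical sketch in Python. It is not PermaVid's implementation. The names `MemoryEntry`, `DisentangledMemory`, `apply_global_edit`, `apply_local_edit`, and `retrieve`, the nearest-camera-position retrieval rule, and the per-frame sets of visible objects are all illustrative assumptions, and plain strings stand in for real RGB frames and depth maps. The sketch shows only the core principle: an edit marks stored appearance as stale while leaving depth usable, so a revisited viewpoint still retrieves geometry references alongside whatever appearance remains valid.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    # Hypothetical record for one past frame (not PermaVid's data format).
    pose: tuple[float, float, float]   # camera position, simplified to xyz
    rgb: str                           # placeholder for an RGB frame or latent
    depth: str                         # placeholder for a depth map or latent
    visible_objects: frozenset[str]    # ids of objects visible in this frame
    rgb_valid: bool = True             # False once an edit makes appearance stale


class DisentangledMemory:
    """Hypothetical RGB + depth context memory with edit-aware updates."""

    def __init__(self) -> None:
        self.entries: list[MemoryEntry] = []

    def add(self, entry: MemoryEntry) -> None:
        self.entries.append(entry)

    def apply_global_edit(self) -> None:
        # A global style/appearance edit makes every stored appearance stale,
        # but geometry is assumed unchanged, so depth entries remain usable.
        for e in self.entries:
            e.rgb_valid = False

    def apply_local_edit(self, object_id: str) -> None:
        # A local edit only invalidates appearance in frames that saw the object.
        for e in self.entries:
            if object_id in e.visible_objects:
                e.rgb_valid = False

    def retrieve(self, pose: tuple[float, float, float], k: int = 2) -> dict:
        # Pick the k stored frames whose cameras are closest to the query pose.
        nearest = sorted(self.entries, key=lambda e: math.dist(e.pose, pose))[:k]
        # Depth references are always returned; RGB only where still valid.
        return {
            "rgb": [e.rgb for e in nearest if e.rgb_valid],
            "depth": [e.depth for e in nearest],
        }


# Usage: a local edit to "chair" hides stale chair appearance, keeps geometry.
mem = DisentangledMemory()
mem.add(MemoryEntry((0.0, 0.0, 0.0), "rgb0", "depth0", frozenset({"chair"})))
mem.add(MemoryEntry((1.0, 0.0, 0.0), "rgb1", "depth1", frozenset({"table"})))
mem.add(MemoryEntry((5.0, 0.0, 0.0), "rgb2", "depth2", frozenset({"chair"})))
mem.apply_local_edit("chair")
refs = mem.retrieve((0.2, 0.0, 0.0), k=2)
# refs == {"rgb": ["rgb1"], "depth": ["depth0", "depth1"]}
```

In this sketch, frames generated after an edit would be added with `rgb_valid=True`, so the RGB bank gradually fills with post-edit appearance while the depth bank keeps anchoring the scene geometry.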


PermaVid is useful for video editing systems, scene simulation, and long-horizon generation workflows where edits must remain coherent after the camera moves away and later returns.
