The method uses a disentangled multi-modal context memory: an RGB bank stores semantic appearance, and a depth bank stores geometric structure. Edit-aware memory update and retrieval let the generator propagate the new appearance introduced by an edit while preserving the scene's stable geometry.
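To make the two-bank idea concrete, here is a minimal toy sketch in plain Python. It is not the PermaVid implementation. The class name `DisentangledMemory`, the use of a camera position as the memory key, the nearest-neighbour lookup, and the rule "an edited frame overwrites appearance but keeps the stored depth" are all assumptions chosen for illustration. The actual method presumably operates on learned features inside a video generator.

```python
from dataclasses import dataclass, field
from math import dist
from typing import Dict, List, Optional, Tuple

# Hypothetical memory key: a 3D camera position (an assumption, not from the source).
Pose = Tuple[float, float, float]


@dataclass
class DisentangledMemory:
    """Toy two-bank context memory (illustrative only)."""

    rgb_bank: Dict[Pose, List[float]] = field(default_factory=dict)    # appearance
    depth_bank: Dict[Pose, List[float]] = field(default_factory=dict)  # geometry

    def update(self, pose: Pose, rgb_feat: List[float],
               depth_feat: List[float], edited: bool = False) -> None:
        # Appearance: always store the latest RGB feature so an edit propagates.
        self.rgb_bank[pose] = list(rgb_feat)
        # Geometry: assumed edit-aware rule. If the frame was edited and depth
        # for this view is already stored, keep the stored depth unchanged.
        if edited and pose in self.depth_bank:
            return
        self.depth_bank[pose] = list(depth_feat)

    def retrieve(self, query: Pose) -> Optional[Tuple[List[float], List[float]]]:
        # Return the (rgb, depth) pair of the stored view closest to the query.
        if not self.rgb_bank:
            return None
        nearest = min(self.rgb_bank, key=lambda p: dist(p, query))
        return self.rgb_bank[nearest], self.depth_bank[nearest]


memory = DisentangledMemory()
origin: Pose = (0.0, 0.0, 0.0)

# First visit: store the original appearance and geometry.
memory.update(origin, rgb_feat=[0.1, 0.2], depth_feat=[1.0, 2.0])
# An edit changes the appearance; the depth passed here is ignored by the rule above.
memory.update(origin, rgb_feat=[0.9, 0.8], depth_feat=[5.0, 5.0], edited=True)

# The camera returns to a nearby view and reads both banks.
rgb, depth = memory.retrieve((0.1, 0.0, 0.0))
```

In this sketch, the retrieved `rgb` reflects the edit while `depth` is still the value stored on the first visit, which mirrors the stated goal of propagating new appearance while keeping geometry stable.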
PermaVid is useful for video editing systems, scene simulation, and long-horizon generation workflows where edits must remain coherent after the camera moves away and returns. The project provides the paper, code, dataset, and demo assets.


