Versatile Editing of Video Content, Actions, and Dynamics without Training

Vladimir Kulikov, Roni Paiss, Andrey Voynov, Inbar Mosseri, Tali Dekel, Tomer Michaeli

2026-03-23

Summary

This paper introduces DynaEdit, a training-free method for editing videos that focuses on complex changes, like altering actions or adding objects that interact with the scene.

What's the problem?

Editing videos in a controlled way is difficult, especially when you want to make significant changes beyond simple adjustments. Existing methods either need a huge amount of training data, which is hard to collect, or they can only make edits that preserve the original structure and motion of the video, meaning they can't really change what's *happening* in the video or how things interact. Naively pushing existing techniques toward these bigger changes produces videos whose content drifts away from the original and whose motion visibly jitters.

What's the solution?

DynaEdit solves this by building on pre-trained text-to-video flow models that already understand how videos work. It adopts an inversion-free editing approach that doesn't touch the inner workings of these models, making it adaptable to different ones. The catch is that naively applying this approach to general, unconstrained edits causes the content drift and jitter described above; the researchers explain where these artifacts come from and introduce new mechanisms to overcome them, resulting in smoother and more realistic changes.
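To make the inversion-free idea concrete, here is a minimal sketch of what such an editing loop can look like for a flow model, in the spirit of FlowEdit-style methods (the paper does not name the approach, and this code is not DynaEdit itself). Everything here is an assumption for illustration: `velocity_fn` stands in for a pretrained text-to-video flow model's velocity field, and the loop omits DynaEdit's additional mechanisms for suppressing low-frequency misalignment and high-frequency jitter.

```python
import torch

def inversion_free_edit(x_src, velocity_fn, src_prompt, tgt_prompt, num_steps=50):
    """Hypothetical inversion-free editing loop for a rectified-flow model.

    velocity_fn(z, t, prompt) is an assumed interface to a pretrained
    text-to-video flow model's velocity field; DynaEdit's mechanisms for
    taming misalignment and jitter are not reproduced here.
    """
    z_edit = x_src.clone()                        # start the edit at the source video
    ts = torch.linspace(1.0, 0.0, num_steps + 1)  # t=1 is pure noise, t=0 is data
    for i in range(num_steps):
        t, t_next = ts[i].item(), ts[i + 1].item()
        noise = torch.randn_like(x_src)
        # Re-noise the source onto the straight flow path at time t.
        z_src = (1 - t) * x_src + t * noise
        # Couple the edited trajectory to the source one by a shared offset.
        z_tgt = z_edit + (z_src - x_src)
        # Integrate the *difference* of velocities, so content shared by the
        # two prompts cancels and only prompt-driven changes accumulate.
        dv = velocity_fn(z_tgt, t, tgt_prompt) - velocity_fn(z_src, t, src_prompt)
        z_edit = z_edit + (t_next - t) * dv
    return z_edit
```

Because no inversion pass is needed, the loop queries the model only through its velocity field, which is what makes this family of methods model-agnostic.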

Why it matters?

This work is important because it enables much more versatile video editing without massive datasets or complex retraining. It opens the door to modifying videos in creative ways, like changing what someone is doing, adding new objects that realistically interact with the scene, or applying global effects, with potential uses in filmmaking, special effects, and everyday video editing.

Abstract

Controlled video generation has seen drastic improvements in recent years. However, editing actions and dynamic events, or inserting content that should affect the behavior of other objects in real-world videos, remains a major challenge. Existing trained models struggle with complex edits, likely due to the difficulty of collecting relevant training data. Similarly, existing training-free methods are inherently restricted to structure- and motion-preserving edits and do not support modification of motion or interactions. Here, we introduce DynaEdit, a training-free editing method that unlocks versatile video editing capabilities with pretrained text-to-video flow models. Our method relies on the recently introduced inversion-free approach, which does not intervene in the model internals, and is thus model-agnostic. We show that naively attempting to adapt this approach to general unconstrained editing results in severe low-frequency misalignment and high-frequency jitter. We explain the sources of these phenomena and introduce novel mechanisms for overcoming them. Through extensive experiments, we show that DynaEdit achieves state-of-the-art results on complex text-based video editing tasks, including modifying actions, inserting objects that interact with the scene, and introducing global effects.