In-Context Sync-LoRA for Portrait Video Editing
Sagi Polaczek, Or Patashnik, Ali Mahdavi-Amiri, Daniel Cohen-Or
2025-12-03
Summary
This paper introduces a new technique called Sync-LoRA for editing portrait videos, focusing on changing a person's appearance or the surrounding scene while keeping the result natural-looking, with motion that stays smooth and in sync with the original footage.
What's the problem?
When you edit a video, especially a portrait, it's really hard to change things without messing up how the person moves or looks from frame to frame. If you alter a frame of a video, you need to make sure it still matches the original timing and natural movements, or the result looks fake and unnatural. The main challenge is keeping everything synchronized: the edits need to flow seamlessly with the original video's timing, and the person's identity needs to remain consistent throughout.
What's the solution?
Sync-LoRA solves this by using an image-to-video diffusion model, a type of AI model that generates a video from a starting image. The idea is to edit just the first frame of the video to express the change you want, and then let the model propagate that change to all the other frames. To make sure everything stays in sync, the authors trained a lightweight adapter (an in-context LoRA) on pairs of videos showing the *same* movements but with different appearances. This teaches the model to carry over motion from the source video while applying edits without disrupting that motion. The training pairs are generated automatically and then filtered, keeping only those that are tightly aligned in time.
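To make the filtering step concrete, here is a minimal sketch of one way such a synchronization check could work. The paper does not spell out its exact alignment metric, so everything below (the motion signal, the correlation score, the 0.9 threshold) is a hypothetical illustration, not the authors' implementation: it scores a pair of videos by correlating their per-frame motion magnitudes, and keeps only pairs whose motion rises and falls in lockstep.

```python
import numpy as np

def motion_signal(frames: np.ndarray) -> np.ndarray:
    """Per-frame motion magnitude: mean absolute difference between
    consecutive frames. frames has shape (T, H, W), grayscale."""
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))
    return diffs.mean(axis=(1, 2))  # shape (T-1,)

def sync_score(video_a: np.ndarray, video_b: np.ndarray) -> float:
    """Correlation of the two motion signals; a score near 1.0 means
    the videos speed up and slow down together (temporally aligned)."""
    a, b = motion_signal(video_a), motion_signal(video_b)
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float(np.mean(a * b))

def filter_pairs(pairs, threshold=0.9):
    """Keep only the most temporally aligned video pairs (hypothetical
    threshold; the paper's actual selection criterion is not specified)."""
    return [(va, vb) for va, vb in pairs if sync_score(va, vb) >= threshold]

# Toy check: same motion with a different "appearance" (a linear
# brightness/contrast change) scores ~1.0; unrelated motion scores low.
rng = np.random.default_rng(0)
base = rng.random((64, 8, 8))
pair_synced = (base, base * 0.5 + 0.25)       # identical motion pattern
pair_random = (base, rng.random((64, 8, 8)))  # unrelated motion
kept = filter_pairs([pair_synced, pair_random])
print(len(kept))
```

The recoloured copy survives because a per-pixel linear transform leaves the normalized motion signal unchanged, which is exactly the property the filter is after: same motion, different appearance.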
Why does it matter?
This research is important because it makes video editing much easier and more realistic. It allows for high-quality edits, like changing someone's hairstyle or adding accessories, without the frustrating problem of jerky movements or unnatural appearances. This could be useful for creating special effects, personalizing videos, or even in applications like virtual meetings where you might want to subtly alter your appearance.
Abstract
Editing portrait videos is a challenging task that requires flexible yet precise control over a wide range of modifications, such as appearance changes, expression edits, or the addition of objects. The key difficulty lies in preserving the subject's original temporal behavior, demanding that every edited frame remains precisely synchronized with the corresponding source frame. We present Sync-LoRA, a method for editing portrait videos that achieves high-quality visual modifications while maintaining frame-accurate synchronization and identity consistency. Our approach uses an image-to-video diffusion model, where the edit is defined by modifying the first frame and then propagated to the entire sequence. To enable accurate synchronization, we train an in-context LoRA using paired videos that depict identical motion trajectories but differ in appearance. These pairs are automatically generated and curated through a synchronization-based filtering process that selects only the most temporally aligned examples for training. This training setup teaches the model to combine motion cues from the source video with the visual changes introduced in the edited first frame. Trained on a compact, highly curated set of synchronized human portraits, Sync-LoRA generalizes to unseen identities and diverse edits (e.g., modifying appearance, adding objects, or changing backgrounds), robustly handling variations in pose and expression. Our results demonstrate high visual fidelity and strong temporal coherence, achieving a robust balance between edit fidelity and precise motion preservation.