IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning
Yuanhang Li, Yiren Song, Junzhe Bai, Xinran Liang, Hu Yang, Libiao Jin, Qi Mao
2025-12-18
Summary
This paper introduces a new system called IC-Effect that lets you easily add cool visual effects to videos, like flames or cartoon characters, even when it only has a few example videos to learn from.
What's the problem?
Adding visual effects to videos is really hard because the effects need to look like they actually belong in the scene, meaning the background shouldn't change and the effect needs to move naturally over time. Existing computer programs struggle with this – they either mess up the background, or the effects look glitchy and don't fit well. It's also difficult to train these programs because you need a lot of example videos showing the effects already added, which is time-consuming to create.
What's the solution?
IC-Effect solves this by feeding the original video back into the model as a clean reference, so the system always knows exactly what the untouched scene looks like. It builds on a type of model called a Diffusion Transformer (DiT), which is good at learning from context, to keep the background exactly the same while seamlessly blending in the desired effect. Training happens in two steps: the model first learns general instruction-based editing, then picks up each specific effect through a small add-on module called Effect-LoRA. To stay efficient, the system also skips over the parts of the video that don't matter for the edit and spends its computation only on the important regions. Finally, the authors created a new collection of paired videos specifically for training and testing these kinds of effects.
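To make the "original video as context" idea concrete, here is a minimal sketch of how one transformer block could attend jointly over clean source-video tokens and the noisy frames being edited. This is an illustration under assumptions, not the authors' released code: the class name ContextBlock and the names src_tokens and tgt_tokens are invented for the example.

```python
# Minimal sketch (assumed, not IC-Effect's actual implementation) of in-context
# conditioning in a DiT-style block: clean source tokens and noisy target tokens
# share one attention window, so background content can be read straight from
# the clean context while only the target tokens are denoised.
import torch
import torch.nn as nn


class ContextBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, src_tokens: torch.Tensor, tgt_tokens: torch.Tensor) -> torch.Tensor:
        # Concatenate along the sequence axis: [clean source context | noisy target].
        x = torch.cat([src_tokens, tgt_tokens], dim=1)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # joint self-attention
        x = x + self.mlp(self.norm2(x))
        # Only the target half is carried forward for denoising; the source
        # tokens act purely as a clean condition.
        return x[:, src_tokens.shape[1]:]


# Toy usage: 2 clips, 256 source tokens and 256 target tokens of width 64.
block = ContextBlock(dim=64)
src = torch.randn(2, 256, 64)
tgt = torch.randn(2, 256, 64)
print(block(src, tgt).shape)  # torch.Size([2, 256, 64])
```

Because the clean source tokens sit in the same attention window as the frames being generated, the model can copy background content from them directly, which is what lets the background stay untouched while the effect is added.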
Why it matters?
This research is important because it makes creating high-quality video effects much easier and more accessible. It opens up possibilities for anyone to create professional-looking videos without needing to be a special-effects expert or to have access to expensive software and large datasets. It could be used for filmmaking, content creation, or even just for fun.
Abstract
We propose IC-Effect, an instruction-guided, DiT-based framework for few-shot video VFX editing that synthesizes complex effects (e.g., flames, particles, and cartoon characters) while strictly preserving spatial and temporal consistency. Video VFX editing is highly challenging because injected effects must blend seamlessly with the background, the background must remain entirely unchanged, and effect patterns must be learned efficiently from limited paired data. However, existing video editing models fail to satisfy these requirements. IC-Effect leverages the source video as clean contextual conditions, exploiting the contextual learning capability of DiT models to achieve precise background preservation and natural effect injection. A two-stage training strategy, consisting of general editing adaptation followed by effect-specific learning via Effect-LoRA, ensures strong instruction following and robust effect modeling. To further improve efficiency, we introduce spatiotemporal sparse tokenization, enabling high fidelity with substantially reduced computation. We also release a paired VFX editing dataset spanning 15 high-quality visual styles. Extensive experiments show that IC-Effect delivers high-quality, controllable, and temporally consistent VFX editing, opening new possibilities for video creation.
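As a rough illustration of the spatiotemporal sparse tokenization idea, the following sketch keeps only a fraction of the source-video context tokens, ranked by a simple importance score, before they enter attention. The scoring rule (token norm) and the keep_ratio parameter are illustrative assumptions, not the selection mechanism described in the paper.

```python
# Minimal sketch (assumed): spatiotemporal sparse tokenization as top-k token
# selection, so the transformer attends over a much shorter context sequence.
import torch


def sparse_context(tokens: torch.Tensor, keep_ratio: float = 0.25) -> torch.Tensor:
    """tokens: (batch, num_tokens, dim) source-video context tokens.
    Keeps the keep_ratio fraction with the largest L2 norm, a stand-in for
    any learned per-token importance score."""
    b, n, d = tokens.shape
    k = max(1, int(n * keep_ratio))
    scores = tokens.norm(dim=-1)                # (b, n) per-token saliency
    idx = scores.topk(k, dim=1).indices         # (b, k) indices of kept tokens
    idx = idx.unsqueeze(-1).expand(-1, -1, d)   # (b, k, d) for gather
    return tokens.gather(1, idx)                # (b, k, d) sparse context


# Toy usage: the context fed to the DiT shrinks by 4x.
ctx = torch.randn(2, 1024, 64)
print(sparse_context(ctx).shape)  # torch.Size([2, 256, 64])
```

Shrinking the context sequence directly cuts the attention cost, which is the kind of efficiency gain the sparse tokenization targets.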