VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning

Baolu Li, Yiming Zhang, Qinghe Wang, Liqian Ma, Xiaoyu Shi, Xintao Wang, Pengfei Wan, Zhenfei Yin, Yunzhi Zhuge, Huchuan Lu, Xu Jia

2025-10-30

Summary

This paper introduces a new AI system called VFXMaster that makes creating visual effects for videos much easier and more flexible.

What's the problem?

Currently, creating visual effects with AI often requires training a separate model for *each* effect you want to make. That takes a lot of work and computing power, and the resulting models are bad at producing effects they haven't specifically been trained on, so they can't handle new or unusual requests and don't scale.

What's the solution?

VFXMaster solves this by treating effect creation like showing the AI an example. Instead of training a new model for every effect, you simply *show* it a video with the effect you want, and it tries to copy that effect onto a new video. They developed a clever way to focus the AI's attention on just the effect itself, without getting confused by the rest of the video. They also added a way to quickly adapt the AI to brand new effects using just one example video.
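The "focus the AI's attention on just the effect" idea can be sketched as an attention mask over a combined token sequence. This is an illustrative toy, not the paper's actual implementation: the `[reference | target]` layout and the `effect_idx` split between effect and content tokens are assumptions made here for clarity.

```python
def build_in_context_mask(n_ref, n_tgt, effect_idx):
    """Boolean attention mask over a [reference | target] token sequence.

    allowed[i][j] == True means token i may attend to token j.
    effect_idx: indices (within the reference) assumed to carry the
    effect itself, rather than the reference video's own content.
    """
    n = n_ref + n_tgt
    allowed = [[False] * n for _ in range(n)]
    for i in range(n_ref):                 # reference tokens attend
        for j in range(n_ref):             # freely among themselves
            allowed[i][j] = True
    for i in range(n_ref, n):
        for j in range(n_ref, n):          # target tokens attend to
            allowed[i][j] = True           # each other...
        for j in effect_idx:               # ...and only to the reference's
            allowed[i][j] = True           # effect tokens, so the reference's
    return allowed                         # content cannot leak into the target

mask = build_in_context_mask(n_ref=6, n_tgt=4, effect_idx=[2, 3])
```

In this toy layout, a target row can see reference columns 2 and 3 (the effect) but not columns 0 and 1 (the reference's content), which is the "without getting confused by the rest of the video" property the summary describes.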

Why it matters?

This is important because it makes creating visual effects much more accessible and efficient. It means artists can create a wider range of effects without needing massive amounts of data or computing resources, and it opens the door to generating effects that haven't been seen before. The researchers also plan to release their code, models, and dataset so others can build on their work.

Abstract

Visual effects (VFX) are crucial to the expressive power of digital media, yet their creation remains a major challenge for generative AI. Prevailing methods often rely on the one-LoRA-per-effect paradigm, which is resource-intensive and fundamentally incapable of generalizing to unseen effects, thus limiting scalability and creativity. To address this challenge, we introduce VFXMaster, the first unified, reference-based framework for VFX video generation. It recasts effect generation as an in-context learning task, enabling it to reproduce diverse dynamic effects from a reference video onto target content. In addition, it demonstrates remarkable generalization to unseen effect categories. Specifically, we design an in-context conditioning strategy that prompts the model with a reference example. An in-context attention mask is designed to precisely decouple and inject the essential effect attributes, allowing a single unified model to master effect imitation without information leakage. In addition, we propose an efficient one-shot effect adaptation mechanism to rapidly boost generalization on tough unseen effects from a single user-provided video. Extensive experiments demonstrate that our method effectively imitates various categories of effect information and exhibits outstanding generalization to out-of-domain effects. To foster future research, we will release our code, models, and a comprehensive dataset to the community.
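The one-shot adaptation idea can be illustrated in miniature: freeze a base transformation and train only a tiny residual adapter to reconstruct tokens from the single reference video. Everything here is an assumption for illustration (the rank-1 design, dimensions, and loss); the paper's actual mechanism may differ.

```python
import random

random.seed(0)
dim = 4

# Frozen base map (stand-in for the pretrained video model): a fixed matrix.
W = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]

def base(x):  # frozen forward pass
    return [sum(x[k] * W[k][j] for k in range(dim)) for j in range(dim)]

# Rank-1 residual adapter h -> h + (h . a) * b : the only trainable part.
a = [random.gauss(0, 0.1) for _ in range(dim)]
b = [0.0] * dim               # zero-init so the adapter starts as a no-op

# Tokens from the single user-provided reference video (random stand-ins).
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(8)]

def forward(x):
    h = base(x)
    s = sum(hi * ai for hi, ai in zip(h, a))
    return [hi + s * bi for hi, bi in zip(h, b)], h, s

def mse():  # reconstruction error on the one reference example
    return sum(sum((o - t) ** 2 for o, t in zip(forward(x)[0], x))
               for x in tokens) / len(tokens)

lr = 0.01
before = mse()
for _ in range(300):          # a few rapid gradient steps
    ga, gb = [0.0] * dim, [0.0] * dim
    for x in tokens:
        out, h, s = forward(x)
        err = [o - t for o, t in zip(out, x)]
        eb = sum(e * bi for e, bi in zip(err, b))
        for i in range(dim):
            gb[i] += s * err[i]   # dL/db_i  (constant factors folded into lr)
            ga[i] += eb * h[i]    # dL/da_i
    for i in range(dim):          # update only the adapter; W stays frozen
        a[i] -= lr * ga[i] / len(tokens)
        b[i] -= lr * gb[i] / len(tokens)
after = mse()
```

The point of the sketch is the division of labor: the base model is never touched, and only a handful of adapter parameters are fit to the single example, which is what makes the adaptation cheap and fast.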