
VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer

Xinyu Liu, Ailing Zeng, Wei Xue, Harry Yang, Wenhan Luo, Qifeng Liu, Yike Guo

2025-02-14


Summary

This paper introduces VFX Creator, a new AI tool that makes it easier to create animated visual effects (VFX) for videos from simple text descriptions and reference images. It uses a controllable diffusion transformer to give users control over where and when the effects appear.

What's the problem?

Creating high-quality visual effects for movies and videos is usually complicated and requires expensive software and a lot of effort. While AI has made progress in generating images and videos, it hasn't been very good at making controllable VFX, where users can decide the timing and placement of effects.

What's the solution?

The researchers developed VFX Creator, which uses a Video Diffusion Transformer to generate dynamic effects from user input. They also built Open-VFX, a dataset of VFX videos annotated with text descriptions, instance segmentation masks for spatial placement, and start-end timestamps for timing. The model adds a plug-and-play mask control module for precise placement and tokenized timestamp embeddings for timing control, and it is trained with a lightweight LoRA adapter so it needs only a small number of training videos. This lets users create professional-looking effects without extensive training or resources; a rough sketch of how these two controls could plug into the model appears below.
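To make the two control signals concrete, here is a minimal PyTorch sketch of how an instance mask and start-end timestamps could condition a diffusion-transformer denoiser. Everything in it (the class names, concatenating the mask as an extra latent channel, embedding the timestamps as two extra tokens next to the text tokens, the tensor sizes) is an illustrative assumption based on the abstract, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class TimestampEmbedder(nn.Module):
    """Hypothetical: maps start/end frame indices to two condition tokens."""
    def __init__(self, num_frames: int, dim: int):
        super().__init__()
        self.table = nn.Embedding(num_frames, dim)  # one embedding per frame index

    def forward(self, start: torch.Tensor, end: torch.Tensor) -> torch.Tensor:
        # start, end: (B,) integer frame indices -> (B, 2, dim) tokens
        return torch.stack([self.table(start), self.table(end)], dim=1)

class ConditionedDenoiser(nn.Module):
    """Toy stand-in for a video diffusion transformer with both controls."""
    def __init__(self, latent_ch: int = 4, dim: int = 64, num_frames: int = 49):
        super().__init__()
        # Spatial control: the mask rides along as an extra latent channel.
        self.proj_in = nn.Conv3d(latent_ch + 1, dim, kernel_size=1)
        self.timestamps = TimestampEmbedder(num_frames, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.proj_out = nn.Conv3d(dim, latent_ch, kernel_size=1)

    def forward(self, z, mask, text_tokens, start, end):
        # z: (B, C, T, H, W) noisy latent; mask: (B, 1, T, H, W) instance mask
        h = self.proj_in(torch.cat([z, mask], dim=1))
        B, D, T, H, W = h.shape
        q = h.flatten(2).transpose(1, 2)  # (B, T*H*W, D) query tokens
        # Temporal control: timestamp tokens sit alongside the text tokens.
        ctx = torch.cat([text_tokens, self.timestamps(start, end)], dim=1)
        q, _ = self.attn(q, ctx, ctx)  # cross-attend to the conditioning
        h = q.transpose(1, 2).reshape(B, D, T, H, W)
        return self.proj_out(h)  # predicted noise

# Usage sketch with dummy data
model = ConditionedDenoiser()
z = torch.randn(2, 4, 8, 16, 16)
mask = torch.randint(0, 2, (2, 1, 8, 16, 16)).float()
text = torch.randn(2, 10, 64)  # stand-in for text-encoder output
eps = model(z, mask, text, start=torch.tensor([0, 2]), end=torch.tensor([7, 6]))
print(eps.shape)  # torch.Size([2, 4, 8, 16, 16])
```

The design point this illustrates: the mask enters as a dense spatial channel while the timestamps enter as sequence tokens next to the text conditioning, mirroring the spatial/temporal split the abstract describes.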

Why it matters?

This matters because it makes advanced visual effects creation more accessible to filmmakers, artists, and even casual users. By combining traditional VFX methods with AI, VFX Creator simplifies the process while maintaining high quality, potentially revolutionizing how visual effects are produced in the film industry and beyond.

Abstract

Crafting magic and illusions is one of the most thrilling aspects of filmmaking, with visual effects (VFX) serving as the powerhouse behind unforgettable cinematic experiences. While recent advances in generative artificial intelligence have driven progress in generic image and video synthesis, the domain of controllable VFX generation remains relatively underexplored. In this work, we propose a novel paradigm for animated VFX generation as image animation, where dynamic effects are generated from user-friendly textual descriptions and static reference images. Our work makes two primary contributions: (i) Open-VFX, the first high-quality VFX video dataset spanning 15 diverse effect categories, annotated with textual descriptions, instance segmentation masks for spatial conditioning, and start-end timestamps for temporal control. (ii) VFX Creator, a simple yet effective controllable VFX generation framework based on a Video Diffusion Transformer. The model incorporates a spatial and temporal controllable LoRA adapter, requiring minimal training videos. Specifically, a plug-and-play mask control module enables instance-level spatial manipulation, while tokenized start-end motion timestamps embedded in the diffusion process, alongside the text encoder, allow precise temporal control over effect timing and pace. Extensive experiments on the Open-VFX test set demonstrate the superiority of the proposed system in generating realistic and dynamic effects, achieving state-of-the-art performance and generalization ability in both spatial and temporal controllability. Furthermore, we introduce a specialized metric to evaluate the precision of temporal control. By bridging traditional VFX techniques with generative approaches, VFX Creator unlocks new possibilities for efficient and high-quality video effect generation, making advanced VFX accessible to a broader audience.
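The abstract mentions a specialized metric for temporal-control precision without defining it here. One plausible formulation, purely as an assumption, is an interval IoU between the frame window where the effect actually fires in the generated video and the requested start-end window:

```python
def temporal_iou(pred_start: int, pred_end: int, gt_start: int, gt_end: int) -> float:
    """Hypothetical temporal-precision score: IoU of the detected effect
    interval vs. the requested start-end window, in frames. The paper's
    actual metric may differ; this is one reasonable formulation."""
    inter = max(0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = max(pred_end, gt_end) - min(pred_start, gt_start)
    return inter / union if union > 0 else 0.0

# Effect fired two frames early and ended four frames early:
print(temporal_iou(10, 40, 12, 44))  # ~0.82
```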