AutoVFX: Physically Realistic Video Editing from Natural Language Instructions

Hao-Yu Hsu, Zhi-Hao Lin, Albert Zhai, Hongchi Xia, Shenlong Wang

2024-11-05

Summary

This paper presents AutoVFX, a new system that lets users create realistic visual effects in videos simply by giving natural language instructions. By combining neural scene modeling, LLM-based code generation, and physical simulation, it makes video editing easier and more accessible for everyone.

What's the problem?

Creating visual effects (VFX) in videos is usually a complicated and time-consuming process that requires a lot of skill and expensive software. Many people find it difficult to create the effects they want because they lack the technical knowledge or tools, making it hard for everyday users to produce high-quality videos.

What's the solution?

AutoVFX addresses this problem by allowing users to input a video along with simple instructions in plain language, like 'make the vase explode' or 'add a splash when the boat hits the dock.' The system first builds a 3D model of the scene from the video, then uses a large language model to translate the instruction into editing code, which drives a physics simulation to apply realistic, physically grounded effects. This means users can create impressive VFX without needing to learn complex software or techniques.
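To make the pipeline concrete, here is a minimal sketch of the instruction-to-program idea. This is not AutoVFX's actual code: the function names (`generate_edit_program`, `apply_effects`) and the keyword-matching "LLM" are invented stand-ins, and the "simulation" is just an event log, but the flow mirrors the paper's description: natural language in, generated editing program out, simulated effects applied.

```python
def generate_edit_program(instruction: str) -> list[str]:
    """Stand-in for LLM-based code generation: map a plain-language
    instruction to a sequence of simulator calls (call names invented)."""
    program = []
    if "explode" in instruction:
        program.append("simulate_explosion(target='vase')")
    if "splash" in instruction:
        program.append("simulate_fluid_splash(site='dock')")
    return program


def apply_effects(program: list[str]) -> list[str]:
    """Stand-in for the physical simulation + rendering stage: record
    which physically grounded effects were applied to the scene."""
    return [f"applied: {call}" for call in program]


program = generate_edit_program("make the vase explode")
print(apply_effects(program))  # -> ["applied: simulate_explosion(target='vase')"]
```

In the real system, the generated program calls into a physics engine and renderer operating on the reconstructed 3D scene, which is what makes the resulting edits physically plausible rather than purely image-based.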

Why it matters?

This research is important because it democratizes video editing by making powerful VFX tools available to everyone, not just professionals. By simplifying the process and enabling creative expression through natural language, AutoVFX can help content creators, educators, and hobbyists produce high-quality videos more easily, enhancing creativity and accessibility in video production.

Abstract

Modern visual effects (VFX) software has made it possible for skilled artists to create imagery of virtually anything. However, the creation process remains laborious, complex, and largely inaccessible to everyday users. In this work, we present AutoVFX, a framework that automatically creates realistic and dynamic VFX videos from a single video and natural language instructions. By carefully integrating neural scene modeling, LLM-based code generation, and physical simulation, AutoVFX is able to provide physically-grounded, photorealistic editing effects that can be controlled directly using natural language instructions. We conduct extensive experiments to validate AutoVFX's efficacy across a diverse spectrum of videos and instructions. Quantitative and qualitative results suggest that AutoVFX outperforms all competing methods by a large margin in generative quality, instruction alignment, editing versatility, and physical plausibility.