
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation

Shaowei Liu, Zhongzheng Ren, Saurabh Gupta, Shenlong Wang

2024-09-30


Summary

This paper presents PhysGen, a method that creates realistic videos from a single image by using rigid-body physics simulation to predict how the objects in the scene would move under applied forces.

What's the problem?

Generating videos from images is challenging because existing methods often produce unrealistic or inconsistent results. Many tools can't accurately simulate how objects behave physically, which is important for creating believable animations.

What's the solution?

PhysGen combines three main components: an image understanding module that infers the geometry, materials, and physical properties of the objects in the image; a dynamics simulation module that uses rigid-body physics to predict how those objects would move; and a rendering and refinement module that turns the simulated motion into the final video. This design lets PhysGen generate videos that look and behave realistically under user-specified input conditions, such as a force or torque applied to an object in the image.
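The dynamics module's core idea, advancing each object's pose under an applied force and torque, can be illustrated with a minimal 2D rigid-body step. This is a toy sketch for intuition only: PhysGen's actual simulator infers per-object mass, friction, and other parameters from the image, while here they are simply given.

```python
def step_rigid_body(state, force, torque, mass, inertia, dt):
    """One semi-implicit Euler step for a single 2D rigid body.

    state = (x, y, theta, vx, vy, omega); force = (fx, fy).
    A toy stand-in for a physics engine's update, not PhysGen's code.
    """
    x, y, theta, vx, vy, omega = state
    fx, fy = force
    # Newton-Euler: update linear and angular velocity from force/torque.
    vx += fx / mass * dt
    vy += fy / mass * dt
    omega += torque / inertia * dt
    # Semi-implicit Euler: advance the pose with the updated velocities.
    x += vx * dt
    y += vy * dt
    theta += omega * dt
    return (x, y, theta, vx, vy, omega)

# Push a 1 kg object rightward for 60 frames at 30 fps.
state = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
for _ in range(60):
    state = step_rigid_body(state, force=(2.0, 0.0), torque=0.1,
                            mass=1.0, inertia=0.5, dt=1 / 30)
print(state)
```

Each simulated pose would then drive how the corresponding object region of the input image is moved in the rendered video.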

Why it matters?

This research matters because it improves how we can create animations and videos from static images, making it useful for various applications such as gaming, education, and scientific visualization. By producing high-quality, physics-grounded videos, PhysGen opens up new possibilities for interactive media and storytelling.

Abstract

We present PhysGen, a novel image-to-video generation method that converts a single image and an input condition (e.g., force and torque applied to an object in the image) to produce a realistic, physically plausible, and temporally consistent video. Our key insight is to integrate model-based physical simulation with a data-driven video generation process, enabling plausible image-space dynamics. At the heart of our system are three core components: (i) an image understanding module that effectively captures the geometry, materials, and physical parameters of the image; (ii) an image-space dynamics simulation model that utilizes rigid-body physics and inferred parameters to simulate realistic behaviors; and (iii) an image-based rendering and refinement module that leverages generative video diffusion to produce realistic video footage featuring the simulated motion. The resulting videos are realistic in both physics and appearance and are even precisely controllable, showcasing superior results over existing data-driven image-to-video generation works through quantitative comparison and comprehensive user study. PhysGen's resulting videos can be used for various downstream applications, such as turning an image into a realistic animation or allowing users to interact with the image and create various dynamics. Project page: https://stevenlsw.github.io/physgen/
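The three-stage design in the abstract can be sketched as a toy end-to-end pipeline. All function names and data shapes below are illustrative assumptions, not PhysGen's actual interfaces; the stage comments state what the real system does at each step.

```python
def understand_image(image):
    # Stage (i): PhysGen infers segmentation, geometry, materials, and
    # physical parameters here; this toy version fakes a single object.
    return {"objects": [{"id": 0, "mass": 1.0, "pose": (0.0, 0.0)}]}

def simulate_dynamics(scene, force, steps, dt):
    # Stage (ii): rigid-body rollout of the object's pose over time.
    x, y = scene["objects"][0]["pose"]
    m = scene["objects"][0]["mass"]
    vx, vy = 0.0, 0.0
    trajectory = []
    for _ in range(steps):
        vx += force[0] / m * dt
        vy += force[1] / m * dt
        x += vx * dt
        y += vy * dt
        trajectory.append((x, y))
    return trajectory

def render_and_refine(image, trajectory):
    # Stage (iii): PhysGen warps the input image along the trajectory and
    # refines it with video diffusion; here we just pair each pose with
    # the source image to stand in for a rendered frame.
    return [(image, pose) for pose in trajectory]

scene = understand_image("input.png")
trajectory = simulate_dynamics(scene, force=(1.0, 0.0), steps=30, dt=1 / 30)
video = render_and_refine("input.png", trajectory)
print(len(video))  # one frame per simulated step
```

The key point of the design is the division of labor: the model-based simulator guarantees physically plausible motion, while the data-driven diffusion refinement supplies realistic appearance.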