Choreographing a World of Dynamic Objects

Yanzhe Lyu, Chen Geng, Karthik Dharmarajan, Yunzhi Zhang, Hadi Alzayer, Shangzhe Wu, Jiajun Wu

2026-01-08

Summary

This paper introduces CHORD, a new way to create realistic animations of objects moving, deforming, and interacting in 3D space over time. It is a system for generating complex, dynamic 4D scenes, such as multiple objects colliding or a robot arm performing a manipulation task.

What's the problem?

Traditionally, making these kinds of animations is hard work. Animators have to manually program rules for how things move, and these rules are often specific to one type of object. More recently, people have tried using learning-based methods, but those require huge amounts of video data for each object category, which isn't always available. In short, existing methods are either labor-intensive or limited in what they can create.

What's the solution?

CHORD takes a different approach. It learns from regular 2D videos, but instead of just copying what it sees, it figures out the underlying physics – the forces and motions – that cause those movements. It's like learning the rules of physics by watching things fall, rather than memorizing every single fall. This 'distillation' process allows it to create realistic 4D movements without needing tons of specific training data, making it work for a wide variety of objects.
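The Eulerian-vs-Lagrangian distinction behind this idea can be illustrated with a toy example. This is not the paper's actual distillation pipeline, just a minimal sketch: an "Eulerian" video stores pixel values on a fixed grid in each frame, while a "Lagrangian" view follows one particular particle through time. Here we render a bright dot moving across synthetic frames, then recover its trajectory by locating it in every frame.

```python
import numpy as np

def render_frames(n_frames=8, size=16):
    """Eulerian representation: a stack of fixed-grid images.

    A single bright dot moves one pixel to the right per frame.
    """
    frames = np.zeros((n_frames, size, size))
    for t in range(n_frames):
        frames[t, 4, 2 + t] = 1.0
    return frames

def track_dot(frames):
    """Lagrangian representation: the (row, col) position of the
    tracked particle at each time step."""
    return [tuple(np.unravel_index(f.argmax(), f.shape)) for f in frames]

trajectory = track_dot(render_frames())
print(trajectory[:3])  # first three positions: (4, 2), (4, 3), (4, 4)
```

A real system would of course need a learned point tracker and 3D lifting rather than a simple argmax, but the underlying conversion is the same: per-frame grids in, per-particle trajectories out.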

Why it matters?

This is important because it makes creating realistic animations much easier and more flexible. It could be used in movies, video games, or even to train robots. Because it doesn't rely on specific object types, it's a universal system that can handle almost anything, and it opens the door to generating complex scenes that were previously too difficult to create.

Abstract

Dynamic objects in our physical 4D (3D + time) world are constantly evolving, deforming, and interacting with other objects, leading to diverse 4D scene dynamics. In this paper, we present a universal generative pipeline, CHORD, for CHOReographing Dynamic objects and scenes and synthesizing these phenomena. Traditional rule-based graphics pipelines for creating such dynamics rely on category-specific heuristics and are labor-intensive and not scalable. Recent learning-based methods typically demand large-scale datasets, which may not cover all object categories of interest. Our approach instead inherits universality from video generative models through a distillation-based pipeline that extracts the rich Lagrangian motion information hidden in the Eulerian representations of 2D videos. Our method is universal, versatile, and category-agnostic. We demonstrate its effectiveness by generating a diverse range of multi-body 4D dynamics, show its advantage over existing methods, and demonstrate its applicability to generating robotic manipulation policies. Project page: https://yanzhelyu.github.io/chord