
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation

Xiao Fu, Xian Liu, Xintao Wang, Sida Peng, Menghan Xia, Xiaoyu Shi, Ziyang Yuan, Pengfei Wan, Di Zhang, Dahua Lin

2024-12-11


Summary

This paper introduces 3DTrajMaster, a system that gives users precise control over how multiple objects move through 3D space when generating videos from text descriptions.

What's the problem?

Previous methods for controllable video generation typically relied on 2D control signals to direct object motion, which limited how accurately they could represent the three-dimensional nature of movement. As a result, it was difficult to generate realistic videos of multiple entities interacting in a 3D environment.

What's the solution?

The authors developed 3DTrajMaster, which lets users specify each object's exact 3D location and rotation over time (called a 6DoF pose sequence). A plug-and-play component, the object injector, fuses these 3D motions into the video generation process. The authors also built a new dataset to train the model, helping it learn how different entities move in three dimensions. Together, these pieces enable more realistic and dynamic video generation in which multiple characters or objects move independently and interact naturally.
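To make the 6DoF idea concrete, here is a minimal sketch of how a user-specified pose sequence for each entity might be laid out as an array of per-frame locations and rotations. The function name, shapes, and Euler-angle convention are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

def make_pose_sequence(locations, euler_angles_deg):
    """Hypothetical helper: stack per-frame 3D locations and rotations
    into an (F, 6) array, one 6DoF pose per video frame."""
    loc = np.asarray(locations, dtype=np.float32)        # (F, 3): x, y, z
    rot = np.asarray(euler_angles_deg, dtype=np.float32) # (F, 3): roll, pitch, yaw
    assert loc.shape == rot.shape and loc.shape[1] == 3
    return np.concatenate([loc, rot], axis=1)            # (F, 6)

# Two entities moving independently over four frames:
walker = make_pose_sequence(
    locations=[[0, 0, 0], [0.5, 0, 0], [1.0, 0, 0], [1.5, 0, 0]],
    euler_angles_deg=[[0, 0, 90]] * 4,   # facing along +x
)
drone = make_pose_sequence(
    locations=[[0, 0, 1.0], [0, 0.5, 1.2], [0, 1.0, 1.4], [0, 1.5, 1.6]],
    euler_angles_deg=[[0, 10, 0]] * 4,   # slight pitch
)
print(walker.shape, drone.shape)  # (4, 6) (4, 6)
```

Each entity gets its own independent sequence, which is what allows multiple objects to follow different trajectories in the same generated video.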

Why it matters?

This research is important because it improves how AI can create videos, making them more lifelike and engaging. By allowing precise control over movements in 3D space, 3DTrajMaster opens up new possibilities for applications in animation, gaming, and virtual reality, where realistic motion is crucial for storytelling and user experience.

Abstract

This paper aims to manipulate multi-entity 3D motions in video generation. Previous methods for controllable video generation primarily leverage 2D control signals to manipulate object motions and have achieved remarkable synthesis results. However, 2D control signals are inherently limited in expressing the 3D nature of object motions. To overcome this problem, we introduce 3DTrajMaster, a robust controller that regulates multi-entity dynamics in 3D space, given user-desired 6DoF pose (location and rotation) sequences of entities. At the core of our approach is a plug-and-play 3D-motion grounded object injector that fuses multiple input entities with their respective 3D trajectories through a gated self-attention mechanism. In addition, we exploit an injector architecture to preserve the video diffusion prior, which is crucial for generalization ability. To mitigate video quality degradation, we introduce a domain adaptor during training and employ an annealed sampling strategy during inference. To address the lack of suitable training data, we construct a 360-Motion Dataset, which first correlates collected 3D human and animal assets with GPT-generated trajectories and then captures their motion with 12 evenly surrounding cameras on diverse 3D UE platforms. Extensive experiments show that 3DTrajMaster sets a new state-of-the-art in both accuracy and generalization for controlling multi-entity 3D motions. Project page: http://fuxiao0719.github.io/projects/3dtrajmaster
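The gated self-attention fusion described in the abstract can be sketched roughly as follows: video tokens and entity/trajectory tokens attend to each other jointly, and a gate scales the residual so that a zero gate leaves the pretrained video prior untouched. The single-head formulation, function names, and shapes below are simplifying assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_self_attention_inject(video_tokens, entity_tokens, gate):
    """Sketch of gated self-attention injection (assumed shapes):
    concatenate video tokens (N, D) with entity+trajectory tokens (M, D),
    run single-head self-attention over all N+M tokens, keep only the
    video positions, and add them back scaled by tanh(gate).
    With gate=0 this is an identity map, preserving the diffusion prior."""
    x = np.concatenate([video_tokens, entity_tokens], axis=0)  # (N+M, D)
    d = x.shape[-1]
    attn = softmax(x @ x.T / np.sqrt(d))                       # (N+M, N+M)
    out = attn @ x                                             # (N+M, D)
    n = video_tokens.shape[0]
    return video_tokens + np.tanh(gate) * out[:n]              # (N, D)
```

The zero-initialized gate is the key design choice implied by "preserve the video diffusion prior": at the start of training the injector contributes nothing, and its influence grows only as the gate is learned.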