CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation

Xinying Guo, Mingyuan Zhang, Haozhe Xie, Chenyang Gu, Ziwei Liu

2024-07-11

Summary

This paper introduces CrowdMoGen, a new framework that generates realistic crowd movements from text descriptions. It plans and generates collective motion without paired training data (i.e., zero-shot), making it useful for applications like animation and urban planning.

What's the problem?

Existing methods for generating human motion typically focus on individual actions rather than how groups of people move together. Recent multi-person approaches depend on predefined scenarios and handle only a fixed, small number of inter-person interactions, which makes them impractical for real-world settings where crowd dynamics are complex.

What's the solution?

CrowdMoGen addresses these issues with a two-part system. The first part, the Crowd Scene Planner, uses a Large Language Model (LLM) to interpret the text description and plan how the crowd should move given the scene context. The second part, the Collective Motion Generator, synthesizes the motions of the individuals in the crowd according to this holistic plan. Because the plan comes from the LLM rather than from paired training examples, the framework can generate realistic crowd motions for situations it was never explicitly trained on. A sketch of this pipeline follows below.
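To make the two-stage design concrete, here is a minimal Python sketch of the plan-then-generate pipeline. Everything in it is an illustrative assumption, not the paper's published API: the AgentPlan structure and function names are hypothetical, the planner is stubbed where the real system would query an LLM and parse its structured output, and the generator is stubbed with linear interpolation where the real system uses a learned motion model.

```python
# Hypothetical sketch of a two-stage plan-then-generate crowd pipeline.
# Not CrowdMoGen's actual code; stage internals are stubbed for illustration.
from dataclasses import dataclass, field


@dataclass
class AgentPlan:
    """Per-agent plan as the Crowd Scene Planner might emit it."""
    agent_id: int
    activity: str  # e.g. "walk", "run", "wave"
    waypoints: list[tuple[float, float]] = field(default_factory=list)


def crowd_scene_planner(prompt: str, num_agents: int) -> list[AgentPlan]:
    """Stage 1: decompose a scene description into per-agent activities
    and trajectories. A real system would prompt an LLM here and parse
    structured output; this stub assigns parallel straight-line paths."""
    return [
        AgentPlan(agent_id=i, activity="walk",
                  waypoints=[(0.0, float(i)), (10.0, float(i))])
        for i in range(num_agents)
    ]


def collective_motion_generator(plans: list[AgentPlan],
                                num_frames: int = 60) -> dict[int, list]:
    """Stage 2: synthesize per-agent motion conditioned on the plan.
    A real system would run a learned motion model; this stub linearly
    interpolates between each agent's first and last waypoint."""
    motions = {}
    for plan in plans:
        (x0, y0), (x1, y1) = plan.waypoints[0], plan.waypoints[-1]
        motions[plan.agent_id] = [
            ((1 - t / (num_frames - 1)) * x0 + t / (num_frames - 1) * x1,
             (1 - t / (num_frames - 1)) * y0 + t / (num_frames - 1) * y1)
            for t in range(num_frames)
        ]
    return motions


if __name__ == "__main__":
    plans = crowd_scene_planner("a crowd crossing a plaza", num_agents=4)
    motions = collective_motion_generator(plans)
    print(f"Generated {len(motions)} agent trajectories, "
          f"{len(motions[0])} frames each")
```

The key design point this sketch illustrates is the separation of concerns: the planner reasons about *who does what and where* at the crowd level, so the motion generator only has to solve the simpler, per-agent problem of producing movement that follows an explicit plan.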

Why it matters?

This research is important because it enhances how we can simulate and visualize crowd behaviors in various fields, including entertainment and urban planning. By enabling more adaptable and realistic crowd motion generation, CrowdMoGen can improve the quality of animations in games and films, as well as contribute to better designs in public spaces.

Abstract

Crowd Motion Generation is essential in entertainment industries such as animation and games as well as in strategic fields like urban simulation and planning. This new task requires an intricate integration of control and generation to realistically synthesize crowd dynamics under specific spatial and semantic constraints, whose challenges are yet to be fully explored. On the one hand, existing human motion generation models typically focus on individual behaviors, neglecting the complexities of collective behaviors. On the other hand, recent methods for multi-person motion generation depend heavily on pre-defined scenarios and are limited to a fixed, small number of inter-person interactions, thus hampering their practicality. To overcome these challenges, we introduce CrowdMoGen, a zero-shot text-driven framework that harnesses the power of Large Language Model (LLM) to incorporate the collective intelligence into the motion generation framework as guidance, thereby enabling generalizable planning and generation of crowd motions without paired training data. Our framework consists of two key components: 1) Crowd Scene Planner that learns to coordinate motions and dynamics according to specific scene contexts or introduced perturbations, and 2) Collective Motion Generator that efficiently synthesizes the required collective motions based on the holistic plans. Extensive quantitative and qualitative experiments have validated the effectiveness of our framework, which not only fills a critical gap by providing scalable and generalizable solutions for Crowd Motion Generation task but also achieves high levels of realism and flexibility.