Key Features

Part-level human motion generation and composition
Fine-grained control over individual body parts
Global semantic coherence
Hierarchical Frankenstein dataset
Atomic, temporally-aware part-level text annotations
Transformer-based diffusion model
Input conditioning on sequence-level, action-level, and part-level prompts
Ability to compose motions unseen during training

FrankenMotion is a transformer-based diffusion model that can be conditioned on sequence-level, action-level, and part-level text prompts. Trained on motion paired with structured, multi-granularity text annotations, it learns the basic motion elements and how to compose them into complex motions. The model outperforms baseline models that were adapted and retrained for the same setting, and it can compose motions unseen during training.
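To make the three prompt levels concrete, here is a minimal sketch of how such a hierarchical, temporally-aware prompt could be represented. The class names (Span, MotionPrompt), the body-part keys, the frame ranges, and the example texts are illustrative assumptions, not the actual FrankenMotion input format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """A text label active over the half-open frame range [start, end)."""
    start: int
    end: int
    text: str


@dataclass
class MotionPrompt:
    """Hypothetical container for the three conditioning levels."""
    sequence: str                                   # one description for the whole clip
    actions: List[Span] = field(default_factory=list)            # action-level spans
    parts: Dict[str, List[Span]] = field(default_factory=dict)   # per-body-part spans

    @staticmethod
    def _active(spans: List[Span], t: int) -> Optional[str]:
        # Return the text of the first span covering frame t, or None.
        for s in spans:
            if s.start <= t < s.end:
                return s.text
        return None

    def at_frame(self, t: int) -> dict:
        """Collect the prompts that apply at frame t, one entry per level."""
        return {
            "sequence": self.sequence,
            "action": self._active(self.actions, t),
            "parts": {name: self._active(spans, t)
                      for name, spans in self.parts.items()},
        }


# Example values are made up for illustration.
prompt = MotionPrompt(
    sequence="A person walks forward and waves.",
    actions=[Span(0, 60, "walk forward"), Span(60, 120, "wave")],
    parts={
        "right_arm": [Span(60, 120, "raise and wave the right hand")],
        "legs": [Span(0, 60, "step forward alternately")],
    },
)
```

Calling prompt.at_frame(t) returns the sequence text plus whichever action and part labels cover frame t. A part with no active label maps to None, which is one simple way to leave that body part unconstrained at that moment.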


The Frankenstein dataset is the largest dataset providing hierarchical, temporally-aware annotations for 3D human motion. Its high-quality, diverse annotations are generated automatically by the FrankenAgent. The dataset captures sequence-level, action-level, and part-level information, enabling the model to learn and generate complex motions with both spatial and temporal control. Ablation studies highlight the importance of hierarchical conditioning: motion quality degrades as conditioning levels are removed.
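A conditioning ablation of this kind can be sketched as masking out prompt levels before they reach the model. The helper names, the level names as dictionary keys, and the order in which levels are dropped (finest first) are assumptions for illustration; the summary above does not specify how the ablations were configured.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# The three conditioning levels named in the text, coarsest to finest.
LEVELS: Tuple[str, ...] = ("sequence", "action", "part")


def ablate(cond: Dict[str, Optional[str]],
           keep: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return a copy of cond where every level not in keep is set to None."""
    keep_set = set(keep)
    unknown = keep_set - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown conditioning levels: {sorted(unknown)}")
    return {level: (cond.get(level) if level in keep_set else None)
            for level in LEVELS}


def ablation_variants() -> List[Tuple[str, ...]]:
    """Full conditioning first, then drop the finest remaining level each step.

    The drop order is an assumption made for this sketch.
    """
    return [LEVELS[:n] for n in range(len(LEVELS), 0, -1)]


# Example prompt texts are made up for illustration.
full = {
    "sequence": "A person walks forward and waves.",
    "action": "wave",
    "part": "raise and wave the right hand",
}

variants = [ablate(full, keep) for keep in ablation_variants()]
```

Each entry in variants is one conditioning setup to evaluate with the same metric, so that any drop in motion quality can be attributed to the removed level.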
