Embody 3D: A Large-scale Multimodal Motion and Behavior Dataset

Claire McLean, Makenzie Meendering, Tristan Swartz, Orri Gabbay, Alexandra Olsen, Rachel Jacobs, Nicholas Rosen, Philippe de Bree, Tony Garcia, Gadsden Merrill, Jake Sandakly, Julia Buffalini, Neham Jain, Steven Krenn, Moneish Kumar, Dejan Markovic, Evonne Ng, Fabian Prada, Andrew Saba, Siwei Zhang, Vasu Agrawal, Tim Godisart

2025-10-21

Summary

This paper introduces Embody 3D, a large-scale collection of data (roughly 500 hours from 439 participants) showing how people move in 3D, created by researchers at Meta's Codec Avatars Lab. It's designed to help build more realistic and responsive virtual avatars.

What's the problem?

Creating virtual avatars that move and behave like real people is incredibly difficult. Existing datasets either don't capture enough realistic movement or focus on very simple actions. There was a need for a dataset covering a wide variety of natural human motion: conversations, gestures, and even how people move around each other in a shared space.

What's the solution?

The researchers collected data from 439 people using a multi-camera capture stage that tracked their full bodies and hand movements. They recorded people doing all sorts of things: following instructions, making gestures, walking around, and even interacting with each other in simulated real-life scenarios like having a discussion or working together. This resulted in over 54 million frames of 3D motion data, along with audio and text descriptions of what was happening.
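To make the dataset's structure concrete, here is a minimal sketch of how one recorded session could be organized in memory, based on the modalities the paper lists (tracked body motion, hand tracking, body shape, text annotations, and a separate audio track per participant). All class names, field names, and array shapes here are illustrative assumptions, not the dataset's actual release schema.

```python
from dataclasses import dataclass, field
from typing import List

import numpy as np

# Hypothetical in-memory layout for one Embody 3D session.
# Field names and shapes are assumptions for illustration only.

@dataclass
class ParticipantTrack:
    participant_id: str
    body_pose: np.ndarray    # (num_frames, num_body_joints, 3) tracked 3D body motion
    hand_pose: np.ndarray    # (num_frames, num_hand_joints, 3) articulated hand tracking
    body_shape: np.ndarray   # (num_shape_params,) identity-specific body shape
    audio_path: str          # separate per-participant audio track

@dataclass
class Session:
    session_id: str
    category: str            # e.g. "prompted motion", "discussion", "co-living"
    fps: float
    participants: List[ParticipantTrack] = field(default_factory=list)
    text_annotations: List[str] = field(default_factory=list)

    def duration_seconds(self) -> float:
        # Assumes all tracks in a session share the same frame count.
        frames = self.participants[0].body_pose.shape[0] if self.participants else 0
        return frames / self.fps
```

A layout like this makes the multi-person scenarios easy to work with: a two-person conversation is simply a `Session` with two `ParticipantTrack` entries, each carrying its own motion arrays and its own audio file.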

Why it matters?

This dataset is a big step forward for virtual reality and avatar technology. By providing a huge amount of realistic motion data, it will allow developers to create avatars that are much more believable and natural, making virtual interactions feel more immersive and engaging. It could improve everything from video games to virtual meetings to how we interact with AI assistants.

Abstract

The Codec Avatars Lab at Meta introduces Embody 3D, a multimodal dataset of 500 individual hours of 3D motion data from 439 participants collected in a multi-camera collection stage, amounting to over 54 million frames of tracked 3D motion. The dataset features a wide range of single-person motion data, including prompted motions, hand gestures, and locomotion; as well as multi-person behavioral and conversational data like discussions, conversations in different emotional states, collaborative activities, and co-living scenarios in an apartment-like space. We provide tracked human motion including hand tracking and body shape, text annotations, and a separate audio track for each participant.
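As a quick sanity check on the headline numbers, the 500 hours and 54 million frames are mutually consistent if the motion is tracked at 30 fps (the frame rate is my assumption; the paper's figures as quoted here don't state it):

```python
# 500 hours of capture at an assumed 30 fps:
hours = 500
fps = 30
frames = hours * 3600 * fps
print(f"{frames:,}")  # 54,000,000 -- matching the reported "over 54 million frames"
```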