SAM 3D Body: Robust Full-Body Human Mesh Recovery

Xitong Yang, Devansh Kukreja, Don Pinkus, Anushka Sagar, Taosha Fan, Jinhyung Park, Soyong Shin, Jinkun Cao, Jiawei Liu, Nicolas Ugrinovic, Matt Feiszli, Jitendra Malik, Piotr Dollar, Kris Kitani

2026-02-19

Summary

This paper introduces SAM 3D Body (3DB), an AI model that reconstructs a detailed 3D mesh of a person's entire body from a single image. It is designed to stay accurate on difficult photos taken in everyday, uncontrolled conditions.

What's the problem?

Creating accurate 3D models of people from a single image is hard. Existing methods often struggle with unusual poses, varied clothing, and real-world photos that were not taken in a controlled studio setting. They also tend to miss the details of hands and feet, and the 3D model does not always represent the underlying structure of the human body well.

What's the solution?

The researchers developed 3DB, which uses an encoder-decoder architecture to analyze the image and build the 3D model. A key innovation is a new parametric representation of the human body called Momentum Human Rig (MHR), which separates the 'skeleton' from the 'skin', allowing for more realistic and flexible models. They also built a multi-stage annotation pipeline that combines manual keypoint annotation with automated tools (differentiable optimization, multi-view geometry, and dense keypoint detection), plus a data engine that selects unusual poses and rare imaging conditions so the training data is diverse and high-quality. Finally, 3DB can be guided by prompts such as 2D keypoints or masks (outlines) to help it focus on specific parts of the body.
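To make the idea of separating 'skeleton' from 'skin' concrete, here is a minimal toy sketch in Python. It is not the MHR model or its API: the `ToyRig` class, the planar two-bone chain, and the per-bone `thickness` value are all hypothetical, chosen only to show that skeletal parameters and surface parameters can be stored and changed independently.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyRig:
    """Hypothetical toy rig; NOT the MHR parameterization."""
    bone_lengths: list[float]  # skeletal structure (assumed positive)
    thickness: list[float]     # surface shape, one value per bone


def joint_positions(rig, angles):
    """Planar forward kinematics: root joint plus the end of each bone."""
    x, y, theta = 0.0, 0.0, 0.0
    joints = [(x, y)]
    for length, angle in zip(rig.bone_lengths, angles):
        theta += angle  # each angle is relative to the parent bone
        x += length * math.cos(theta)
        y += length * math.sin(theta)
        joints.append((x, y))
    return joints


def surface_points(rig, angles):
    """One 'skin' point per bone: the bone midpoint pushed outward
    along the bone's normal by that bone's thickness."""
    joints = joint_positions(rig, angles)
    points = []
    for (x0, y0), (x1, y1), t in zip(joints, joints[1:], rig.thickness):
        dx, dy = x1 - x0, y1 - y0
        norm = math.hypot(dx, dy)
        nx, ny = -dy / norm, dx / norm  # unit normal to the bone
        points.append(((x0 + x1) / 2 + t * nx, (y0 + y1) / 2 + t * ny))
    return points


pose = [0.0, math.pi / 2]
slim = ToyRig(bone_lengths=[1.0, 1.0], thickness=[0.1, 0.2])
broad = ToyRig(bone_lengths=[1.0, 1.0], thickness=[0.3, 0.4])

# Changing only the surface parameters leaves the skeleton untouched.
print(joint_positions(slim, pose) == joint_positions(broad, pose))
print(surface_points(slim, pose))
print(surface_points(broad, pose))
```

In this toy setup, editing `thickness` moves the skin points but never the joints, while editing `bone_lengths` changes the skeleton without touching the stored surface parameters. A real parametric body model works in 3D with far richer shape and pose spaces, but the same separation of concerns is what the summary above refers to.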

Why does it matter?

This work is important because it improves the accuracy and realism of 3D human models recovered from single images, with the authors reporting substantial gains over prior methods in both user preference studies and standard quantitative benchmarks. Potential applications include more realistic avatars for games or virtual reality, better motion capture, and possibly medical imaging and analysis. Because both the model and the new body representation are open-source, other researchers can build on this work.

Abstract

We introduce SAM 3D Body (3DB), a promptable model for single-image full-body 3D human mesh recovery (HMR) that demonstrates state-of-the-art performance, with strong generalization and consistent accuracy in diverse in-the-wild conditions. 3DB estimates the human pose of the body, feet, and hands. It is the first model to use a new parametric mesh representation, Momentum Human Rig (MHR), which decouples skeletal structure and surface shape. 3DB employs an encoder-decoder architecture and supports auxiliary prompts, including 2D keypoints and masks, enabling user-guided inference similar to the SAM family of models. We derive high-quality annotations from a multi-stage annotation pipeline that uses various combinations of manual keypoint annotation, differentiable optimization, multi-view geometry, and dense keypoint detection. Our data engine efficiently selects and processes data to ensure data diversity, collecting unusual poses and rare imaging conditions. We present a new evaluation dataset organized by pose and appearance categories, enabling nuanced analysis of model behavior. Our experiments demonstrate superior generalization and substantial improvements over prior methods in both qualitative user preference studies and traditional quantitative analysis. Both 3DB and MHR are open-source.