Learnable SMPLify: A Neural Solution for Optimization-Free Human Pose Inverse Kinematics
Yuchen Yang, Linfeng Dong, Wei Wang, Zhihang Zhong, Xiao Sun
2025-08-25
Summary
This paper introduces a faster way to estimate a 3D model of a person's pose and body shape from images, building on a commonly used method called SMPLify.
What's the problem?
The original SMPLify method is a robust and widely used baseline, but it is slow: it fits the 3D body model through iterative optimization, repeatedly adjusting the pose and shape until the model matches the observations. This computational cost makes it impractical for real-time applications or for processing large amounts of data.
What's the solution?
The researchers created 'Learnable SMPLify', which replaces the slow, iterative fitting process of SMPLify with a neural network that predicts the 3D pose and shape in a single pass. They tackled two main challenges: building suitable training data and making sure the network generalizes to diverse motions and unseen poses. For the first, they sample pairs of frames from motion sequences, so that one frame serves as the starting estimate (the initialization) and another as the target. For the second, they normalize the data in a human-centric way and have the network predict a residual, that is, a correction to the starting estimate rather than the full answer, which narrows the space of possible solutions.
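The frame-pairing idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_init_target_pairs`, the `max_gap` parameter, and the uniform sampling of the frame gap are all assumptions made for the example.

```python
import numpy as np

def sample_init_target_pairs(num_frames, max_gap, num_pairs, rng):
    """Sample (init, target) frame indices from one motion sequence.

    Hypothetical helper: the gap between the two frames is drawn uniformly
    from 1..max_gap, which is an assumption, not the paper's stated rule.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames to form a pair")
    max_gap = min(max_gap, num_frames - 1)
    # Temporal distance between the initialization frame and the target frame.
    gaps = rng.integers(1, max_gap + 1, size=num_pairs)
    # Pick the initialization frame so the target stays inside the sequence.
    init_idx = rng.integers(0, num_frames - gaps)
    target_idx = init_idx + gaps
    return init_idx, target_idx

rng = np.random.default_rng(0)
init_idx, target_idx = sample_init_target_pairs(
    num_frames=100, max_gap=5, num_pairs=8, rng=rng
)
```

Each pair would then supply the body parameters at `init_idx` as the starting estimate and the parameters at `target_idx` as the supervision target.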
Why does it matter?
This new method is nearly 200 times faster than the original SMPLify while still maintaining good accuracy. This makes it much more practical for applications like animation, virtual reality, and analyzing human movement. It can also be applied as a plug-in post-processing step to refine the output of existing image-based estimators, making it a versatile tool for 3D human pose estimation.
Abstract
In 3D human pose and shape estimation, SMPLify remains a robust baseline that solves inverse kinematics (IK) through iterative optimization. However, its high computational cost limits its practicality. Recent advances across domains have shown that replacing iterative optimization with data-driven neural networks can achieve significant runtime improvements without sacrificing accuracy. Motivated by this trend, we propose Learnable SMPLify, a neural framework that replaces the iterative fitting process in SMPLify with a single-pass regression model. The design of our framework targets two core challenges in neural IK: data construction and generalization. To enable effective training, we propose a temporal sampling strategy that constructs initialization-target pairs from sequential frames. To improve generalization across diverse motions and unseen poses, we propose a human-centric normalization scheme and residual learning to narrow the solution space. Learnable SMPLify supports both sequential inference and plug-in post-processing to refine existing image-based estimators. Extensive experiments demonstrate that our method establishes itself as a practical and simple baseline: it achieves nearly 200x faster runtime compared to SMPLify, generalizes well to unseen 3DPW and RICH, and operates in a model-agnostic manner when used as a plug-in tool on LucidAction. The code is available at https://github.com/Charrrrrlie/Learnable-SMPLify.
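As a rough illustration of how human-centric normalization and residual learning might fit together, the sketch below expresses joints relative to the root joint of the initial body and adds a predicted correction to the initial parameters. Everything here is an assumption made for the example: the helper names, root-centering as the normalization, the 72-dimensional parameter vector, and the stand-in `regressor` callable. The paper's actual scheme may differ.

```python
import numpy as np

def center_on_root(joints, root):
    """Express 3D joints of shape (J, 3) relative to a given root position."""
    return joints - root

def refine(init_params, init_joints, target_joints, regressor, root_index=0):
    """One single-pass refinement step (hypothetical sketch).

    Both joint sets are expressed in a frame centered on the initial body's
    root, so the input does not depend on where the person stands globally.
    """
    root = init_joints[root_index]
    init_n = center_on_root(init_joints, root)
    target_n = center_on_root(target_joints, root)
    features = np.concatenate([init_n.ravel(), target_n.ravel()])
    # Residual learning: the regressor predicts a correction, not the answer.
    delta = regressor(features)
    return init_params + delta

# Stand-in for a trained network: predicts a zero correction.
def zero_regressor(features):
    return np.zeros(72)

rng = np.random.default_rng(0)
init_params = rng.normal(size=72)
init_joints = rng.normal(size=(24, 3))
target_joints = init_joints + 0.01 * rng.normal(size=(24, 3))
refined = refine(init_params, init_joints, target_joints, zero_regressor)
```

Predicting a correction relative to the initialization keeps the output range small when consecutive frames are similar, which is one plausible reading of "narrowing the solution space".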