GR-Dexter Technical Report
Ruoshi Wen, Guangzeng Chen, Zhongren Cui, Min Du, Yang Gou, Zhigang Han, Liqun Huang, Mingyu Lei, Yunfei Li, Zhuohang Li, Wenlei Liu, Yuxiao Liu, Xiao Ma, Hao Niu, Yutao Ouyang, Zeyu Ren, Haixin Shi, Wei Xu, Haoxiang Zhang, Jiajun Zhang, Xiao Zhang, Liwei Zheng
2026-01-01
Summary
This research introduces GR-Dexter, a complete hardware, data, and model system designed to help robots with two dexterous hands perform complex tasks from language instructions.
What's the problem?
Current robots are pretty good at simple grasping, but they struggle with more complicated tasks that need both hands and flexible wrists. This is hard for three reasons: robots with many moving joints have a much larger space of possible actions to control; the hands often block the camera's view of the objects being manipulated; and collecting enough real-world training data for these robots is expensive and time-consuming.
What's the solution?
The researchers built a new, compact robotic hand with 21 independently controlled joints. They also created an intuitive teleoperation system that lets humans control the robot remotely, making it easy to collect useful training data. Finally, they trained the robot on a mix of these human demonstrations, large-scale datasets of images and language, and carefully selected data from other robots, teaching it to understand instructions and carry out tasks. Combining new demonstrations with this pre-existing knowledge makes the robot more capable and more general.
Why it matters?
This work is a step towards creating robots that can genuinely help people with everyday tasks requiring dexterity, like preparing food or assembling objects. By making bimanual robot manipulation more reliable and adaptable to new situations, GR-Dexter brings us closer to having robots that can handle a wider range of real-world challenges.
Abstract
Vision-language-action (VLA) models have enabled language-conditioned, long-horizon robot manipulation, but most existing systems are limited to grippers. Scaling VLA policies to bimanual robots with high-degree-of-freedom (DoF) dexterous hands remains challenging due to the expanded action space, frequent hand-object occlusions, and the cost of collecting real-robot data. We present GR-Dexter, a holistic hardware-model-data framework for VLA-based generalist manipulation on a bimanual dexterous-hand robot. Our approach combines the design of a compact 21-DoF robotic hand, an intuitive bimanual teleoperation system for real-robot data collection, and a training recipe that leverages teleoperated robot trajectories together with large-scale vision-language and carefully curated cross-embodiment datasets. Across real-world evaluations spanning long-horizon everyday manipulation and generalizable pick-and-place, GR-Dexter achieves strong in-domain performance and improved robustness to unseen objects and unseen instructions. We hope GR-Dexter serves as a practical step toward generalist dexterous-hand robotic manipulation.
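The training recipe described above combines three data streams: teleoperated robot trajectories, large-scale vision-language data, and curated cross-embodiment data. The report excerpt gives no mixing ratios, so the short Python sketch below is only an illustration of how such a co-training mixture might be sampled per batch; the weights and the names MIXTURE_WEIGHTS and sample_batch_sources are invented for this example, not taken from GR-Dexter.

    import random

    # Illustrative sampling weights for a VLA co-training mixture.
    # GR-Dexter's actual ratios are not published; these values are made up.
    MIXTURE_WEIGHTS = {
        "teleoperated_robot_trajectories": 0.5,  # in-domain bimanual dexterous-hand demos
        "vision_language": 0.3,                  # large-scale image-text data
        "cross_embodiment": 0.2,                 # curated data from other robot embodiments
    }

    def sample_batch_sources(batch_size, rng):
        """Pick a data source for each example in a batch,
        in proportion to the mixture weights."""
        sources = list(MIXTURE_WEIGHTS)
        weights = [MIXTURE_WEIGHTS[s] for s in sources]
        return rng.choices(sources, weights=weights, k=batch_size)

    rng = random.Random(0)
    print(sample_batch_sources(8, rng))
    # e.g. ['vision_language', 'teleoperated_robot_trajectories', ...]

Weighted sampling of this kind lets in-domain robot demonstrations dominate training while the vision-language and cross-embodiment streams help preserve the policy's language understanding and visual generalization.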