MolmoAct: Action Reasoning Models that can Reason in Space

Jason Lee, Jiafei Duan, Haoquan Fang, Yuquan Deng, Shuo Liu, Boyang Li, Bohan Fang, Jieyu Zhang, Yi Ru Wang, Sangho Lee, Winson Han, Wilbert Pumacay, Angelica Wu, Rose Hendrix, Karen Farley, Eli VanderBilt, Ali Farhadi, Dieter Fox, Ranjay Krishna

2025-08-12

Summary

This paper introduces MolmoAct, a new kind of AI model called an Action Reasoning Model (ARM) that helps robots understand their surroundings in three dimensions, plan their movements, and carry out actions in a deliberate, interpretable way.

What's the problem?

The problem is that many robots map what they see directly into actions without reasoning ahead or building a full understanding of the 3D space around them. This makes it hard for them to adapt to new tasks, explain what they are doing, or perform complex movements reliably.

What's the solution?

The researchers created MolmoAct, which breaks the robot's decision-making into three stages: first, it interprets what the robot sees using a depth-aware, 3D-grounded representation; then, it plans a path for the robot to follow as a sequence of explicit waypoints in space; finally, it converts that plan into specific low-level commands the robot can execute. This approach lets the robot think before moving and makes its behavior easier to understand and control.
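The three-stage perceive / plan / act pipeline described above can be sketched as a toy Python program. Everything here is an illustrative stand-in, not MolmoAct's actual components: the depth-map input, the nearest-point "perception", and the straight-line interpolation planner are all simplifying assumptions made just to show how the stages hand off to each other.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Waypoint:
    """A point in 3D space the robot's end-effector should pass through."""
    x: float
    y: float
    z: float

def perceive(scene_depth: List[List[float]]) -> Waypoint:
    # Stage 1 (hypothetical): reduce a depth map to a 3D target location.
    # A real ARM would emit learned spatial tokens; here we simply pick
    # the closest cell of the depth grid as a stand-in target.
    d, row, col = min(
        (depth, r, c)
        for r, depth_row in enumerate(scene_depth)
        for c, depth in enumerate(depth_row)
    )
    return Waypoint(x=float(col), y=float(row), z=d)

def plan(start: Waypoint, goal: Waypoint, steps: int = 3) -> List[Waypoint]:
    # Stage 2 (hypothetical): lay out explicit waypoints toward the goal.
    # Straight-line interpolation stands in for a learned trajectory.
    return [
        Waypoint(
            start.x + (goal.x - start.x) * t / steps,
            start.y + (goal.y - start.y) * t / steps,
            start.z + (goal.z - start.z) * t / steps,
        )
        for t in range(1, steps + 1)
    ]

def to_commands(start: Waypoint, trajectory: List[Waypoint]) -> List[Tuple]:
    # Stage 3 (hypothetical): convert the waypoint plan into relative
    # displacement commands a low-level controller could execute.
    commands = []
    prev = start
    for wp in trajectory:
        commands.append(("move", wp.x - prev.x, wp.y - prev.y, wp.z - prev.z))
        prev = wp
    return commands
```

Because each stage produces an explicit, inspectable artifact (a 3D target, a waypoint list, a command list), a person can check or edit the plan before the robot moves, which is the interpretability benefit the summary describes.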

Why it matters?

This matters because robots that can think and plan in 3D space will be better at doing complicated tasks like assembling things or moving through difficult places. MolmoAct makes robots more flexible, trustworthy, and easier to work with, which is important for using robots safely and effectively in everyday life and advanced jobs.

Abstract

Action Reasoning Models (ARMs) integrate perception, planning, and control to enable adaptable and explainable robotic behavior, achieving superior performance across various tasks and settings.