RynnBrain: Open Embodied Foundation Models
Ronghao Dang, Jiayan Guo, Bohan Hou, Sicong Leng, Kehan Li, Xin Li, Jiangpin Liu, Yunxuan Mao, Zhikai Wang, Yuqian Yuan, Minghao Zhu, Xiao Lin, Yang Bai, Qian Jiang, Yaxi Zhao, Minghua Zeng, Junlong Gao, Yuming Jiang, Jun Cen, Siteng Huang, Liuyi Wang, Wenqiao Zhang
2026-02-19
Summary
This paper introduces RynnBrain, a family of open-source artificial intelligence models designed to help robots and AI systems better understand and interact with the physical world around them.
What's the problem?
Current AI models, even those that can process different types of information like images and text, often struggle to connect that information to real-world physics and spatial understanding. They lack a central 'brain' that can consistently interpret what an agent 'sees', figure out where things are in space and time, reason about how things work physically, and then plan actions accordingly. Essentially, they aren't very good at 'embodied intelligence' – acting intelligently in a physical body.
What's the solution?
The researchers created RynnBrain, a family of models that combines perception, reasoning, and planning in one system. It comes in three sizes (2B, 8B, and 30B parameters) and has specialized post-trained versions for navigation, planning, robot action, and complex spatial reasoning. After training, the models were evaluated on 20 embodied benchmarks that require physical understanding and 8 general vision benchmarks, where they perform significantly better than existing embodied foundation models.
Why it matters?
RynnBrain is important because it represents a step towards creating AI that can truly understand and interact with the physical world. This could lead to more capable robots, better virtual reality experiences, and AI systems that can solve real-world problems that require physical reasoning and planning, like navigating a warehouse or assisting in a disaster relief effort.
Abstract
Despite rapid progress in multimodal foundation models, the embodied intelligence community still lacks a unified, physically grounded foundation model that integrates perception, reasoning, and planning within real-world spatial-temporal dynamics. We introduce RynnBrain, an open-source spatiotemporal foundation model for embodied intelligence. RynnBrain strengthens four core capabilities in a unified framework: comprehensive egocentric understanding, diverse spatiotemporal localization, physically grounded reasoning, and physics-aware planning. The RynnBrain family comprises three foundation model scales (2B, 8B, and 30B-A3B MoE) and four post-trained variants tailored for downstream embodied tasks (i.e., RynnBrain-Nav, RynnBrain-Plan, and RynnBrain-VLA) or complex spatial reasoning tasks (i.e., RynnBrain-CoP). In extensive evaluations on 20 embodied benchmarks and 8 general vision understanding benchmarks, our RynnBrain foundation models outperform existing embodied foundation models by a significant margin. The post-trained model suite further substantiates two key potentials of the RynnBrain foundation model: (i) enabling physically grounded reasoning and planning, and (ii) serving as a strong pretrained backbone that can be efficiently adapted to diverse embodied tasks.
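Because the checkpoints are released as open-source multimodal models, a natural way to probe the physically grounded reasoning described above is a standard vision-language inference loop. The sketch below is a minimal, hypothetical example using the Hugging Face transformers library; the repository id, processor behavior, and prompt format are assumptions rather than the official RynnBrain API, so consult the project's release for exact usage.

```python
# Minimal sketch of querying an open vision-language checkpoint such as
# RynnBrain with a spatial-reasoning question. The repo id below is a
# placeholder (assumption); the real model may require its own processor
# class, chat template, or video inputs.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "RynnBrain/RynnBrain-8B"  # hypothetical repository id

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# An egocentric frame plus a question mixing localization and planning.
image = Image.open("egocentric_frame.jpg")
question = (
    "Where is the red mug relative to the sink, "
    "and what steps would you take to pick it up?"
)

inputs = processor(images=image, text=question, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

The post-trained variants (e.g., RynnBrain-Nav or RynnBrain-VLA) would presumably be loaded the same way but prompted with task-specific inputs such as navigation instructions or robot observation streams.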