From reactive to cognitive: brain-inspired spatial intelligence for embodied agents

Shouwei Ruan, Liyuan Wang, Caixin Kang, Qihui Zhu, Songming Liu, Xingxing Wei, Hang Su

2025-09-02

Summary

This research introduces a new system called BSC-Nav that helps AI agents understand and navigate spaces more like humans do, by building and maintaining an internal map of their surroundings.

What's the problem?

Current AI agents using large language models can 'see' and 'talk' about their surroundings, but they don't really *remember* spaces in a structured way. They react to what's immediately in front of them instead of having a broader understanding of the environment, which limits how well they can plan routes or adapt to new situations. Essentially, they lack a good 'spatial memory'.

What's the solution?

BSC-Nav creates a 'cognitive map' for the AI, similar to how our brains create maps. It takes the agent's movements and what it observes to build this map, focusing on both specific landmarks and the overall layout of the area. The system can then use this map to find its way around, even when given new goals or tasks, and combines this spatial understanding with the reasoning abilities of large language models.
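The core idea of turning egocentric observations into an allocentric map can be illustrated with a small sketch. The code below is a hypothetical, simplified illustration (not the authors' implementation): it converts a range-and-bearing observation made from the agent's own viewpoint into world coordinates using the agent's pose, stores it as a labeled landmark, and later retrieves the nearest landmark matching a semantic goal. All class and function names here are invented for illustration.

```python
import math

def egocentric_to_allocentric(pose, rng, bearing):
    """Convert an egocentric (range, bearing) observation to world (x, y).

    pose = (x, y, heading), with heading in radians.
    """
    x, y, heading = pose
    angle = heading + bearing
    return (x + rng * math.cos(angle), y + rng * math.sin(angle))

class CognitiveMap:
    """Toy allocentric map: a list of (label, world position) landmarks."""

    def __init__(self):
        self.landmarks = []

    def add_observation(self, pose, label, rng, bearing):
        # Store what the agent sees in world coordinates, not camera coordinates.
        self.landmarks.append((label, egocentric_to_allocentric(pose, rng, bearing)))

    def locate(self, goal_label, pose):
        # Retrieve the nearest stored landmark matching a semantic goal.
        matches = [(p, math.dist(pose[:2], p))
                   for lbl, p in self.landmarks if lbl == goal_label]
        return min(matches, key=lambda m: m[1])[0] if matches else None

# An agent at the origin, facing east (heading 0), sees a "door" 2 m straight ahead.
cmap = CognitiveMap()
cmap.add_observation((0.0, 0.0, 0.0), "door", rng=2.0, bearing=0.0)
print(cmap.locate("door", (0.0, 0.0, 0.0)))  # (2.0, 0.0)
```

Because landmarks are stored in a shared world frame rather than relative to where the agent happened to be standing, the same map can serve new goals later, which is the property the paper attributes to survey-style spatial knowledge.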

Why it matters?

This work is important because it's a step towards creating AI agents that can navigate and interact with the real world more effectively. By giving AI a more human-like spatial understanding, it can handle complex environments, generalize to new places, and perform a wider range of tasks, ultimately leading to more versatile and intelligent robots.

Abstract

Spatial cognition enables adaptive goal-directed behavior by constructing internal models of space. Robust biological systems consolidate spatial knowledge into three interconnected forms: landmarks for salient cues, route knowledge for movement trajectories, and survey knowledge for map-like representations. While recent advances in multi-modal large language models (MLLMs) have enabled visual-language reasoning in embodied agents, these efforts lack structured spatial memory and instead operate reactively, limiting their generalization and adaptability in complex real-world environments. Here we present Brain-inspired Spatial Cognition for Navigation (BSC-Nav), a unified framework for constructing and leveraging structured spatial memory in embodied agents. BSC-Nav builds allocentric cognitive maps from egocentric trajectories and contextual cues, and dynamically retrieves spatial knowledge aligned with semantic goals. Integrated with powerful MLLMs, BSC-Nav achieves state-of-the-art efficacy and efficiency across diverse navigation tasks, demonstrates strong zero-shot generalization, and supports versatile embodied behaviors in the real physical world, offering a scalable and biologically grounded path toward general-purpose spatial intelligence.