Does Spatial Cognition Emerge in Frontier Models?
Santhosh Kumar Ramakrishnan, Erik Wijmans, Philipp Kraehenbuehl, Vladlen Koltun
2024-10-10

Summary
This paper presents SPACE, a benchmark designed to evaluate how well advanced AI models handle spatial cognition: the ability to reason about and navigate physical space.
What's the problem?
Current AI models, especially large language and multimodal models, have not been thoroughly evaluated for spatial reasoning. Existing benchmarks focus on language understanding and general problem-solving, leaving spatial cognition largely untested. This makes it hard to know how these models compare to human or animal spatial intelligence.
What's the solution?
To address this, the authors developed SPACE, which systematically assesses various aspects of spatial cognition in AI models. This includes testing their ability to map environments, reason about object shapes and layouts, and use spatial attention and memory. Many tasks are instantiated in parallel text and image versions, allowing both large language models and large multimodal models to be evaluated on the same underlying questions. The results showed that many of today's leading AI models perform poorly on these spatial tasks, often scoring close to random guessing.
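The key comparison in such evaluations is model accuracy against the chance level of the multiple-choice tasks. As a minimal sketch (the task data, answer labels, and model predictions below are hypothetical placeholders, not the actual SPACE benchmark), this is how scoring against chance can look:

```python
def chance_level(num_choices: int) -> float:
    """Expected accuracy of uniform random guessing on an n-way multiple-choice task."""
    return 1.0 / num_choices

def accuracy(predictions: list[str], answers: list[str]) -> float:
    """Fraction of questions answered correctly."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

# Hypothetical 4-way multiple-choice spatial task with 8 questions.
answers = ["B", "D", "A", "C", "B", "A", "D", "C"]
model_predictions = ["B", "A", "A", "C", "D", "A", "B", "C"]  # placeholder model output

acc = accuracy(model_predictions, answers)
# A model whose accuracy is near chance_level(4) = 0.25 shows no evidence
# of the spatial ability the task probes.
print(f"model accuracy: {acc:.2f}, chance level: {chance_level(4):.2f}")
```

"Near chance level," as reported in the paper's results, means a model's accuracy is statistically indistinguishable from the `chance_level` baseline above.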
Why it matters?
This research is significant because it exposes the limitations of current AI models in understanding spatial concepts, which are crucial for real-world applications like robotics and navigation. By identifying these gaps, the study encourages the development of AI systems that better emulate human-like spatial reasoning.
Abstract
Not yet. We present SPACE, a benchmark that systematically evaluates spatial cognition in frontier models. Our benchmark builds on decades of research in cognitive science. It evaluates large-scale mapping abilities that are brought to bear when an organism traverses physical environments, smaller-scale reasoning about object shapes and layouts, and cognitive infrastructure such as spatial attention and memory. For many tasks, we instantiate parallel presentations via text and images, allowing us to benchmark both large language models and large multimodal models. Results suggest that contemporary frontier models fall short of the spatial intelligence of animals, performing near chance level on a number of classic tests of animal cognition.