ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models

Dingming Li, Hongxing Li, Zixuan Wang, Yuchen Yan, Hang Zhang, Siqi Chen, Guiyang Hou, Shengpei Jiang, Wenqi Zhang, Yongliang Shen, Weiming Lu, Yueting Zhuang

2025-05-28

Summary

This paper introduces ViewSpatial-Bench, a new benchmark that tests how well AI models understand where things are in space when a scene is looked at from different angles or viewpoints.

What's the problem?

Most vision-language models, which combine images and text, have trouble pinpointing where objects are once the viewpoint changes. That ability matters for applications like robotics, navigation, and virtual reality.

What's the solution?

The researchers built ViewSpatial-Bench to measure how well these models reason about space from multiple perspectives. Testing existing models revealed clear performance gaps, and fine-tuning the models on 3D spatial datasets helped close them.
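
To make the idea concrete, here is a minimal, hypothetical sketch of how a benchmark like this could score a model on multiple-choice spatial questions. The question fields, the `ask_model` stub, and the example file name are illustrative assumptions, not the actual ViewSpatial-Bench data format or evaluation code.

```python
# Hypothetical sketch: scoring a vision-language model on multi-viewpoint
# spatial questions. Everything here is illustrative, not the paper's code.

from dataclasses import dataclass


@dataclass
class SpatialQuestion:
    image_path: str      # scene image shown to the model
    prompt: str          # e.g. "From the person's viewpoint, is the cup left or right of the laptop?"
    options: list[str]   # candidate answers, e.g. ["left", "right"]
    answer: str          # ground-truth option


def ask_model(question: SpatialQuestion) -> str:
    """Stub for a vision-language model call. A real evaluation would send the
    image and prompt to a VLM and parse which option it chooses."""
    return question.options[0]  # placeholder prediction


def accuracy(questions: list[SpatialQuestion]) -> float:
    """Fraction of questions where the model picks the ground-truth option."""
    if not questions:
        return 0.0
    correct = sum(ask_model(q) == q.answer for q in questions)
    return correct / len(questions)


if __name__ == "__main__":
    demo = [
        SpatialQuestion(
            image_path="scene_001.jpg",
            prompt="From the person's viewpoint, is the cup to their left or right?",
            options=["left", "right"],
            answer="right",
        ),
    ]
    print(f"Accuracy: {accuracy(demo):.2%}")
```

The key point this sketch illustrates is that each question fixes a specific viewpoint (the camera's or another person's in the scene), so a model can only score well if it reasons from that perspective rather than its default camera-centric view.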

Why it matters?

This matters because improving how AI understands space from different viewpoints can make technology better at tasks like self-driving cars, drone navigation, and building more realistic virtual worlds.

Abstract

A new benchmark, ViewSpatial-Bench, evaluates VLMs on multi-viewpoint spatial reasoning, revealing performance gaps that are mitigated with fine-tuning on 3D spatial datasets.