
VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge

Yueqi Song, Tianyue Ou, Yibo Kong, Zecheng Li, Graham Neubig, Xiang Yue

2025-04-16


Summary

This paper introduces VisualPuzzles, a new set of tests designed to check how well AI models can solve visual reasoning problems without needing specialized facts or background knowledge.

What's the problem?

Most current tests for AI models mix general reasoning skills with specialized subject knowledge, so it is hard to tell whether a model is actually reasoning or just recalling memorized facts. Because of this, it is difficult to measure whether an AI can truly think through visual puzzles the way people do.

What's the solution?

The researchers created VisualPuzzles as a benchmark focused purely on general visual reasoning: models are tested on their ability to solve puzzles and spot patterns without relying on outside knowledge. When they tested current AI models on these puzzles, they found that the models still fall short of human performance on this kind of thinking.
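To make the evaluation idea concrete, here is a minimal sketch of how a multiple-choice benchmark like this one could be scored: each puzzle is an item with answer choices, a model picks one choice, and accuracy is the fraction of items answered correctly. The item fields and the `predict` interface below are illustrative assumptions, not the authors' actual code or data format.

```python
# Hypothetical sketch of scoring a multiple-choice visual-reasoning
# benchmark. Field names and the model interface are assumptions.

from dataclasses import dataclass

@dataclass
class PuzzleItem:
    image_path: str      # path to the puzzle image
    question: str        # question about the visual pattern
    choices: list[str]   # choice labels, e.g. ["A", "B", "C", "D"]
    answer: str          # gold choice label

def score(items: list[PuzzleItem], predict) -> float:
    """Accuracy of `predict(item) -> choice label` over the items."""
    if not items:
        return 0.0
    correct = sum(1 for item in items if predict(item) == item.answer)
    return correct / len(items)

# Toy usage: a trivial "model" that always answers "A".
items = [
    PuzzleItem("p1.png", "Which shape continues the sequence?",
               ["A", "B", "C", "D"], "A"),
    PuzzleItem("p2.png", "Which figure completes the grid?",
               ["A", "B", "C", "D"], "C"),
]
print(score(items, lambda item: "A"))  # -> 0.5
```

In a real evaluation, `predict` would call a multimodal model with the image and question; the scoring logic itself stays this simple, which is what lets the benchmark isolate reasoning ability from domain knowledge in the items themselves.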

Why it matters?

This matters because it helps scientists pinpoint where AI still needs to improve and makes it easier to build smarter models that can actually reason rather than just recall facts. It also helps make AI more trustworthy for tasks that require genuine problem-solving skills.

Abstract

VisualPuzzles is a benchmark designed to evaluate general visual reasoning abilities by minimizing reliance on domain-specific knowledge, revealing that current multimodal models lag behind humans in these tasks.