GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models

Jingxuan Wei, Caijun Jia, Xi Bai, Xinglong Xu, Siyuan Li, Linzhuang Sun, Bihui Yu, Conghui He, Lijun Wu, Cheng Tan

2025-11-17

Summary

This paper introduces a new way to test how well artificial intelligence models can *both* understand information and then create something new based on that understanding, specifically focusing on models that work with multiple types of data like text and images.

What's the problem?

Current tests for these advanced AI models usually check whether they can simply *tell things apart* or *make images from scratch*. However, they don't measure whether the AI can take in information, think through a problem, and then *actively build* a solution. It's like testing whether someone can recognize a triangle versus testing whether they can accurately draw one from instructions.

What's the solution?

The researchers created a benchmark called GGBench that challenges AI models to solve geometry construction problems. These problems require the AI to understand written instructions and then precisely *draw* the geometric figures described. This forces the AI to combine language understanding with visual creation, testing a more complete form of intelligence.

Why it matters?

This benchmark matters because it sets a higher standard for evaluating AI. It moves beyond checking whether an AI can recognize things to checking whether it can actually *reason* and *construct* solutions, a crucial step toward building genuinely intelligent systems that can help solve complex problems.

Abstract

The advent of Unified Multimodal Models (UMMs) signals a paradigm shift in artificial intelligence, moving from passive perception to active, cross-modal generation. Despite their unprecedented ability to synthesize information, a critical gap persists in evaluation: existing benchmarks primarily assess discriminative understanding or unconstrained image generation separately, failing to measure the integrated cognitive process of generative reasoning. To bridge this gap, we propose that geometric construction provides an ideal testbed as it inherently demands a fusion of language comprehension and precise visual generation. We introduce GGBench, a benchmark designed specifically to evaluate geometric generative reasoning. It provides a comprehensive framework for systematically diagnosing a model's ability to not only understand and reason but to actively construct a solution, thereby setting a more rigorous standard for the next generation of intelligent systems. Project website: https://opendatalab-raiser.github.io/GGBench/.