GENIUS: Generative Fluid Intelligence Evaluation Suite
Ruichuan An, Sihan Yang, Ziyu Guo, Wei Dai, Zijun Shen, Haodong Li, Renrui Zhang, Xinyu Wei, Guopeng Li, Wenshan Wu, Wentao Zhang
2026-02-12
Summary
This paper introduces a new way to test how well AI models can *think* creatively, not just recall what they've already learned. It focuses on a type of intelligence called 'Generative Fluid Intelligence', which is about solving new problems on the spot, rather than just using existing knowledge.
What's the problem?
Current tests for AI image generation mostly check if the AI remembers things and can recreate known styles. They don't really test if the AI can understand a new situation, figure out rules, or adapt to unusual requests. The researchers noticed that existing models were good at things they’d seen before, but struggled with truly novel challenges that required reasoning and understanding context.
What's the solution?
The researchers created a new set of tests, called GENIUS, designed to specifically measure this 'Generative Fluid Intelligence'. These tests involve tasks like figuring out someone's visual preferences, creating images based on abstract ideas, and understanding how things would behave in unrealistic situations. They then tested 12 different AI models on these tasks and found they all performed poorly, not because they couldn't *create* images, but because they didn't understand the context of the requests. Finally, they developed a simple, training-free technique that adjusts how the model attends to the important parts of a prompt, which improved results somewhat.
Why it matters?
This work is important because it shows that current AI models are still limited in their ability to truly *reason* and adapt. It provides a new standard for evaluating AI, pushing the field to develop models that can go beyond simply memorizing and recreating, and instead demonstrate genuine intelligence and problem-solving skills.
Abstract
Unified Multimodal Models (UMMs) have shown remarkable progress in visual generation. Yet, existing benchmarks predominantly assess Crystallized Intelligence, which relies on recalling accumulated knowledge and learned schemas. This focus overlooks Generative Fluid Intelligence (GFI): the capacity to induce patterns, reason through constraints, and adapt to novel scenarios on the fly. To rigorously assess this capability, we introduce GENIUS (GENerative Fluid Intelligence EvalUation Suite). We formalize GFI as a synthesis of three primitives: Inducing Implicit Patterns (e.g., inferring personalized visual preferences), Executing Ad-hoc Constraints (e.g., visualizing abstract metaphors), and Adapting to Contextual Knowledge (e.g., simulating counter-intuitive physics). Collectively, these primitives challenge models to solve problems grounded entirely in the immediate context. Our systematic evaluation of 12 representative models reveals significant performance deficits on these tasks. Crucially, our diagnostic analysis disentangles these failure modes, demonstrating that the deficits stem from limited context comprehension rather than insufficient intrinsic generative capability. To bridge this gap, we propose a training-free attention intervention strategy. Ultimately, GENIUS establishes a rigorous standard for GFI, guiding the field beyond knowledge utilization toward dynamic, general-purpose reasoning. Our dataset and code will be released at https://github.com/arctanxarc/GENIUS.
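The abstract names a training-free attention intervention but does not spell out how it works here. Purely as an illustrative assumption (not the paper's actual method), one common form of such an intervention is to add a bias to a model's attention logits for the prompt tokens that encode the in-context constraint, so more attention mass lands on them at inference time. The sketch below shows that idea on a toy attention distribution; the function name bias_attention, the bias strength, and the chosen token indices are all hypothetical.

```python
import numpy as np

def bias_attention(attn_logits: np.ndarray,
                   context_token_ids: list[int],
                   bias: float = 1.0) -> np.ndarray:
    """Add a positive bias to the pre-softmax attention logits of selected
    prompt tokens, then renormalize. A bias > 0 shifts attention mass
    toward the tokens that carry the in-context constraint."""
    logits = attn_logits.copy()
    logits[..., context_token_ids] += bias        # hypothetical bias strength
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)

# Toy usage: one query attending over 6 prompt tokens; suppose tokens 2-4
# encode the ad-hoc constraint the model keeps under-weighting.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1, 6))
attn = bias_attention(logits, context_token_ids=[2, 3, 4], bias=1.0)
print(attn.round(3), float(attn.sum()))
```

In a real UMM, a reweighting of this kind would be applied inside selected attention layers during generation, with no parameter updates, which is what makes such an intervention training-free.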