Evaluating Intelligence via Trial and Error

Jingtao Zhan, Jiahao Zhao, Jiayu Li, Yiqun Liu, Bo Zhang, Qingyao Ai, Jiaxin Mao, Hongning Wang, Min Zhang, Shaoping Ma

2025-03-12

Summary

This paper introduces Survival Game, a new way to test AI intelligence in which smarter systems solve problems with fewer failed attempts, and it shows that current AI still struggles with complex tasks humans handle easily.

What's the problem?

Current AI systems need far too many attempts to solve hard problems like recognizing objects or understanding language, and scaling them up to human-level ability would cost an astronomical amount of money and energy.

What's the solution?

The Survival Game method measures AI by counting how many times it fails before solving a task: a system whose failure counts have finite mean and variance can reliably solve new problems, a level the authors call the Autonomous Level. The results show that today's AI needs better strategies than just copying data patterns to match human problem-solving.
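The failure-count idea can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's actual protocol: it assumes each attempt at a task can be modeled as a function returning success or failure, estimates the mean and variance of the failure count across many tasks, and uses a coin-flip solver as a stand-in for a real AI system.

```python
import random
import statistics

def failure_count(solve_attempt, max_attempts=10_000):
    """Count failed attempts before the first success (capped)."""
    failures = 0
    while failures < max_attempts:
        if solve_attempt():
            return failures
        failures += 1
    return failures  # cap reached; treat as non-convergent

def evaluate(solve_attempt, n_tasks=1000, seed=0):
    """Estimate mean and variance of failure counts over many tasks."""
    random.seed(seed)
    counts = [failure_count(solve_attempt) for _ in range(n_tasks)]
    return statistics.mean(counts), statistics.variance(counts)

# Hypothetical solver that succeeds with probability 0.5 per attempt.
# Expect mean ~1 and variance ~2 failures for this coin-flip solver.
mean_f, var_f = evaluate(lambda: random.random() < 0.5)
```

In this toy setting a finite, small mean and variance fall out immediately; the paper's point is that on complex tasks like vision or language, current systems behave more like solvers whose failure counts blow up.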

Why does it matter?

This helps guide AI research to focus on teaching systems to truly understand tasks instead of memorizing answers, making them smarter and more useful for real-world challenges like healthcare or robotics.

Abstract

Intelligence is a crucial trait for species to find solutions within a limited number of trial-and-error attempts. Building on this idea, we introduce Survival Game as a framework to evaluate intelligence based on the number of failed attempts in a trial-and-error process. Fewer failures indicate higher intelligence. When the expectation and variance of failure counts are both finite, it signals the ability to consistently find solutions to new challenges, which we define as the Autonomous Level of intelligence. Using Survival Game, we comprehensively evaluate existing AI systems. Our results show that while AI systems achieve the Autonomous Level in simple tasks, they are still far from it in more complex tasks, such as vision, search, recommendation, and language. While scaling current AI technologies might help, this would come at an astronomical cost. Projections suggest that achieving the Autonomous Level for general tasks would require 10^26 parameters. To put this into perspective, loading such a massive model requires so many H100 GPUs that their total value is 10^7 times that of Apple Inc.'s market value. Even with Moore's Law, supporting such a parameter scale would take 70 years. This staggering cost highlights the complexity of human tasks and the inadequacies of current AI technologies. To further investigate this phenomenon, we conduct a theoretical analysis of Survival Game and its experimental results. Our findings suggest that human tasks possess a criticality property. As a result, Autonomous Level requires a deep understanding of the task's underlying mechanisms. Current AI systems, however, do not fully grasp these mechanisms and instead rely on superficial mimicry, making it difficult for them to reach an autonomous level. We believe Survival Game can not only guide the future development of AI but also offer profound insights into human intelligence.
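The 70-year projection in the abstract can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes a current feasible scale of roughly 10^12 parameters and the classic 18-month Moore's Law doubling period; both numbers are illustrative assumptions, not figures taken from the paper.

```python
import math

target_params = 1e26    # projected scale for the Autonomous Level
baseline_params = 1e12  # assumed current feasible scale (illustrative)
doubling_years = 1.5    # assumed Moore's Law doubling period

# Number of doublings needed to grow from baseline to target scale.
doublings = math.log2(target_params / baseline_params)  # ~46.5
years = doublings * doubling_years
# years ~= 70, matching the abstract's projection
```

Under these assumptions, closing a 10^14 gap takes about 46.5 doublings, i.e. roughly 70 years, which is why the authors argue that scaling alone cannot deliver the Autonomous Level for general tasks.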