lmgame-Bench: How Good are LLMs at Playing Games?
Lanxiang Hu, Mingjia Huo, Yuxuan Zhang, Haoyang Yu, Eric P. Xing, Ion Stoica, Tajana Rosing, Haojian Jin, Hao Zhang
2025-05-22
Summary
This paper introduces lmgame-Bench, a new benchmark for testing how well large language models can play different types of games that demand a range of skills and strategies.
What's the problem?
It is hard to tell whether large language models are genuinely good at solving a wide range of problems, especially when those problems are framed as games that test different abilities such as perception, memory, and planning.
What's the solution?
The researchers created lmgame-Bench, a benchmark built from many different games, to see how well these models handle each challenge and whether skills picked up in one game carry over to another.
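The code below is a minimal sketch of the kind of evaluation loop such a game benchmark implies: a model is placed in each game, picks actions step by step, and its final score is recorded per game. Every name here (GameEnv, query_model, the placeholder scoring) is a hypothetical illustration, not the paper's actual API; the game names are examples of the kinds of games the benchmark covers.

```python
# Hypothetical sketch of a game-benchmark evaluation loop; not the
# paper's real implementation or API.

import random
from dataclasses import dataclass


@dataclass
class GameEnv:
    """Toy stand-in for a game environment with a text observation."""
    name: str
    max_steps: int = 20
    step_count: int = 0
    score: float = 0.0

    def reset(self) -> str:
        self.step_count, self.score = 0, 0.0
        return f"[{self.name}] start state"

    def step(self, action: str) -> tuple[str, float, bool]:
        self.step_count += 1
        reward = 1.0 if action == "good" else 0.0  # placeholder scoring rule
        self.score += reward
        done = self.step_count >= self.max_steps
        return f"[{self.name}] state after '{action}'", reward, done


def query_model(observation: str) -> str:
    """Stand-in for an LLM call; a real harness would prompt a model
    with the current game state and parse its chosen action."""
    return random.choice(["good", "bad"])


def evaluate(games: list[str]) -> dict[str, float]:
    """Run one episode per game and report each game's final score."""
    results = {}
    for name in games:
        env = GameEnv(name)
        obs, done = env.reset(), False
        while not done:
            action = query_model(obs)
            obs, _, done = env.step(action)
        results[name] = env.score
    return results


if __name__ == "__main__":
    print(evaluate(["sokoban", "tetris", "2048"]))
```

In an actual benchmark of this kind, query_model would call a real LLM, and the per-game scores would be compared across models and games to study which capabilities each game exercises and whether performance transfers between them.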
Why it matters?
This matters because it helps us understand the true strengths and weaknesses of language models, which is important for building smarter, more flexible AI that can help people in many different situations.
Abstract
lmgame-Bench evaluates large language models on games posing diverse challenges, revealing distinctive blends of capabilities and the potential for transfer learning.