TextArena

Leon Guertler, Bobby Cheng, Simon Yu, Bo Liu, Leshem Choshen, Cheston Tan

2025-04-16

Summary

This paper introduces TextArena, a collection of text-based games that anyone can use to test and improve how well AI models, especially large language models, handle social situations and make decisions in competitive scenarios.

What's the problem?

Most ways of testing AI models focus on simple tasks and don't measure how well the models behave in dynamic, interactive situations, such as negotiating, persuading, or working out what others might be thinking. This leaves a gap in our knowledge of how well these models handle real-world social skills.

What's the solution?

The researchers built TextArena as an open platform with over 57 different text-based games, including single-player, two-player, and multiplayer options. These games are designed to test a wide range of social and reasoning skills, like planning, bluffing, and negotiation. The platform lets both humans and AI models compete against each other, and it uses a live leaderboard to track how well each model or person does over time. The system is also set up so people can easily add new games or test new models, making it a growing and flexible way to evaluate AI.
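To make the leaderboard idea concrete, here is a minimal sketch of how match results between players (models or humans) could be turned into a running ranking. It uses a simple Elo-style update purely as an illustration; this summary does not say which rating system TextArena uses, and the `Leaderboard` class, its parameters, and the player names are hypothetical and not part of the TextArena API.

```python
from dataclasses import dataclass, field


@dataclass
class Leaderboard:
    """Illustrative Elo-style leaderboard (an assumption, not TextArena's own code)."""
    k: float = 32.0          # how strongly a single game moves a rating
    initial: float = 1000.0  # rating given to a player seen for the first time
    ratings: dict = field(default_factory=dict)

    def rating(self, name: str) -> float:
        # Register unseen players at the initial rating.
        return self.ratings.setdefault(name, self.initial)

    def expected(self, a: str, b: str) -> float:
        # Standard Elo expected score of player a against player b.
        return 1.0 / (1.0 + 10 ** ((self.rating(b) - self.rating(a)) / 400.0))

    def record(self, a: str, b: str, score_a: float) -> None:
        # score_a is 1.0 if a won, 0.5 for a draw, 0.0 if a lost.
        delta = self.k * (score_a - self.expected(a, b))
        self.ratings[a] = self.rating(a) + delta
        self.ratings[b] = self.rating(b) - delta

    def standings(self) -> list:
        # Players sorted from highest to lowest rating.
        return sorted(self.ratings.items(), key=lambda kv: kv[1], reverse=True)


board = Leaderboard()
board.record("model_a", "human_1", 1.0)  # model_a beats human_1
board.record("model_b", "model_a", 0.5)  # model_b draws with model_a
print(board.standings())
```

Because ratings are updated after every game, a ranking like this can follow how each model or person performs over time, which is the role the live leaderboard plays in the description above.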

Why does it matter?

TextArena gives scientists and developers a better way to see whether AI models can handle complex social situations, not just simple question answering. By using games that require real social skills, it helps push AI models to become more useful and reliable in real-life interactions, and more trustworthy for things like teamwork, negotiation, and decision-making.

Abstract

TextArena is an open-source collection of competitive text-based games designed to evaluate dynamic social skills and agentic behavior in Large Language Models (LLMs).
