GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare
Siqi Zhu, David Zhang, Pedro Cisneros-Velarde, Jiaxuan You
2025-10-13
Summary
This paper addresses a problem with large language models (LLMs): they sometimes give overly detailed or unnecessarily elaborate responses, even when users just want a quick, concise answer. It introduces a new method called Game-Theoretic Alignment (GTAlign) to make LLMs better at understanding what users *actually* want and responding in a way that benefits both the user and the model itself.
What's the problem?
LLMs are getting very good at reasoning, but they often don't realize that being too thorough can be annoying or unhelpful. Current methods for 'aligning' LLMs (making them behave as we want) assume that what's best for the model, namely getting a high reward, is also best for the user. However, this isn't always true. It can resemble the 'prisoner's dilemma', where everyone acting in their own self-interest leads to a worse outcome for everyone. The core issue is that LLMs lack a principled way to choose actions that consider both their own interests and the user's.
What's the solution?
The researchers propose GTAlign, which uses ideas from game theory to help LLMs make better decisions. During response generation, the model explicitly treats the interaction with the user as a game: it estimates how much both it and the user would 'benefit' from different possible responses, then chooses the one that is mutually beneficial. They also changed how the model is trained, rewarding responses that improve welfare for both sides. Finally, they developed an inference-time technique that lets the model adjust its responses when the pricing of the LLM service changes.
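The payoff-matrix idea can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the action names and welfare values below are invented for the example, and a real system would estimate welfare within the model's reasoning chain rather than from a hard-coded table.

```python
# Hypothetical sketch of game-theoretic action selection.
# Each candidate action maps to (user_welfare, llm_welfare) on a 0-1 scale;
# the numbers here are illustrative assumptions, not values from GTAlign.
payoffs = {
    "concise_answer":      (0.9, 0.6),  # user gets what they asked for quickly
    "verbose_explanation": (0.4, 0.8),  # model maximizes its own reward, user annoyed
    "clarifying_question": (0.6, 0.5),  # useful mainly when the query is ambiguous
}

def mutual_welfare(payoff):
    """Score an action by the sum of user and model welfare."""
    user, llm = payoff
    return user + llm

# Mutually beneficial choice vs. the model-reward-only choice.
best_action = max(payoffs, key=lambda a: mutual_welfare(payoffs[a]))
greedy_action = max(payoffs, key=lambda a: payoffs[a][1])

print(best_action)    # concise_answer
print(greedy_action)  # verbose_explanation
```

With these illustrative numbers, maximizing only the model's own reward picks the verbose explanation, while maximizing mutual welfare picks the concise answer, which is exactly the gap the paper's prisoner's-dilemma framing points at.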
Why does it matter?
This work is important because it moves beyond simply making LLMs more powerful to making them more *helpful* and *efficient*. By considering the user's perspective and aiming for mutually beneficial interactions, GTAlign can lead to LLMs that provide better answers, waste less time, and ultimately offer a more satisfying user experience. It also shows a way to adapt to changing costs of using these models, making them more practical in real-world scenarios.
Abstract
Large Language Models (LLMs) have achieved remarkable progress in reasoning, yet sometimes produce responses that are suboptimal for users in tasks such as writing, information seeking, or providing practical guidance. Conventional alignment practices typically assume that maximizing model reward also maximizes user welfare, but this assumption frequently fails in practice: models may over-clarify or generate overly verbose reasoning when users prefer concise answers. Such behaviors resemble the prisoner's dilemma, where individually rational choices lead to socially suboptimal outcomes. The fundamental challenge is the lack of a principled decision making mechanism that mutually benefits both the LLM and the user. We propose Game-Theoretic Alignment (GTAlign), an alignment framework that integrates game-theoretic decision making into both reasoning and training. During reasoning, the model explicitly treats user-LLM interaction as a strategic game: it constructs payoff matrices within its reasoning chain to estimate welfare for both itself and the user, and then selects actions that are mutually beneficial. During training, we introduce a mutual welfare reward that reinforces cooperative responses, aligning model behavior with socially efficient outcomes. In addition, we introduce an inference technique that leverages game-theoretic reasoning to dynamically adapt an LLM's responses when the pricing policies of the LLM service change. Extensive experiments demonstrate that GTAlign substantially improves reasoning efficiency, answer quality, and mutual welfare compared to baselines across diverse tasks. The code is available at https://github.com/ulab-uiuc/GTAlign.