Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

Xiyang Wu, Zongxia Li, Guangyao Shi, Alexander Duffy, Tyler Marques, Matthew Lyle Olson, Tianyi Zhou, Dinesh Manocha

2026-04-24

Summary

This paper introduces a new way to help AI agents, specifically those powered by Large Language Models, become better at playing games that require planning and using skills over a long period of time.

What's the problem?

Large Language Models are good at understanding and generating text, but they often struggle when they need to make a series of decisions in a game to achieve a goal. They have trouble learning which actions are useful, remembering those actions for later, and consistently applying them when needed, especially when rewards aren't immediate and the game doesn't reveal everything at once.

What's the solution?

The researchers created a system called COSPLAY. It works by having two parts that learn together: an LLM that decides what to do, and a 'skill bank' that learns useful actions from the LLM's gameplay. The LLM can look at the skill bank to find helpful actions, and the skill bank constantly updates itself with new and improved actions based on what the LLM does. This allows the LLM to focus on high-level planning while relying on the skill bank for reliable execution of specific tasks.
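The two-part loop described above can be sketched in a few lines. This is a minimal illustration under assumed names (Skill, SkillBank, episode, and the keyword-based retrieval and pairwise skill extraction are all hypothetical stand-ins), not the paper's actual implementation:

```python
# Hypothetical sketch of a COSPLAY-style co-evolution loop: a decision
# agent retrieves skills from a bank, and the bank updates itself from
# the agent's rollouts. All names and heuristics here are illustrative.
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str    # when the skill applies
    actions: list[str]  # reusable action sequence


@dataclass
class SkillBank:
    skills: dict[str, Skill] = field(default_factory=dict)

    def retrieve(self, observation: str) -> list[Skill]:
        # Naive keyword match as a stand-in for learned skill retrieval.
        return [s for s in self.skills.values()
                if any(word in observation for word in s.description.split())]

    def update(self, rollout: list[str]) -> None:
        # Stand-in for the skill-bank agent: store each repeated action
        # pair from the rollout as a candidate reusable skill.
        for a, b in zip(rollout, rollout[1:]):
            key = f"{a}->{b}"
            if key not in self.skills:
                self.skills[key] = Skill(key, f"after {a}", [a, b])


def episode(bank: SkillBank, observations: list[str]) -> list[str]:
    # Decision agent: follow a retrieved skill's actions when one applies,
    # otherwise fall back to a default exploratory action.
    rollout: list[str] = []
    for obs in observations:
        candidates = bank.retrieve(obs)
        rollout.extend(candidates[0].actions if candidates else ["explore"])
    bank.update(rollout)  # the skill bank co-evolves with the rollouts
    return rollout
```

Each episode both consumes the bank (retrieval guides actions) and feeds it (new skills are extracted from the rollout), which is the co-evolution the section describes.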

Why it matters?

This research is important because it helps AI agents become more capable in complex environments like games. By allowing AI to learn and reuse skills effectively, it moves us closer to creating AI that can solve real-world problems that require long-term planning and adaptation, and it shows a promising way to improve the performance of Large Language Models in interactive settings.

Abstract

Long-horizon interactive environments are a testbed for evaluating agents' skill-usage abilities. These environments demand multi-step reasoning, the chaining of multiple skills over many timesteps, and robust decision making under delayed rewards and partial observability, which makes games a natural testbed for evaluating agent skill usage. Large Language Models (LLMs) offer a promising alternative as game-playing agents, but they often struggle with consistent long-horizon decision making because they lack a mechanism to discover, retain, and reuse structured skills across episodes. We present COSPLAY, a co-evolution framework in which an LLM decision agent retrieves skills from a learnable skill bank to guide action taking, while an agent-managed skill pipeline discovers reusable skills from the agent's unlabeled rollouts to form the skill bank. The framework improves the decision agent's skill retrieval and action generation, while the skill bank agent continually extracts, refines, and updates skills together with their contracts. Experiments across six game environments show that COSPLAY with an 8B base model achieves over 25.1 percent average reward improvement against four frontier LLM baselines on single-player game benchmarks while remaining competitive on multi-player social reasoning games.
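The abstract mentions storing skills "together with their contracts," i.e. conditions that specify when a stored skill may be applied. A minimal sketch of that idea, with entirely hypothetical names (make_skill, the precondition lambda, and the example actions are assumptions, not the paper's API):

```python
# Hypothetical illustration of a skill "contract": a precondition that
# gates when a stored skill may be executed. Names are illustrative only.
from typing import Callable, Optional


def make_skill(name: str,
               precondition: Callable[[dict], bool],
               actions: list[str]) -> Callable[[dict], Optional[list[str]]]:
    def run(state: dict) -> Optional[list[str]]:
        # Execute only when the contract holds; otherwise signal
        # "not applicable" so the decision agent can fall back.
        return actions if precondition(state) else None
    run.__name__ = name
    return run


# Example: a skill whose contract requires holding a key.
open_door = make_skill(
    "open_door",
    precondition=lambda s: s.get("has_key", False),
    actions=["walk_to_door", "use_key", "open"],
)
```

Gating skills behind explicit contracts lets the decision agent reuse them reliably across episodes instead of re-deriving when each action sequence is safe to run.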