
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Guibin Zhang, Hejia Geng, Xiaohang Yu, Zhenfei Yin, Zaibin Zhang, Zelin Tan, Heng Zhou, Zhongzhi Li, Xiangyuan Xue, Yijiang Li, Yifan Zhou, Yang Chen, Chen Zhang, Yutao Fan, Zihu Wang, Songtao Huang, Yue Liao, Hongru Wang, Mengyue Yang, Heng Ji, Michael Littman, Jun Wang

2025-09-03

Summary

This paper is a comprehensive overview of a new approach to using large language models (LLMs) called 'agentic reinforcement learning'. Instead of simply getting LLMs to generate text, the goal is to turn them into autonomous agents that can make decisions and take actions in the real world.

What's the problem?

Traditional reinforcement learning with LLMs treats each task as a single step: the LLM gets an input and immediately produces an output, like filling in a blank. This is too simple for complex tasks. Real-world problems require planning, remembering past events, and using tools over longer horizons, something standard LLM reinforcement learning can't handle effectively. It's like trying to build a robot that can only react to what's happening *right now*, with no memory and no ability to plan ahead.
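To make the contrast concrete, here is a minimal sketch of the two interaction patterns. The `llm`, `env`, and `memory` objects and their methods are hypothetical illustrations, not interfaces from the paper:

```python
# Conventional LLM-RL: one input, one output, and the episode is over.
def single_step(llm, prompt):
    response = llm.generate(prompt)  # the only "action" the model ever takes
    return response                  # reward is assigned once, then the task ends

# Agentic RL: a temporally extended loop with observations, memory, and tools.
def agentic_episode(llm, env, memory, max_steps=10):
    observation = env.reset()
    for _ in range(max_steps):
        # The agent conditions on its accumulated history, not just one prompt.
        context = memory.recall() + [observation]
        action = llm.decide(context)           # e.g., call a tool or give an answer
        observation, reward, done = env.step(action)
        memory.store(action, observation)      # remember this step for later ones
        if done:
            return reward                      # reward may arrive only at the end
    return 0.0                                 # episode truncated without success
```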

What's the solution?

The paper proposes a framework for thinking about this new 'agentic' approach, in which LLMs are treated as agents operating in dynamic environments. The authors break down the key abilities these agents need: planning, using tools, remembering things, reasoning, improving themselves, and perceiving what's going on around them. They then categorize existing research by these abilities and by the task domains where they are applied. Crucially, they argue that reinforcement learning is what makes these abilities actually *work* and adapt to new situations, rather than remaining pre-programmed rules. They also provide a helpful list of resources, such as software frameworks and testing environments, for other researchers.
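As a rough illustration of what "learning rather than pre-programming" looks like, here is a minimal REINFORCE-style policy-gradient sketch, assuming the agent has already run a batch of multi-step episodes. This is a generic textbook update, not the specific algorithm of the survey or any paper it covers:

```python
import torch

def reinforce_update(policy_optimizer, episodes):
    """One policy-gradient step over a batch of completed agent episodes.

    Each episode is a (log_probs, total_reward) pair, where log_probs is the
    list of the policy's log-probabilities for every action it took.
    """
    loss = 0.0
    for log_probs, total_reward in episodes:
        # Reinforce the whole trajectory by its return: actions taken in
        # high-reward episodes become more likely in the future.
        loss = loss - total_reward * torch.stack(log_probs).sum()
    loss = loss / len(episodes)
    policy_optimizer.zero_grad()
    loss.backward()
    policy_optimizer.step()
```

The point of the sketch is that the agent's behavior (which tool to call, when to stop, what to remember) is shaped by environment feedback rather than hand-written rules.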

Why does it matter?

This work is important because it lays the groundwork for building more capable and versatile AI systems. Instead of AI that just responds to prompts, we can create AI that can independently solve problems, learn from experience, and operate in the real world. This could lead to breakthroughs in areas like robotics, automation, and personalized assistance, ultimately paving the way for more general-purpose AI.

Abstract

The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL), reframing LLMs from passive sequence generators into autonomous, decision-making agents embedded in complex, dynamic worlds. This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov decision processes (MDPs) of LLM-RL with the temporally extended, partially observable Markov decision processes (POMDPs) that define Agentic RL. Building on this foundation, we propose a comprehensive twofold taxonomy: one organized around core agentic capabilities, including planning, tool use, memory, reasoning, self-improvement, and perception, and the other around their applications across diverse task domains. Central to our thesis is that reinforcement learning serves as the critical mechanism for transforming these capabilities from static, heuristic modules into adaptive, robust agentic behavior. To support and accelerate future research, we consolidate the landscape of open-source environments, benchmarks, and frameworks into a practical compendium. By synthesizing over five hundred recent works, this survey charts the contours of this rapidly evolving field and highlights the opportunities and challenges that will shape the development of scalable, general-purpose AI agents.
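For readers who want the formal contrast the abstract alludes to, one standard way to write it down (illustrative notation, not copied from the paper) is:

```latex
% Conventional LLM-RL: a degenerate, single-step decision process.
% The state is just the prompt; one action (the generated text) ends the episode.
\text{LLM-RL:}\qquad \mathcal{M} = (\mathcal{S}, \mathcal{A}, r),
\qquad \pi_\theta(a \mid s), \qquad \text{horizon } T = 1

% Agentic RL: a temporally extended, partially observable MDP (POMDP).
% The agent receives observations o_t rather than the full state, and acts
% over many steps under a discount factor \gamma.
\text{Agentic RL:}\qquad \mathcal{M} = (\mathcal{S}, \mathcal{A}, \mathcal{O},
P, r, \gamma), \qquad \pi_\theta(a_t \mid o_{\le t}, a_{<t}), \qquad T \gg 1
```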