AlphaQuanter: An End-to-End Tool-Orchestrated Agentic Reinforcement Learning Framework for Stock Trading

Zheye Deng, Jiashu Wang

2025-10-22

Summary

This paper introduces AlphaQuanter, a new way to use artificial intelligence, specifically large language models, for automated trading in financial markets.

What's the problem?

Current AI trading systems often rely on multiple AI 'agents' working together. These setups can be slow, can give conflicting advice, and do not learn a single, consistent trading strategy from their successes and failures. They also rarely make clear *why* they place certain trades, which makes them hard to trust or understand.

What's the solution?

AlphaQuanter solves this by using just *one* AI agent that learns through trial and error, a method called reinforcement learning. This agent has access to various tools and can actively seek out information it needs to make decisions. Importantly, the system keeps a clear record of its reasoning, so you can see exactly why it chose to buy or sell. It's like having a smart trader who explains their thought process.

Why does it matter?

In the authors' experiments, AlphaQuanter achieves state-of-the-art performance on key financial metrics compared to other AI trading systems. Crucially, its interpretable reasoning reveals sophisticated trading strategies that human traders can learn from. This means it's not just about making money, but also about gaining new insights into how financial markets work and potentially improving trading for everyone.

Abstract

While Large Language Model (LLM) agents show promise in automated trading, they still face critical limitations. Prominent multi-agent frameworks often suffer from inefficiency, produce inconsistent signals, and lack the end-to-end optimization required to learn a coherent strategy from market feedback. To address this, we introduce AlphaQuanter, a single-agent framework that uses reinforcement learning (RL) to learn a dynamic policy over a transparent, tool-augmented decision workflow. This design empowers a single agent to autonomously orchestrate tools and proactively acquire information on demand, establishing a transparent and auditable reasoning process. Extensive experiments demonstrate that AlphaQuanter achieves state-of-the-art performance on key financial metrics. Moreover, its interpretable reasoning reveals sophisticated strategies, offering novel and valuable insights for human traders. Our code for data acquisition and agent training is publicly available at: https://github.com/AlphaQuanter/AlphaQuanter
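
To make the idea of a single agent that orchestrates tools and leaves an auditable trace more concrete, here is a minimal Python sketch. It is not AlphaQuanter's code: the tool names (price_momentum, news_sentiment), the toy data, the thresholds, and the functions stand_in_policy and run_agent are all hypothetical, and a hand-written rule stands in for the policy that the paper learns with RL. The sketch only illustrates the control flow: the agent fetches information on demand, stops when it has enough, and logs each step.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools backed by toy data; real tools would query market feeds.
def price_momentum(ticker: str) -> float:
    return {"AAA": 0.04, "BBB": -0.08}.get(ticker, 0.0)

def news_sentiment(ticker: str) -> float:
    return {"AAA": 0.6, "BBB": -0.2}.get(ticker, 0.0)

TOOLS: Dict[str, Callable[[str], float]] = {
    "price_momentum": price_momentum,
    "news_sentiment": news_sentiment,
}

def stand_in_policy(obs: Dict[str, float]) -> str:
    """Rule-based placeholder for a learned policy (an assumption, not the
    paper's method). Returns either a tool name to call next or a final
    action: BUY, SELL, or HOLD."""
    if "price_momentum" not in obs:
        return "price_momentum"
    momentum = obs["price_momentum"]
    # Ask for more information only when momentum alone is ambiguous.
    if abs(momentum) < 0.05 and "news_sentiment" not in obs:
        return "news_sentiment"
    score = 10 * momentum + obs.get("news_sentiment", 0.0)
    if score > 0.5:
        return "BUY"
    if score < -0.5:
        return "SELL"
    return "HOLD"

def run_agent(ticker: str, max_steps: int = 4) -> Tuple[str, List[str]]:
    """Run one decision episode and return (action, reasoning trace)."""
    observations: Dict[str, float] = {}
    trace: List[str] = []
    for _ in range(max_steps):
        choice = stand_in_policy(observations)
        if choice in TOOLS:
            value = TOOLS[choice](ticker)
            observations[choice] = value
            trace.append(f"called {choice} -> {value}")
        else:
            trace.append(f"decided {choice}")
            return choice, trace
    # Safety net if the policy never commits to an action.
    trace.append("step budget exhausted, defaulting to HOLD")
    return "HOLD", trace

if __name__ == "__main__":
    for ticker in ("AAA", "BBB"):
        action, trace = run_agent(ticker)
        print(ticker, action, trace)
```

In this toy setup the agent calls the sentiment tool only when momentum is inconclusive, so the trace shows both what it decided and which information it chose to gather first. In an RL setting, the rule inside stand_in_policy would be replaced by a policy trained on market feedback.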
