A^2FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
Qianben Chen, Jingyi Cao, Jiayu Zhang, Tianrui Qin, Xiaowan Li, King Zhu, Dingfeng Shi, He Zhu, Minghao Liu, Xiaobo Liang, Xin Gui, Ge Zhang, Jian Yang, Yuchen Eleanor Jiang, Wangchunshu Zhou
2025-10-20
Summary
This paper introduces a new type of large language model called A^2FM, designed to be good at both complex reasoning and using tools effectively to solve problems. It aims to bridge the gap between models that excel at thinking and those that excel at doing.
What's the problem?
Currently, large language models generally fall into two categories: those that are strong at internal reasoning but can't use outside tools, and those that are good at using tools but struggle with deep thought. This specialization leads to inefficiencies: even simple questions can cause these models to overthink or call tools unnecessarily, wasting resources and time. Essentially, they don't know *when* to just answer directly versus when to engage in heavier reasoning or tool use.
What's the solution?
The researchers created A^2FM, which uses a 'route-then-align' strategy: the model first learns to route each task to the right mode of operation, then aligns the execution trajectories for each mode under a single shared backbone. They added a third 'instant' mode for simple questions, allowing the model to answer directly without unnecessary steps. To train this, they developed a new method called Adaptive Policy Optimization (APO), which helps the model learn to choose efficiently between reasoning, using tools, or answering instantly, while prioritizing accuracy and minimizing cost.
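The core idea of a cost-regularized reward can be illustrated with a toy sketch. Everything below is an assumption for illustration: the function names, the linear cost penalty, the difficulty-threshold router, and the weight `lam` are not the paper's actual formulation of APO, which the summary does not spell out.

```python
# Hypothetical sketch of routing plus a cost-regularized reward, in the
# spirit of Adaptive Policy Optimization (APO). The linear penalty and
# all thresholds/weights here are illustrative assumptions.

MODES = ("instant", "reasoning", "agentic")

def cost_regularized_reward(is_correct: bool, cost_usd: float,
                            lam: float = 10.0) -> float:
    """Reward accuracy, then subtract a penalty proportional to cost."""
    accuracy_reward = 1.0 if is_correct else 0.0
    return accuracy_reward - lam * cost_usd

def route(query_difficulty: float) -> str:
    """Toy router: pick a mode from an (assumed) scalar difficulty score."""
    if query_difficulty < 0.3:
        return "instant"    # simple query: answer directly, no CoT or tools
    if query_difficulty < 0.7:
        return "reasoning"  # needs internal chain-of-thought
    return "agentic"        # needs tool calls / environment interaction

# Under this reward, a correct instant answer at low cost scores higher
# than a correct agentic answer that spent far more on tool calls.
r_instant = cost_regularized_reward(True, cost_usd=0.0005)  # 0.995
r_agentic = cost_regularized_reward(True, cost_usd=0.0150)  # 0.850
assert r_instant > r_agentic
```

The point of the penalty term is that, among trajectories that reach a correct answer, the policy is pushed toward the cheapest mode, which is what makes the instant mode attractive for easy queries.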
Why it matters?
A^2FM is significant because it achieves state-of-the-art results on several benchmarks, performing as well as or better than leading models on both reasoning and tool-using tasks. More importantly, it does so far more efficiently, reducing the cost of obtaining a correct answer by 45.2% compared to reasoning-focused models and 33.5% compared to agentic models. This means similar performance at a lower price, making these powerful AI models more accessible and practical.
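The savings figures can be sanity-checked with back-of-envelope arithmetic. "Cost of pass" here means the average dollar cost per correct answer; the per-mode baseline costs below are derived from the stated savings percentages and the reported $0.00487 adaptive cost, not quoted directly from the paper.

```python
# Back-of-envelope check of the reported cost-of-pass numbers.
adaptive = 0.00487  # $ per correct answer for adaptive execution (reported)

def baseline_from_savings(adaptive_cost: float, savings: float) -> float:
    """If adaptive cuts cost by `savings`, the baseline was cost/(1-savings)."""
    return adaptive_cost / (1.0 - savings)

reasoning = baseline_from_savings(adaptive, 0.452)  # ~ $0.0089 per correct answer
agentic = baseline_from_savings(adaptive, 0.335)    # ~ $0.0073 per correct answer

# Recover the stated savings from the implied baselines.
assert round(1 - adaptive / reasoning, 3) == 0.452
assert round(1 - adaptive / agentic, 3) == 0.335
```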
Abstract
Large language models split into two families: reasoning-centric LLMs, which strengthen internal chain-of-thought reasoning but cannot invoke external tools, and agentic LLMs, which learn to interact with environments and leverage tools but often lag in deep reasoning. This divide arises from fundamentally different training objectives, leading to mismatched strengths and inefficiency on simple queries, where both families tend to overthink or over-call tools. In this work, we present Adaptive Agent Foundation Model (A^2FM), a unified framework that follows a route-then-align principle: the model first learns task-aware routing and then aligns mode-specific trajectories under a shared backbone. To address the inefficiency gap, we introduce a third mode, instant, that handles simple queries directly, preventing unnecessary reasoning or tool calls while complementing the agentic and reasoning modes. To jointly enhance accuracy and efficiency, we propose Adaptive Policy Optimization (APO), which enforces adaptive sampling across modes and applies a cost-regularized reward. On the 32B scale, A^2FM achieves 13.4% on BrowseComp, 70.4% on AIME25, and 16.7% on HLE, setting new SOTA among comparable models and performing competitively with frontier LLMs across agentic, reasoning, and general benchmarks. Notably, the adaptive execution achieves a cost of pass of only $0.00487 per correct answer, cutting cost by 45.2% relative to reasoning and 33.5% relative to agentic, thus delivering substantially higher cost efficiency while maintaining comparable accuracy.