
AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning

Zhanke Zhou, Chentao Cao, Xiao Feng, Xuan Li, Zongze Li, Xiangyu Lu, Jiangchao Yao, Weikai Huang, Linrui Xu, Tian Cheng, Guanyu Jiang, Yiming Zheng, Brando Miranda, Tongliang Liu, Sanmi Koyejo, Masashi Sugiyama, Bo Han

2025-10-09

Summary

This paper introduces AlphaApollo, a new system designed to make large language models (a class of foundation models) better at reasoning and solving complex problems.

What's the problem?

Large language models are powerful, but they struggle with two main issues when it comes to reasoning. First, their internal ability to think through problems is limited. Second, even when they try to improve their answers through multiple attempts, those attempts aren't always reliable or helpful. Essentially, they can get stuck and don't always know how to correct themselves effectively.

What's the solution?

AlphaApollo tackles these problems by combining the strengths of different models and giving them access to professional tools. It uses a Python environment, with numerical and symbolic libraries, for exact calculations, and a retrieval tool to pull in task-relevant external information. The system keeps a shared record of candidate solutions, runs executable checks against them, and uses the resulting feedback to refine them over multiple rounds. This allows the models to 'think' more deliberately and verify their work, leading to more accurate results.
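The paper does not give implementation details for this loop, but the idea of a shared state map driving propose–check–refine rounds can be sketched roughly as follows. All names here (`propose`, `check`, `solve`) and the toy task are hypothetical illustrations, not AlphaApollo's actual API:

```python
# Minimal sketch of a multi-round refinement loop over a shared state map
# that records candidates, executable checks, and feedback.

def propose(task, feedback):
    """Stand-in for a model call: propose a candidate answer.

    Toy heuristic: start from the last round's hint, else from 0.
    """
    return feedback[-1]["hint"] if feedback else 0

def check(task, candidate):
    """Executable check: verify the candidate exactly (e.g., via Python)."""
    residual = task["f"](candidate)
    return {"ok": residual == 0, "hint": candidate + (1 if residual < 0 else -1)}

def solve(task, max_rounds=10):
    # Shared state map: every round's candidate and feedback is recorded
    # so later rounds (or other models) can build on earlier attempts.
    state = {"candidates": [], "feedback": []}
    for _ in range(max_rounds):
        cand = propose(task, state["feedback"])
        result = check(task, cand)
        state["candidates"].append(cand)
        state["feedback"].append(result)
        if result["ok"]:
            return cand, state
    return None, state

# Toy task: find the integer x with x - 7 = 0.
answer, state = solve({"f": lambda x: x - 7})
print(answer)  # converges to 7 after several feedback-driven rounds
```

In the real system the proposer would be a foundation model and the check would run actual Python code or consult retrieved information, but the control flow, candidates verified by execution and refined from recorded feedback, is the same shape.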

Why it matters?

This research is important because it significantly boosts the reasoning abilities of existing language models. The results show substantial improvements in performance on challenging reasoning tasks, meaning these models can potentially solve more complex problems in areas like science, math, and everyday decision-making. By showing that giving models access to tools and a structured way to improve their reasoning works well, AlphaApollo points the way towards building even more capable AI systems.

Abstract

We present AlphaApollo, a self-evolving agentic reasoning system that aims to address two bottlenecks in foundation model (FM) reasoning: limited model-intrinsic capacity and unreliable test-time iteration. AlphaApollo orchestrates multiple models with professional tools to enable deliberate, verifiable reasoning. It couples (i) a computation tool (Python with numerical and symbolic libraries) and (ii) a retrieval tool (task-relevant external information) to execute exact calculations and ground decisions. The system further supports multi-round, multi-model solution evolution via a shared state map that records candidates, executable checks, and feedback for iterative refinement. In evaluations on AIME 2024/2025 across multiple models, AlphaApollo delivers consistent gains: +5.15% Average@32 and +23.34% Pass@32 for Qwen2.5-14B-Instruct, and +8.91% Average@32 with +26.67% Pass@32 for Llama-3.3-70B-Instruct. Tool-use analysis shows that more than 80% of tool calls are successfully executed, with consistent outperformance of non-tool baselines, thereby lifting the capability ceiling of FMs. More empirical results and implementation details will be updated at https://github.com/tmlr-group/AlphaApollo.