OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning
Pan Lu, Bowen Chen, Sheng Liu, Rahul Thapa, Joseph Boen, James Zou
2025-02-19

Summary
This paper introduces OctoTools, an open-source agentic framework that helps large language models (LLMs) solve complex reasoning problems by combining external tools with planning.
What's the problem?
Current AI systems struggle with complex tasks that require understanding images, retrieving specific knowledge, doing math, and thinking through multiple steps. Existing solutions that give AI models extra tools are limited to specific areas, don't have many tool options, or need extra training data to work well.
What's the solution?
The researchers created OctoTools, which works like a smart assistant for AI models. It uses standardized 'tool cards' to describe what each tool can do, a planner to work out how to use those tools, and an executor to actually run them. OctoTools needs no extra training and can be applied to many different types of problems. Across 16 different tasks, it improved average accuracy by 9.3% over GPT-4o, and it beat other agent frameworks (AutoGen, GPT-Functions, and LangChain) by up to 10.6% when given the same set of tools.
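To make the three parts concrete, here is a minimal sketch of how a tool card, a planner, and an executor could fit together. It is not OctoTools' actual API: every name in it (`ToolCard`, `keywords`, `plan`, `execute`, `solve`, and the two toy tools) is hypothetical, and the planner is a simple keyword match standing in for the LLM-driven planning the paper describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCard:
    """Hypothetical tool card: metadata describing a tool, plus the tool itself."""
    name: str
    description: str
    keywords: List[str]  # hypothetical field, used only by the toy planner below
    run: Callable[[str], str]


def calculator(expression: str) -> str:
    # Toy tool: add whitespace-separated integers, e.g. "2 3 4" -> "9".
    return str(sum(int(token) for token in expression.split()))


def word_counter(text: str) -> str:
    # Toy tool: count whitespace-separated words.
    return str(len(text.split()))


# The toolbox maps tool names to their cards; adding a tool means adding a card.
TOOLBOX: Dict[str, ToolCard] = {
    "calculator": ToolCard(
        "calculator", "Adds a list of integers.", ["sum", "add"], calculator
    ),
    "word_counter": ToolCard(
        "word_counter", "Counts words in a text.", ["count", "words"], word_counter
    ),
}


def plan(query: str, toolbox: Dict[str, ToolCard]) -> Optional[str]:
    """Stand-in planner: return the first tool whose keyword appears in the query.

    An assumption for illustration only; in the paper the planning is done by an LLM.
    """
    lowered = query.lower()
    for card in toolbox.values():
        if any(keyword in lowered for keyword in card.keywords):
            return card.name
    return None


def execute(tool_name: str, tool_input: str, toolbox: Dict[str, ToolCard]) -> str:
    """Executor: look up the chosen tool card and run its tool on the input."""
    return toolbox[tool_name].run(tool_input)


def solve(query: str, tool_input: str) -> str:
    """Plan, then execute; report when no tool card matches the query."""
    tool_name = plan(query, TOOLBOX)
    if tool_name is None:
        return "no suitable tool"
    return execute(tool_name, tool_input, TOOLBOX)
```

In this sketch, `solve("Please sum these numbers", "2 3 4")` routes to the toy calculator. The point of the design is that the planner only reads the cards, so a new tool can be added by registering another card without touching the planner or the executor.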
Why does it matter?
This matters because it could make AI systems much more capable and flexible in solving real-world problems. By giving AI the ability to use tools and plan better, OctoTools could help in areas like medicine, science, and education where complex reasoning is needed. It's also important because it works without needing extra training, which makes it easier and cheaper to use in different situations.
Abstract
Solving complex reasoning tasks may involve visual understanding, domain knowledge retrieval, numerical calculation, and multi-step reasoning. Existing methods augment large language models (LLMs) with external tools but are restricted to specialized domains, limited tool types, or require additional training data. In this paper, we introduce OctoTools, a training-free, user-friendly, and easily extensible open-source agentic framework designed to tackle complex reasoning across diverse domains. OctoTools introduces standardized tool cards to encapsulate tool functionality, a planner for both high-level and low-level planning, and an executor to carry out tool usage. We validate OctoTools' generality across 16 diverse tasks (including MathVista, MMLU-Pro, MedQA, and GAIA-Text), achieving substantial average accuracy gains of 9.3% over GPT-4o. Furthermore, OctoTools outperforms AutoGen, GPT-Functions and LangChain by up to 10.6% when given the same set of tools. Through comprehensive analysis and ablations, OctoTools demonstrates advantages in task planning, effective tool usage, and multi-step problem solving.