
Octo-planner: On-device Language Model for Planner-Action Agents

Wei Chen, Zhiyuan Li, Zhen Guo, Yikang Shen

2024-06-27


Summary

This paper introduces the Octo-planner, an AI agent framework that plans and executes tasks directly on a device, without relying on cloud servers or constant internet access. It separates the planning process from action execution to improve efficiency and performance.

What's the problem?

As AI systems become more important in everyday life, they need to make decisions and solve problems effectively. However, many existing AI models struggle to plan efficiently, especially on devices with limited resources such as smartphones or tablets. Traditional models often demand a lot of compute and memory, which can slow them down or make them impractical to run locally.

What's the solution?

The authors developed the Octo-planner, which consists of two main parts: a planner agent and an action agent. The planner agent breaks a user request down into smaller, manageable steps, and the action agent then carries out each step. Instead of relying on in-context learning (packing examples and instructions into the prompt at run time), the planner is fine-tuned for the planning task, which reduces the memory and processing power needed and speeds up responses. They also introduce a multi-LoRA training technique that merges adapters trained on different sets of functions, so the system can handle complex, multi-domain tasks without losing performance.
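To make the division of labor concrete, here is a minimal sketch of the planner-action loop in Python. It is an illustration only: it assumes generic text-in/text-out interfaces for the two models, and the function names and prompt wording are hypothetical, not the paper's actual API.

```python
from typing import Callable, List

def plan(query: str, planner: Callable[[str], str]) -> List[str]:
    """Planner agent: decompose a user request into a sequence of sub-steps."""
    # In the paper this role is played by a fine-tuned Phi-3 Mini; here `planner`
    # is any callable that takes a prompt and returns generated text.
    response = planner(f"Decompose the task into numbered steps:\n{query}")
    return [line.strip() for line in response.splitlines() if line.strip()]

def run(query: str, planner: Callable[[str], str], actor: Callable[[str], str]) -> List[str]:
    """Full pipeline: the action agent (an Octopus-style function caller) executes each sub-step."""
    return [actor(step) for step in plan(query, planner)]

if __name__ == "__main__":
    # Trivial stand-ins for the two on-device models, just to show the control flow.
    fake_planner = lambda p: "1. Open the camera app\n2. Take a photo\n3. Email the photo to Alice"
    fake_actor = lambda step: f"<function_call> {step}"
    print(run("Take a photo and email it to Alice", fake_planner, fake_actor))
```

The point of the split is that each model stays small and specialized: the planner only has to produce sub-steps, and the action agent only has to map one sub-step to a device function call.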

Why it matters?

This research is important because it allows AI systems to operate effectively on devices with limited resources, making them more accessible for everyday use. By improving how AI plans and executes tasks, the Octo-planner can enhance applications in various fields, such as personal assistants, smart home devices, and mobile apps, ultimately leading to better user experiences.

Abstract

AI agents have become increasingly significant in various domains, enabling autonomous decision-making and problem-solving. To function effectively, these agents require a planning process that determines the best course of action and then executes the planned actions. In this paper, we present an efficient on-device Planner-Action framework that separates planning and action execution into two distinct components: a planner agent based on Phi-3 Mini, a 3.8 billion parameter LLM optimized for edge devices, and an action agent using the Octopus model for function execution. The planner agent first responds to user queries by decomposing tasks into a sequence of sub-steps, which are then executed by the action agent. To optimize performance on resource-constrained devices, we employ model fine-tuning instead of in-context learning, reducing computational costs and energy consumption while improving response times. Our approach involves using GPT-4 to generate diverse planning queries and responses based on available functions, with subsequent validations to ensure data quality. We fine-tune the Phi-3 Mini model on this curated dataset, achieving a 97% success rate in our in-domain test environment. To address multi-domain planning challenges, we developed a multi-LoRA training method that merges weights from LoRAs trained on distinct function subsets. This approach enables flexible handling of complex, multi-domain queries while maintaining computational efficiency on resource-constrained devices. To support further research, we have open-sourced our model weights at https://huggingface.co/NexaAIDev/octopus-planning. For the demo, please refer to https://www.nexa4ai.com/octo-planner.
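The multi-LoRA merging step mentioned in the abstract can be sketched with Hugging Face PEFT. This is a hedged illustration, not the authors' released training code: the adapter paths ("lora-domain-a", "lora-domain-b"), adapter names, and merge weights are hypothetical placeholders, and the exact combination strategy used in the paper may differ.

```python
# Sketch: merging LoRA adapters trained on distinct function subsets.
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Base planner model: Phi-3 Mini, as named in the abstract.
base = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

# Load one LoRA adapter per function subset (paths and names are placeholders).
model = PeftModel.from_pretrained(base, "lora-domain-a", adapter_name="domain_a")
model.load_adapter("lora-domain-b", adapter_name="domain_b")

# Merge the adapter weights into a single adapter for multi-domain planning.
model.add_weighted_adapter(
    adapters=["domain_a", "domain_b"],
    weights=[0.5, 0.5],
    adapter_name="merged",
    combination_type="linear",
)
model.set_adapter("merged")
```

Merging adapters rather than serving one adapter per domain keeps the on-device memory footprint close to that of a single LoRA while still covering multiple function subsets.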