ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization
Yinjie Wang, Ling Yang, Guohao Li, Mengdi Wang, Bryon Aragam
2025-02-07
Summary
This paper introduces ScoreFlow, a new way to make AI agents work together more efficiently. It uses a method called Score-DPO, a score-aware form of preference optimization, to help these AI agents learn from feedback and improve their teamwork.
What's the problem?
Current methods for making AI agents work together are not very flexible and don't adapt well to new situations. They also scale poorly when applied to bigger, more complex tasks. This makes it hard to build AI systems that can solve really tough problems without a lot of human effort.
What's the solution?
The researchers created ScoreFlow, which optimizes agent workflows with gradient-based methods in a continuous space. This lets the system improve smoothly, rather than through the sudden, all-or-nothing jumps of discrete search. They also developed Score-DPO, a variant of direct preference optimization that uses quantitative feedback, not just which answer was preferred but how much better it scored. They tested ScoreFlow on six benchmarks covering question answering, coding, and mathematical reasoning.
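To make the idea concrete, here is a minimal sketch of what a Score-DPO-style objective could look like: a standard DPO loss on a preferred/rejected pair, reweighted by the gap between the evaluators' numerical scores. The function name, arguments, and the sigmoid weighting scheme are illustrative assumptions for this summary, not the authors' actual implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                   score_w, score_l, beta=0.1):
    """Hypothetical score-weighted DPO loss for one preference pair.

    logp_w / logp_l: policy log-probs of the preferred / rejected workflow.
    ref_logp_w / ref_logp_l: same quantities under the reference model.
    score_w / score_l: quantitative evaluator scores for each workflow.
    """
    # Standard DPO margin: implicit reward gap between winner and loser.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Assumed score weighting: pairs with a larger score gap get more weight,
    # so strong quantitative evidence drives larger gradient updates.
    weight = sigmoid(score_w - score_l)
    return -weight * math.log(sigmoid(margin))

# Example: a pair whose winner scored much higher contributes a larger loss
# (and hence a larger update) than a near-tie with the same log-prob margin.
big_gap = score_dpo_loss(-1.0, -2.0, -1.2, -1.8, score_w=0.9, score_l=0.4)
small_gap = score_dpo_loss(-1.0, -2.0, -1.2, -1.8, score_w=0.9, score_l=0.8)
```

The design intuition is that plain DPO discards how decisively one workflow beat another, while a score-based weight preserves that signal during optimization.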
Why it matters?
This matters because ScoreFlow could make AI systems much better at working together to solve complex problems. It achieved an 8.2% average improvement over existing baselines, a substantial gain on these benchmarks. Even more notably, it helped smaller AI models outperform larger ones while using less computing power at inference time. This could lead to smarter, more efficient AI assistants that handle a wider range of tasks, with applications in areas like customer service, scientific research, and software development.
Abstract
Recent research has leveraged large language model multi-agent systems for complex problem-solving while trying to reduce the manual effort required to build them, driving the development of automated agent workflow optimization methods. However, existing methods remain inflexible due to representational limitations, a lack of adaptability, and poor scalability when relying on discrete optimization techniques. We address these challenges with ScoreFlow, a simple yet high-performance framework that leverages efficient gradient-based optimization in a continuous space. ScoreFlow incorporates Score-DPO, a novel variant of the direct preference optimization method that accounts for quantitative feedback. Across six benchmarks spanning question answering, coding, and mathematical reasoning, ScoreFlow achieves an 8.2% improvement over existing baselines. Moreover, it empowers smaller models to outperform larger ones with lower inference costs. Project: https://github.com/Gen-Verse/ScoreFlow