GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI
Xiangyuan Xue, Zeyu Lu, Di Huang, Wanli Ouyang, Lei Bai
2024-09-04

Summary
This paper introduces GenAgent, a framework that automatically generates workflows for collaborative AI systems, which solve complex tasks by combining different AI models and data sources.
What's the problem?
Much previous AI research has focused on building single, powerful models that perform specific tasks well. However, this approach can limit flexibility and scalability when dealing with diverse and complex problems that require collaboration between different models and data.
What's the solution?
GenAgent is a framework that automatically generates workflows connecting various AI models and data sources. It uses a large language model (LLM) to break complex tasks into smaller parts, represents each workflow as code, and builds it step by step with collaborating agents. The researchers implemented GenAgent on the ComfyUI platform and proposed a new benchmark, OpenComfy, to evaluate it. The results showed that GenAgent outperforms baseline approaches in both run-level and task-level evaluations, generating complex workflows more effectively and more stably.
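The step-by-step idea can be illustrated with a minimal Python sketch. This is not the paper's implementation: `stub_planner` is a hypothetical stand-in for the LLM call, the `", then "` splitting rule is invented for the demo, and the step dictionary layout is an assumption chosen only to show each new step being wired to the one before it.

```python
from typing import Callable, Dict, List


def stub_planner(task: str) -> List[str]:
    """Hypothetical stand-in for an LLM: split a task into ordered sub-steps."""
    return [part.strip() for part in task.split(", then ") if part.strip()]


def build_workflow(task: str, planner: Callable[[str], List[str]]) -> List[Dict]:
    """Build a workflow one step at a time, linking each step to its predecessor."""
    workflow: List[Dict] = []
    for index, description in enumerate(planner(task)):
        step = {
            "id": index,
            "description": description,
            # Assumption: a simple linear chain, so each step depends
            # only on the step added just before it.
            "depends_on": [index - 1] if index > 0 else [],
        }
        workflow.append(step)
    return workflow


if __name__ == "__main__":
    task = "generate an image, then upscale it, then save it"
    for step in build_workflow(task, stub_planner):
        print(step)
```

A real system would replace `stub_planner` with a model call and would likely validate each step before adding the next one. The linear chain here is the simplest possible dependency structure.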
Why does it matter?
This research points toward more flexible and scalable AI systems that can handle a wider range of tasks. By letting different AI models work together, approaches like GenAgent could help with real-world problems in fields such as healthcare, finance, and robotics.
Abstract
Much previous AI research has focused on developing monolithic models to maximize their intelligence and capability, with the primary goal of enhancing performance on specific tasks. In contrast, this paper explores an alternative approach: collaborative AI systems that use workflows to integrate models, data sources, and pipelines to solve complex and diverse tasks. We introduce GenAgent, an LLM-based framework that automatically generates complex workflows, offering greater flexibility and scalability compared to monolithic models. The core innovation of GenAgent lies in representing workflows with code, alongside constructing workflows with collaborative agents in a step-by-step manner. We implement GenAgent on the ComfyUI platform and propose a new benchmark, OpenComfy. The results demonstrate that GenAgent outperforms baseline approaches in both run-level and task-level evaluations, showing its capability to generate complex workflows with superior effectiveness and stability.
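To make "representing workflows with code" concrete, here is a small Python sketch that builds a text-to-image graph node by node and serializes it in the style of ComfyUI's API-format JSON (node id mapped to `class_type` and `inputs`, with links written as `[source_id, output_index]`). The `Workflow` class and `link` helper are hypothetical names for illustration, the checkpoint file name is a placeholder, and the node and input names reflect ComfyUI's built-in nodes as commonly documented. The code representation used in the paper itself may differ.

```python
import json
from typing import Any, Dict, List


class Workflow:
    """Hypothetical builder that records nodes and emits a ComfyUI-style graph."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}

    def add(self, class_type: str, **inputs: Any) -> str:
        """Add one node and return its id so later nodes can link to it."""
        node_id = str(len(self.nodes) + 1)
        self.nodes[node_id] = {"class_type": class_type, "inputs": inputs}
        return node_id

    def to_json(self) -> str:
        """Serialize the graph as JSON text."""
        return json.dumps(self.nodes, indent=2)


def link(node_id: str, output_index: int = 0) -> List[Any]:
    """A link is written as [source node id, output slot index]."""
    return [node_id, output_index]


wf = Workflow()
# "model.safetensors" is a placeholder checkpoint name, not a real file.
ckpt = wf.add("CheckpointLoaderSimple", ckpt_name="model.safetensors")
pos = wf.add("CLIPTextEncode", text="a lighthouse at dusk", clip=link(ckpt, 1))
neg = wf.add("CLIPTextEncode", text="blurry", clip=link(ckpt, 1))
latent = wf.add("EmptyLatentImage", width=512, height=512, batch_size=1)
sampled = wf.add(
    "KSampler",
    model=link(ckpt, 0),
    positive=link(pos),
    negative=link(neg),
    latent_image=link(latent),
    seed=0,
    steps=20,
    cfg=7.0,
    sampler_name="euler",
    scheduler="normal",
    denoise=1.0,
)
image = wf.add("VAEDecode", samples=link(sampled), vae=link(ckpt, 2))
wf.add("SaveImage", images=link(image), filename_prefix="genagent_demo")

print(wf.to_json())
```

Writing the graph as ordinary function calls means each line adds exactly one node, so an agent can extend or repair the workflow incrementally instead of emitting a whole JSON document at once.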