Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
Fan Zhang, Shulin Tian, Ziqi Huang, Yu Qiao, Ziwei Liu
2024-12-17

Summary
This paper presents the Evaluation Agent, a framework for evaluating visual generative models efficiently and flexibly. It aims to deliver evaluations that are both fast and tailored to user needs by mimicking how humans form impressions of a model's performance from just a few samples.
What's the problem?
Evaluating visual generative models, which create images and videos, usually requires sampling hundreds or thousands of outputs. This process is slow and expensive, especially for diffusion-based models whose sampling is inherently slow. Existing evaluation methods are also rigid: they follow fixed pipelines that overlook specific user needs and return numerical scores without clear explanations.
What's the solution?
The Evaluation Agent framework uses human-like strategies to conduct dynamic, multi-round evaluations: in each round it generates only a few samples, forms an impression of the model's capabilities, and decides what to probe next. This improves efficiency, allows evaluations to be tailored to user requirements, provides explanations that go beyond a single numerical score, and scales across different models and evaluation tools. Experiments show that the framework reduces evaluation time to 10% of traditional methods while delivering comparable results.
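To make the multi-round idea concrete, here is a minimal, hypothetical Python sketch of such an evaluation loop. All names (GenerativeModel, plan_next_prompts, score_sample, evaluate_model) are illustrative placeholders, not the open-sourced Evaluation Agent API; in the actual framework the planning and analysis steps are driven by an LLM-based agent and existing evaluation tools.

```python
# Minimal sketch of a multi-round, few-samples-per-round evaluation loop.
# Every name below is a hypothetical stand-in, not the Evaluation Agent API.

import random
from typing import Dict, List


class GenerativeModel:
    """Stand-in for an image/video generative model under evaluation."""

    def generate(self, prompt: str) -> str:
        return f"<sample for: {prompt}>"


def plan_next_prompts(assessment: Dict, n: int) -> List[str]:
    """Placeholder planner: in the paper this role is played by an agent that
    decides which capability to probe next, given prior observations."""
    topic = assessment["open_questions"][0]
    return [f"{topic} (probe {i})" for i in range(n)]


def score_sample(sample: str, prompt: str) -> float:
    """Placeholder scorer: a real system would call existing evaluation tools."""
    return random.random()


def evaluate_model(model: GenerativeModel, user_query: str,
                   max_rounds: int = 5, samples_per_round: int = 4) -> Dict:
    """Probe the model with only a few samples per round, refining the
    assessment each round instead of sampling thousands of outputs upfront."""
    assessment = {"open_questions": [user_query], "findings": []}

    for _ in range(max_rounds):
        prompts = plan_next_prompts(assessment, n=samples_per_round)
        samples = [model.generate(p) for p in prompts]
        scores = [score_sample(s, p) for s, p in zip(samples, prompts)]

        # Fold this round's observations into the running assessment.
        assessment["findings"].append({"prompts": prompts, "scores": scores})

        # Stop early once the agent judges the user's question answered
        # (here: a trivial confidence proxy based on score spread).
        if max(scores) - min(scores) < 0.2:
            assessment["open_questions"].pop(0)
        if not assessment["open_questions"]:
            break

    return assessment


if __name__ == "__main__":
    report = evaluate_model(GenerativeModel(), "temporal consistency in videos")
    print(f"Rounds used: {len(report['findings'])}")
```

In this sketch the early-stopping check is a toy heuristic; the key point it illustrates is that the number of generated samples grows with the agent's remaining uncertainty rather than being fixed in advance.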
Why it matters?
The Evaluation Agent matters because it makes evaluating complex generative models faster and more user-friendly. Because the framework is open-sourced, it encourages further research and development on visual generative models, ultimately supporting better technology in areas such as art creation and video generation.
Abstract
Recent advancements in visual generative models have enabled high-quality image and video generation, opening diverse applications. However, evaluating these models often demands sampling hundreds or thousands of images or videos, making the process computationally expensive, especially for diffusion-based models with inherently slow sampling. Moreover, existing evaluation methods rely on rigid pipelines that overlook specific user needs and provide numerical results without clear explanations. In contrast, humans can quickly form impressions of a model's capabilities by observing only a few samples. To mimic this, we propose the Evaluation Agent framework, which employs human-like strategies for efficient, dynamic, multi-round evaluations using only a few samples per round, while offering detailed, user-tailored analyses. It offers four key advantages: 1) efficiency, 2) promptable evaluation tailored to diverse user needs, 3) explainability beyond single numerical scores, and 4) scalability across various models and tools. Experiments show that Evaluation Agent reduces evaluation time to 10% of traditional methods while delivering comparable results. The Evaluation Agent framework is fully open-sourced to advance research in visual generative models and their efficient evaluation.