OAgents: An Empirical Study of Building Effective Agents
He Zhu, Tianrui Qin, King Zhu, Heyuan Huang, Yeyi Guan, Jinxiang Xia, Yi Yao, Hanhao Li, Ningning Wang, Pai Liu, Tianhao Peng, Xin Gui, Xiaowan Li, Yuhui Liu, Yuchen Eleanor Jiang, Jun Wang, Changwang Zhang, Xiangru Tang, Ge Zhang, Jian Yang, Minghao Liu, Xitong Gao
2025-06-24
Summary
This paper introduces OAgents, a new framework developed to study and build more effective AI agents through a systematic, scientific approach.
What's the problem?
The problem is that current AI agent research lacks standardized methods and scientific rigor, making it difficult to fairly compare different systems and understand which design choices really matter for making effective agents.
What's the solution?
The researchers conducted a thorough empirical study on established benchmarks, developed a more reliable evaluation protocol to reduce random variation between runs, and carefully tested different design components. Based on these findings, they built OAgents, a modular, open-source agent framework that achieves state-of-the-art performance among open-source projects.
Why it matters?
This matters because OAgents helps researchers build better AI agents by identifying which design choices truly work, improving the reliability of reported results, and promoting future innovation in AI systems that can act intelligently in complex environments.
Abstract
Recently, Agentic AI has become an increasingly popular research field. However, we argue that current agent research practices lack standardization and scientific rigor, making it hard to conduct fair comparisons among methods. As a result, it is still unclear how different design choices in agent frameworks affect effectiveness, and measuring their progress remains challenging. In this work, we conduct a systematic empirical study on the GAIA and BrowseComp benchmarks to examine the impact of popular design choices in key agent components in a fair and rigorous manner. We find that the lack of a standard evaluation protocol makes previous works, even open-sourced ones, non-reproducible, with significant variance between random runs. Therefore, we introduce a more robust evaluation protocol to stabilize comparisons. Our study reveals which components and designs are crucial for effective agents and which are redundant, despite seeming logical. Based on our findings, we build and open-source OAgents, a new foundation agent framework that achieves state-of-the-art performance among open-source projects. OAgents offers a modular design for various agent components, promoting future research in Agentic AI.
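The abstract notes that agent benchmark results show significant variance between random runs, motivating a more robust evaluation protocol. A minimal sketch of one such protocol is repeating the benchmark several times and reporting aggregate statistics rather than a single run's score. The function and parameter names below are hypothetical illustrations, not the paper's actual API, and it assumes a simple boolean per-task success signal.

```python
import statistics


def evaluate_agent(run_agent, tasks, n_runs=3):
    """Score an agent over a benchmark across repeated runs.

    run_agent: callable(task) -> bool, True if the task was solved.
    Returns mean accuracy, standard deviation, and per-run scores,
    so run-to-run variance is reported instead of hidden.
    """
    per_run_scores = []
    for _ in range(n_runs):
        solved = sum(1 for task in tasks if run_agent(task))
        per_run_scores.append(solved / len(tasks))
    return {
        "mean": statistics.mean(per_run_scores),
        # stdev needs at least two samples; report 0.0 for a single run
        "std": statistics.stdev(per_run_scores) if n_runs > 1 else 0.0,
        "runs": per_run_scores,
    }
```

Comparing two agent designs on mean ± std over several runs, rather than on one lucky run each, is the kind of stabilized comparison the abstract argues for.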