EXP-Bench: Can AI Conduct AI Research Experiments?
Patrick Tser Jern Kon, Jiachen Liu, Xinyi Zhu, Qiuyi Ding, Jingjia Peng, Jiarong Xing, Yibo Huang, Yiming Qiu, Jayanth Srinivasa, Myungjin Lee, Mosharaf Chowdhury, Matei Zaharia, Ang Chen
2025-06-02

Summary
This paper introduces EXP-Bench, a benchmark that tests whether AI agents can carry out complete research experiments on their own, using tasks drawn from top AI research papers.
What's the problem?
While AI is good at specific tasks such as answering questions or analyzing data, it is unclear whether AI agents can handle the entire process of running a scientific experiment, which involves planning, testing, and drawing conclusions.
What's the solution?
The researchers created EXP-Bench, which gives AI agents a set of real research tasks to complete from start to finish, as a human scientist would. This let them see where the agents do well and where they struggle, showing the current limits of AI in scientific research.
Why does it matter?
This is important because if AI can eventually learn to run experiments on its own, it could speed up scientific discoveries and help researchers in all kinds of fields. EXP-Bench helps us understand what needs to be improved for AI to reach that level.
Abstract
EXP-Bench evaluates AI agents' end-to-end research experiment capabilities through curated tasks from top AI papers, highlighting current limitations.