
RExBench: Can coding agents autonomously implement AI research extensions?

Nicholas Edwards, Yukyung Lee, Yujun Mao, Yulu Qin, Sebastian Schuster, Najoung Kim

2025-07-01


Summary

This paper introduces RExBench, a benchmark designed to test whether AI coding agents can autonomously extend existing AI research projects, which requires understanding both the research paper and its codebase.

What's the problem?

Current AI agents struggle to understand and extend research on their own; they typically need substantial human assistance, which limits their ability to do real scientific work autonomously.

What's the solution?

The researchers built RExBench, a suite of 12 challenging tasks in which an agent must implement a new research experiment given written instructions and an existing codebase. They evaluated several AI agents and found that all of them fail on the majority of tasks without human guidance, although success rates improve when human-written hints are provided. A minimal sketch of this kind of evaluation setup appears below.
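To make the setup concrete, here is a minimal sketch of how a benchmark harness like this might look. The Task fields, the agent.implement method, and the judge callable are hypothetical names for illustration, not the paper's actual interface.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Task:
    """One RExBench-style task: an existing codebase plus extension instructions.

    Field names are illustrative; the paper's actual task format may differ.
    """
    name: str
    repo: Path           # path to the original research codebase
    instructions: str    # natural-language description of the extension to implement

def evaluate(agent, tasks, judge):
    """Run the agent on each task and score its output with an outcome check.

    `agent.implement` and `judge` are assumed interfaces: the agent returns a
    code patch, and the judge decides whether the implemented experiment
    produces the expected result.
    """
    results = {}
    for task in tasks:
        patch = agent.implement(task.repo, task.instructions)
        results[task.name] = judge(task, patch)
    success_rate = sum(results.values()) / len(results)
    return results, success_rate
```

The key design point the paper highlights is outcome-based evaluation: an agent succeeds only if its implementation actually works, not merely if it produces plausible-looking code.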

Why it matters?

This matters because it measures how far AI agents currently are from contributing to real research, and it highlights the need for better tools and methods before AI can advance scientific progress independently.

Abstract

RExBench evaluates the capability of LLM agents to autonomously implement research extensions, finding that current agents require significant human guidance to succeed.