MARS: Modular Agent with Reflective Search for Automated AI Research
Jiefeng Chen, Bhavana Dalvi Mishra, Jaehyun Nam, Rui Meng, Tomas Pfister, Jinsung Yoon
2026-02-04
Summary
This paper introduces MARS, a system designed to automate parts of the AI research process, with a focus on improving large language models. It aims to make AI research more efficient and capable of discovering new insights on its own.
What's the problem?
Automating AI research is hard because training and testing AI models takes a lot of computing power, and it's often unclear *why* a particular change makes a model better or worse. Existing AI agents that try to do this often create overly complex, expensive plans, and they struggle to figure out which changes actually contributed to improvements. They also don't learn effectively from their past attempts.
What's the solution?
MARS tackles this by using a three-part approach. First, it carefully plans experiments, considering both how well they might work *and* how much they will cost to run. Second, it breaks down complex research tasks into smaller, manageable pieces, making it easier to organize and modify the code. Finally, it learns by comparing successful and unsuccessful attempts, identifying the key differences that led to better results and using those lessons in future experiments. This allows it to build on previous knowledge and generalize insights.
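The first pillar, budget-aware planning, can be pictured as a tree search whose selection rule penalizes expensive experiments. The sketch below is illustrative only: the function names, node fields, and the linear cost penalty are assumptions, not the paper's actual formulation of cost-constrained MCTS.

```python
import math

def budget_aware_uct(node, parent_visits, c_explore=1.4, cost_weight=0.5, budget=100.0):
    """Score a candidate experiment node (hypothetical UCT variant with a cost penalty).

    node: dict with 'value_sum', 'visits', 'est_cost' (assumed fields).
    Nodes whose estimated cost exceeds the remaining budget are pruned outright.
    """
    if node["est_cost"] > budget:
        return float("-inf")  # cannot afford this experiment
    if node["visits"] == 0:
        return float("inf")   # always try unvisited candidates first
    exploit = node["value_sum"] / node["visits"]
    explore = c_explore * math.sqrt(math.log(parent_visits) / node["visits"])
    penalty = cost_weight * node["est_cost"] / budget  # cheaper experiments score higher
    return exploit + explore - penalty

def select_child(children, parent_visits, budget):
    # Pick the affordable child with the highest cost-penalized score.
    return max(children, key=lambda n: budget_aware_uct(n, parent_visits, budget=budget))
```

The key design point is that cost enters the selection score itself, so the planner trades off expected performance against execution expense at every step rather than discovering overruns after the fact.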
Why it matters?
This work is important because it represents a significant step towards truly autonomous AI research. By automating the process of experimentation and discovery, MARS could accelerate the development of new and improved AI models. The fact that it's able to transfer knowledge between different research paths suggests it's not just blindly trying things, but actually understanding and learning from its experiences, which is a crucial step towards more intelligent AI systems.
Abstract
Automating AI research differs from general software engineering due to computationally expensive evaluation (e.g., model training) and opaque performance attribution. Current LLM-based agents struggle here, often generating monolithic scripts that ignore execution costs and causal factors. We introduce MARS (Modular Agent with Reflective Search), a framework optimized for autonomous AI research. MARS relies on three pillars: (1) Budget-Aware Planning via cost-constrained Monte Carlo Tree Search (MCTS) to explicitly balance performance with execution expense; (2) Modular Construction, employing a "Design-Decompose-Implement" pipeline to manage complex research repositories; and (3) Comparative Reflective Memory, which addresses credit assignment by analyzing solution differences to distill high-signal insights. MARS achieves state-of-the-art performance among open-source frameworks on MLE-Bench under comparable settings, maintaining competitiveness with the global leaderboard's top methods. Furthermore, the system exhibits qualitative "Aha!" moments, where 63% of all utilized lessons originate from cross-branch transfer, demonstrating that the agent effectively generalizes insights across search paths.
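The comparative reflective memory described above addresses credit assignment by diffing a better and a worse attempt and attributing the gain to what changed. A minimal sketch, assuming attempts are represented as config dicts (the class and method names are hypothetical, not MARS's API):

```python
def diff_configs(better, worse):
    """Return {key: (worse_value, better_value)} for settings that differ."""
    keys = set(better) | set(worse)
    return {k: (worse.get(k), better.get(k)) for k in keys if better.get(k) != worse.get(k)}

class ReflectiveMemory:
    def __init__(self):
        self.lessons = {}  # component -> list of (from_value, to_value, score_gain)

    def reflect(self, better, worse, score_gain):
        # Credit the observed gain to each setting that changed between the attempts.
        for key, (old, new) in diff_configs(better["config"], worse["config"]).items():
            self.lessons.setdefault(key, []).append((old, new, score_gain))

    def advice(self, component):
        # Surface past lessons about a component, largest gains first; because the
        # store is global, lessons learned on one search branch are visible to all.
        return sorted(self.lessons.get(component, []), key=lambda l: -l[2])
```

A shared lesson store of this kind is one way the cross-branch transfer highlighted in the abstract could arise: an insight distilled on one search path becomes retrievable from any other.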