Toward Autonomous Long-Horizon Engineering for ML Research

Guoxin Chen, Jie Chen, Lei Chen, Jiale Zhao, Fanzhe Meng, Wayne Xin Zhao, Ruihua Song, Cheng Chen, Ji-Rong Wen, Kai Jia

2026-04-15

Summary

This paper introduces AiScientist, a system designed to automate much of the process of doing machine learning research, specifically the long and complex tasks that take hours or even days to complete.

What's the problem?

Doing machine learning research isn't just about having a good idea; it's about actually *doing* the work. This involves understanding the problem, setting up the environment, writing code, running experiments, and fixing bugs. Current AI systems struggle with this 'long-horizon' process because they lose track of what they've done and need constant guidance, making it hard to sustain progress over extended periods.

What's the solution?

AiScientist tackles this by combining two key ideas. First, it uses a 'hierarchical orchestrator' which acts like a project manager, breaking down the research into stages and keeping track of the overall plan. Second, it uses a 'File-as-Bus' system, which means all the important information – analyses, code, experiment results – is saved as files that the system can constantly refer back to. This way, different parts of the AI don't have to rely on remembering everything from previous conversations, but can build on durable, saved work.
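The File-as-Bus idea can be illustrated with a minimal sketch. It is not the paper's implementation: the `Workspace` class, the three agent functions, and the file paths are all hypothetical names chosen for illustration. Each agent writes its work to a file and returns only a one-line summary, and the next agent re-reads the saved file instead of relying on what was said earlier.

```python
import json
import tempfile
from pathlib import Path


class Workspace:
    """Shared directory acting as the 'bus' between agents (hypothetical layout)."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, rel_path: str, content: str) -> Path:
        path = self.root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
        return path

    def read(self, rel_path: str) -> str:
        return (self.root / rel_path).read_text(encoding="utf-8")


def analysis_agent(ws: Workspace) -> str:
    # Thick state goes to disk; only a short summary is returned.
    ws.write("analysis/task.md", "Goal: reproduce the baseline model.\n")
    return "analysis written to analysis/task.md"


def implementation_agent(ws: Workspace) -> str:
    # Re-ground on the saved analysis instead of a conversational handoff.
    analysis = ws.read("analysis/task.md")
    ws.write("code/train.py", "# " + analysis + 'print("training stub")\n')
    return "code written to code/train.py"


def experiment_agent(ws: Workspace) -> str:
    ws.read("code/train.py")  # fails loudly if the code artifact is missing
    ws.write("results/run_001.json", json.dumps({"status": "ok"}))
    return "results written to results/run_001.json"


def orchestrate(ws: Workspace) -> list[str]:
    """Run the stages in order, keeping only their one-line summaries."""
    stages = [analysis_agent, implementation_agent, experiment_agent]
    return [stage(ws) for stage in stages]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for summary in orchestrate(Workspace(Path(tmp))):
            print(summary)
```

In this sketch the orchestrator never sees file contents, only summaries, which is one way to read the "project manager" role described above.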

Why does it matter?

The results show that AiScientist improves the PaperBench score by 10.54 points on average over the best matched baseline and achieves 81.82 Any Medal% on MLE-Bench Lite. This suggests that automating machine learning research isn't just about making smarter AI, but about building a better *system* for managing and coordinating the research process. It highlights that the challenge lies in coordinating specialized tasks and maintaining a consistent project state, rather than just improving the AI's reasoning abilities.

Abstract

Autonomous AI research has advanced rapidly, but long-horizon ML research engineering remains difficult: agents must sustain coherent progress across task comprehension, environment setup, implementation, experimentation, and debugging over hours or days. We introduce AiScientist, a system for autonomous long-horizon engineering for ML research built on a simple principle: strong long-horizon performance requires both structured orchestration and durable state continuity. To this end, AiScientist combines hierarchical orchestration with a permission-scoped File-as-Bus workspace: a top-level Orchestrator maintains stage-level control through concise summaries and a workspace map, while specialized agents repeatedly re-ground on durable artifacts such as analyses, plans, code, and experimental evidence rather than relying primarily on conversational handoffs, yielding thin control over thick state. Across two complementary benchmarks, AiScientist improves PaperBench score by 10.54 points on average over the best matched baseline and achieves 81.82 Any Medal% on MLE-Bench Lite. Ablation studies further show that the File-as-Bus protocol is a key driver of performance: removing it reduces PaperBench by 6.41 points and MLE-Bench Lite by 31.82 points. These results suggest that long-horizon ML research engineering is a systems problem of coordinating specialized work over durable project state, rather than a purely local reasoning problem.
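One possible reading of "permission-scoped" and "workspace map" is sketched below. This is an assumption-laden illustration, not the system described in the abstract: `ScopedWorkspace`, its `writable` prefixes, and `workspace_map` are hypothetical names. Every agent can read the whole workspace, each agent can write only under its own directories, and the map is simply a sorted listing of artifact paths that a controller could consult without loading file contents.

```python
import tempfile
from pathlib import Path


class ScopedWorkspace:
    """View of a shared directory with per-agent write permissions (hypothetical)."""

    def __init__(self, root: Path, writable: tuple[str, ...]) -> None:
        self.root = root.resolve()
        self.writable = writable  # directory prefixes this agent may write under

    def _resolve(self, rel_path: str) -> Path:
        path = (self.root / rel_path).resolve()
        # Reject paths that escape the workspace, e.g. via "..".
        if not path.is_relative_to(self.root):
            raise PermissionError(f"{rel_path} is outside the workspace")
        return path

    def read(self, rel_path: str) -> str:
        return self._resolve(rel_path).read_text(encoding="utf-8")

    def write(self, rel_path: str, content: str) -> None:
        path = self._resolve(rel_path)
        rel = path.relative_to(self.root).as_posix()
        allowed = any(rel.startswith(prefix + "/") for prefix in self.writable)
        if not allowed:
            raise PermissionError(f"write to {rel} not permitted")
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")


def workspace_map(root: Path) -> list[str]:
    """Return artifact paths only: a thin index over the thick state on disk."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        planner = ScopedWorkspace(Path(tmp), writable=("plans",))
        coder = ScopedWorkspace(Path(tmp), writable=("code",))
        planner.write("plans/plan.md", "1. implement the model\n")
        coder.write("code/model.py", "# follows plans/plan.md\n")
        try:
            coder.write("plans/plan.md", "overwritten")
        except PermissionError as err:
            print("blocked:", err)
        print(workspace_map(planner.root))
```

Scoping writes this way keeps one agent from silently overwriting another agent's artifacts, while shared read access still lets each agent re-ground on the full project state.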