Can We Predict Before Executing Machine Learning Agents?
Jingsheng Zheng, Jintian Zhang, Yujie Luo, Yuren Mao, Yunjun Gao, Lun Du, Huajun Chen, Ningyu Zhang
2026-01-12
Summary
This paper introduces a new way to speed up scientific discovery using artificial intelligence, specifically focusing on how AI agents can learn and test hypotheses more efficiently.
What's the problem?
Currently, AI agents designed to conduct experiments and learn from the results are slowed down by the need to physically *do* each experiment to see what happens. This 'Generate-Execute-Feedback' process is time-consuming and expensive because every hypothesis needs a real-world test. It's like having to build a whole prototype every time you want to test a small idea.
What's the solution?
The researchers tackled this problem by giving the AI agent a 'world model' – essentially, a way to predict the outcome of experiments *before* actually running them. They primed a large language model with a verified report of previous data analysis, and had it predict which of two candidate solutions would perform better once executed. This 'Predict-then-Verify' approach lets the agent spend its real-world experiments on the most promising ideas and skip the ones likely to fail. They created a dataset of over 18,000 pairwise comparisons to train and test this predictive ability, and built an agent called FOREAGENT to demonstrate the method.
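To make the idea concrete, here is a minimal sketch of a Predict-then-Verify loop. This is not the authors' implementation: the function names, the toy scoring, and the `report` structure are all assumptions for illustration. The point is simply that cheap predictions filter the candidate pool, and expensive execution is reserved for the survivors.

```python
# Illustrative sketch of a Predict-then-Verify loop (NOT the paper's code).
# All names and the toy 'quality' field are hypothetical placeholders.

def predict_preference(candidate, baseline, report):
    """Stand-in for the LLM world model: predict whether `candidate`
    would beat `baseline`, given a data-analysis report. Here we fake
    the prediction with a simple heuristic over a hidden quality score."""
    return candidate["quality"] + report.get("bias", 0.0) > baseline["quality"]

def execute(candidate):
    """Stand-in for a real, expensive experiment run."""
    return candidate["quality"]

def predict_then_verify(candidates, baseline, report, budget=2):
    # 1. Predict: cheaply filter hypotheses with the world model.
    promising = [c for c in candidates
                 if predict_preference(c, baseline, report)]
    # 2. Verify: spend the limited execution budget only on survivors.
    results = [(execute(c), c) for c in promising[:budget]]
    # Fall back to the baseline if nothing looked promising.
    best_score, best = max(results, default=(execute(baseline), baseline),
                           key=lambda pair: pair[0])
    return best, best_score
```

A usage sketch: with three candidates of hidden quality 0.3, 0.9, and 0.7 against a 0.5 baseline, the filter discards the first and execution is spent only on the remaining two.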
Why it matters?
This work is important because it significantly speeds up the scientific process. By reducing the number of physical experiments needed, researchers can explore more ideas and make discoveries faster. The 6x speedup and 6% improvement over traditional methods show real potential to accelerate progress in fields like materials science, drug discovery, and other areas where experimentation is key.
Abstract
Autonomous machine learning agents have revolutionized scientific discovery, yet they remain constrained by a Generate-Execute-Feedback paradigm. Previous approaches suffer from a severe Execution Bottleneck, as hypothesis evaluation relies strictly on expensive physical execution. To bypass these physical constraints, we internalize execution priors to substitute costly runtime checks with instantaneous predictive reasoning, drawing inspiration from World Models. In this work, we formalize the task of Data-centric Solution Preference and construct a comprehensive corpus of 18,438 pairwise comparisons. We demonstrate that LLMs exhibit significant predictive capabilities when primed with a Verified Data Analysis Report, achieving 61.5% accuracy and robust confidence calibration. Finally, we instantiate this framework in FOREAGENT, an agent that employs a Predict-then-Verify loop, achieving a 6x acceleration in convergence while surpassing execution-based baselines by +6%. Our code and dataset will be publicly available soon at https://github.com/zjunlp/predict-before-execute.
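The abstract's 61.5% figure is pairwise accuracy: how often the model picks the solution that actually scores higher when executed. A hedged sketch of how such a metric could be computed (the tuple layout and the toy predictor are assumptions, not the paper's data schema):

```python
# Hypothetical sketch of pairwise solution-preference scoring.
# The (solution_a, solution_b, label) layout is an assumption.

def pairwise_accuracy(pairs, predict):
    """Fraction of pairs where `predict` picks the truly better solution.

    `pairs` holds (solution_a, solution_b, label) tuples, where label is
    "a" or "b" marking which solution scored higher when executed.
    `predict` maps (solution_a, solution_b) to "a" or "b".
    """
    correct = sum(1 for a, b, label in pairs if predict(a, b) == label)
    return correct / len(pairs)
```

For example, a toy predictor that is right on two of three labeled pairs would score 2/3, i.e. about 66.7%.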