SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam?

Jingyi Chai, Shuo Tang, Rui Ye, Yuwen Du, Xinyu Zhu, Mengcheng Zhou, Yanfeng Wang, Weinan E, Yuzhi Zhang, Linfeng Zhang, Siheng Chen

2025-07-11

Summary

This paper introduces X-Master, an AI agent that tackles scientific problems by writing and running Python code and calling specialized tools, moving between reasoning and exploration much as a human researcher does. It achieves a new top score of 32.1% on Humanity's Last Exam, a very hard scientific test.

What's the problem?

Most AI systems either rely only on their built-in knowledge or need heavy retraining to improve, which limits their ability to handle complex, ever-changing scientific questions in a flexible way.

What's the solution?

The researchers built X-Master to act more like a human scientist: it alternates between its own internal reasoning and external tools, which it calls by writing and running Python scripts. They then extended it by running multiple copies of the agent together, each assigned a different role, so that the system explores many candidate solutions and refines the most promising ones before choosing a final answer. This system reached a record score of 32.1% on Humanity's Last Exam.
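The explore, improve, and choose steps described above can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `solve_with_team`, `majority_vote`, and the stub agents are hypothetical, and majority voting stands in for whatever selection rule the real system uses.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical agent type: takes a prompt string, returns an answer string.
Agent = Callable[[str], str]


def majority_vote(answers: List[str]) -> str:
    """Pick the most common answer (an assumed, simple selection rule)."""
    return Counter(answers).most_common(1)[0][0]


def solve_with_team(
    question: str,
    solvers: List[Agent],
    refiner: Agent,
    selector: Callable[[List[str]], str] = majority_vote,
) -> str:
    # Explore: every solver proposes a candidate answer independently.
    candidates = [solver(question) for solver in solvers]
    # Improve: a refiner agent revises each candidate draft.
    refined = [refiner(f"Question: {question}\nDraft: {c}") for c in candidates]
    # Choose: a selector picks one final answer from the refined drafts.
    return selector(refined)


# Stub agents standing in for real model calls (assumptions for the demo).
def stub_solver_a(prompt: str) -> str:
    return " 4"


def stub_solver_b(prompt: str) -> str:
    return "5"


def stub_refiner(prompt: str) -> str:
    # Pretend to "improve" a draft by trimming stray whitespace.
    return prompt.split("Draft:", 1)[1].strip()


final_answer = solve_with_team(
    "What is 2 + 2?",
    solvers=[stub_solver_a, stub_solver_b, stub_solver_a],
    refiner=stub_refiner,
)
print(final_answer)
```

In a real system each stub would be a call to a language model with a role-specific prompt; the control flow is the part this sketch is meant to show.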

Why does it matter?

This matters because it points to a new direction for building flexible AI that works more like a human expert and could help accelerate real scientific discovery. It also provides an open-source model that anyone can improve, making advanced scientific AI more accessible.

Abstract

X-Master, a tool-augmented reasoning agent using Python libraries and customized tools, achieves state-of-the-art performance on Humanity's Last Exam with a score of 32.1%.
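To make "tool-augmented reasoning" concrete, here is a minimal sketch of a loop in which a model's reply may contain Python code that a harness executes and feeds back. Everything in it is an assumption for illustration: the `<code>` delimiters, the `[output]` marker, and the names `run_python`, `agent_loop`, and `stub_model` are hypothetical and not taken from the paper. It also uses plain `exec`, which is not sandboxed and would be unsafe for untrusted code.

```python
import contextlib
import io
import re

# Hypothetical delimiters marking code the agent wants executed.
CODE_PATTERN = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def run_python(source: str) -> str:
    """Execute a snippet and return what it printed (no sandboxing here)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, {})
    except Exception as exc:
        return f"Error: {exc!r}"
    return buffer.getvalue()


def agent_loop(model, question: str, max_turns: int = 5) -> str:
    transcript = question
    for _ in range(max_turns):
        reply = model(transcript)
        transcript += "\n" + reply
        match = CODE_PATTERN.search(reply)
        if match is None:
            # No code requested: treat the reply as the final answer.
            return reply
        # Run the code and append its output so the model can read it.
        transcript += "\n[output]\n" + run_python(match.group(1))
    # Turn budget exhausted: return the full transcript for inspection.
    return transcript


def stub_model(transcript: str) -> str:
    """Stand-in for a language model (an assumption for the demo)."""
    if "[output]" not in transcript:
        return "<code>print(2 ** 10)</code>"
    return "Answer: " + transcript.strip().splitlines()[-1]


result = agent_loop(stub_model, "What is 2 to the power of 10?")
print(result)
```

The design point is that code acts as the interface to tools: anything reachable from Python becomes available to the agent without retraining it.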
