The Last Human-Written Paper: Agent-Native Research Artifacts

Jiachen Liu, Jiaxin Pei, Jintao Huang, Chenglei Si, Ao Qu, Xiangru Tang, Runyu Lu, Lichang Chen, Xiaoyan Bai, Haizhong Zheng, Carl Chen, Zhiyang Chen, Haojie Ye, Yujuan Fu, Zexue He, Zijian Jin, Zhenyu Zhang, Shangquan Sun, Maestro Harmon, John Dianzhuo Wang, Jianqiao Zeng, Jiachen Sun

2026-05-01


Summary

This paper argues that traditional scientific papers are poorly suited for AI agents to understand and build upon. Papers present research as a tidy, linear story, leaving out many of the messy details and failed attempts that actually happened during the research process.

What's the problem?

Currently, scientific papers are written for human readers, not for AI agents. They focus on successful results and present them as a streamlined narrative. As a result, important information, such as why certain experiments *didn't* work or the exact steps taken to get a result, is often missing. Without the complete picture, AI agents struggle to reproduce the research, understand it fully, or extend it to new ideas.

What's the solution?

The authors propose a new way to share research called the Agent-Native Research Artifact, or ARA. Instead of a paper, researchers would publish a package with four layers: the scientific logic, the executable code with full specifications, an exploration graph that records *all* the experiments tried (including the failures), and evidence that ties every claim to the raw outputs supporting it. They also built tools to help researchers capture ARAs during ordinary development, convert existing papers and code repositories into ARAs, and review ARAs by automating the objective checks so that human reviewers can focus on significance, novelty, and taste.
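To make the four-layer structure concrete, here is a minimal sketch of how an ARA package might be modeled, together with one objective check of the kind a review system could automate: flagging claims that cite no evidence, or cite evidence missing from the package. This is an illustration under stated assumptions, not the authors' format. The class names `Claim`, `ExplorationNode`, and `ARAPackage`, their fields, and the `ungrounded_claims` function are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical model of an ARA; names and fields are illustrative only.

@dataclass
class Claim:
    claim_id: str
    statement: str
    evidence_ids: list[str] = field(default_factory=list)  # pointers into the evidence layer

@dataclass
class ExplorationNode:
    node_id: str
    description: str
    outcome: str                     # e.g. "success", "failed", "abandoned"
    parent_id: Optional[str] = None  # branch this experiment grew out of

@dataclass
class ARAPackage:
    logic: list[Claim]                  # layer 1: scientific logic
    code: dict[str, str]                # layer 2: file path -> source or specification
    exploration: list[ExplorationNode]  # layer 3: exploration graph, failures included
    evidence: dict[str, str]            # layer 4: evidence id -> raw output location

def ungrounded_claims(pkg: ARAPackage) -> list[str]:
    """Return ids of claims with no evidence, or citing evidence absent from the package."""
    bad = []
    for claim in pkg.logic:
        missing = [e for e in claim.evidence_ids if e not in pkg.evidence]
        if not claim.evidence_ids or missing:
            bad.append(claim.claim_id)
    return bad

# Usage: one grounded claim, one claim with no evidence attached.
pkg = ARAPackage(
    logic=[
        Claim("C1", "Method A beats the baseline", ["E1"]),
        Claim("C2", "Method A scales linearly", []),
    ],
    code={"train.py": "..."},
    exploration=[
        ExplorationNode("N1", "baseline run", "success"),
        ExplorationNode("N2", "higher learning rate", "failed", parent_id="N1"),
    ],
    evidence={"E1": "logs/run_017/metrics.json"},
)
print(ungrounded_claims(pkg))  # ['C2']
```

Keeping failed nodes such as `N2` in the package, rather than deleting them, is what distinguishes the exploration layer from a paper's narrative; the grounding check is mechanical, which is why it is a natural candidate for automation.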

Why does it matter?

This matters because, as AI becomes more involved in scientific discovery, researchers need a better way to communicate their findings to AI agents. ARAs could substantially improve an agent's ability to learn from past work and reproduce results, and ultimately accelerate scientific progress. In the authors' tests on PaperBench and RE-Bench, the ARA format raised question-answering accuracy from 72.4% to 93.7% and reproduction success from 57.4% to 64.4%. The tests also revealed a downside: on open-ended extension tasks, the preserved record of past failures sped up progress but could also keep a capable agent from exploring beyond what earlier runs had tried.

Abstract

Scientific publication compresses a branching, iterative research process into a linear narrative, discarding the majority of what was discovered along the way. This compilation imposes two structural costs: a Storytelling Tax, where failed experiments, rejected hypotheses, and the branching exploration process are discarded to fit a linear narrative; and an Engineering Tax, where the gap between reviewer-sufficient prose and agent-sufficient specification leaves critical implementation details unwritten. Tolerable for human readers, these costs become critical when AI agents must understand, reproduce, and extend published work. We introduce the Agent-Native Research Artifact (ARA), a protocol that replaces the narrative paper with a machine-executable research package structured around four layers: scientific logic, executable code with full specifications, an exploration graph that preserves the failures compilation discards, and evidence grounding every claim in raw outputs. Three mechanisms support the ecosystem: a Live Research Manager that captures decisions and dead ends during ordinary development; an ARA Compiler that translates legacy PDFs and repos into ARAs; and an ARA-native review system that automates objective checks so human reviewers can focus on significance, novelty, and taste. On PaperBench and RE-Bench, ARA raises question-answering accuracy from 72.4% to 93.7% and reproduction success from 57.4% to 64.4%. On RE-Bench's five open-ended extension tasks, preserved failure traces in ARA accelerate progress, but their effect depends on the agent: they can also keep a capable agent from stepping outside the box defined by prior runs.
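The Storytelling Tax can be illustrated with a toy exploration graph. The sketch below assumes the research history is a tree of experiments in which each node has a single parent; a linear paper keeps only the path from the root to the final result, and every other branch is discarded. The names `Node`, `published_path`, and `discarded` are hypothetical and are not part of the ARA protocol as described.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    node_id: str
    parent: Optional[str]  # None marks the root of the exploration tree
    outcome: str           # e.g. "success", "failed", "abandoned"

def published_path(nodes: dict[str, Node], final_id: str) -> list[str]:
    """Follow parent links from the final result back to the root: the linear story a paper tells."""
    path = []
    current: Optional[str] = final_id
    while current is not None:
        path.append(current)
        current = nodes[current].parent
    return list(reversed(path))

def discarded(nodes: dict[str, Node], final_id: str) -> list[str]:
    """Ids of every explored node that the linear narrative leaves out."""
    kept = set(published_path(nodes, final_id))
    return sorted(n for n in nodes if n not in kept)

# Usage: two ideas branch from the root; one fails, and one of the second idea's tweaks is abandoned.
nodes = {n.node_id: n for n in [
    Node("root", None, "success"),
    Node("idea_a", "root", "failed"),
    Node("idea_b", "root", "success"),
    Node("b_tweak1", "idea_b", "abandoned"),
    Node("b_tweak2", "idea_b", "success"),
]}
print(published_path(nodes, "b_tweak2"))  # ['root', 'idea_b', 'b_tweak2']
print(discarded(nodes, "b_tweak2"))       # ['b_tweak1', 'idea_a']
```

In this toy history, two of the five explored nodes never reach the narrative. An exploration graph that keeps them is what gives a later agent access to those dead ends.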
