Unleashing Scientific Reasoning for Bio-experimental Protocol Generation via Structured Component-based Reward Mechanism
Haoran Sun, Yankai Jiang, Zhenyu Tang, Yaning Pan, Shuang Gu, Zekai Lin, Lilong Wang, Wenjie Lou, Lei Liu, Lei Bai, Xiaosong Wang
2025-10-22
Summary
This paper focuses on making scientific experiments easier to reproduce by using artificial intelligence to automatically create detailed instructions, or protocols, from natural-language questions. It introduces a new model called Thoth that outperforms the proprietary and open-source language models it was compared against at generating these protocols.
What's the problem?
Currently, when scientists try to use AI like ChatGPT to write up how to do an experiment, the instructions are often incomplete, confusing, or just plain wrong. This makes it hard to actually *do* the experiment based on the AI's output, defeating the purpose of trying to automate the process and hindering the ability to verify scientific findings. Essentially, existing AI isn't reliable enough for creating usable experimental protocols.
What's the solution?
The researchers tackled this problem in four ways. First, they built SciRecipe, a large-scale dataset of over 12,000 structured protocols spanning 27 biological subfields and covering both comprehension and problem-solving tasks. Second, they proposed a 'Sketch-and-Fill' approach, in which the AI first analyzes the task and outlines the experiment's steps, then fills in the details, so that each step is explicit and verifiable. Third, they designed a structured, component-based reward that checks whether the steps are granular enough, in the right order, and semantically faithful. Finally, they built Thoth, a model trained on this data in stages that move from acquiring knowledge, to operational reasoning, to generating executable protocols.
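To make the outline-then-fill idea described above concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the `SketchStep` class, its fields (`action`, `objects`, `parameters`), the `fill` and `render_protocol` helpers, and the example values are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchStep:
    """One outlined step (hypothetical structure, not taken from the paper)."""
    action: str                  # the operation to perform, e.g. "centrifuge"
    objects: List[str]           # what the action is applied to
    parameters: Dict[str, str] = field(default_factory=dict)  # details added later


def fill(step: SketchStep, index: int) -> str:
    """Render one structured step as an explicit, numbered instruction."""
    text = f"{index}. {step.action.capitalize()} {' and '.join(step.objects)}"
    if step.parameters:
        details = ", ".join(f"{k}: {v}" for k, v in step.parameters.items())
        text += f" ({details})"
    return text + "."


def render_protocol(sketch: List[SketchStep]) -> str:
    """Turn the ordered outline into the final protocol text."""
    return "\n".join(fill(step, i) for i, step in enumerate(sketch, start=1))


# Illustrative values only; they are not drawn from SciRecipe.
example = [
    SketchStep("centrifuge", ["the cell suspension"], {"speed": "300 x g", "time": "5 min"}),
    SketchStep("discard", ["the supernatant"]),
]
print(render_protocol(example))
```

Keeping the outline as structured data before rendering it as text is what would let each step be checked separately for its action, its position in the sequence, and its details.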
Why does it matter?
This work is important because it moves us closer to having AI assistants that can actually help scientists *do* experiments. If we can reliably generate accurate and complete protocols, it will be much easier to reproduce results, verify findings, and accelerate scientific discovery. The authors also state that all data, code, and models will be released publicly, which would let other researchers build on this work and further improve these AI-powered scientific tools.
Abstract
The foundation of reproducible science lies in protocols that are precise, logically ordered, and executable. The autonomous generation of these protocols through natural language queries could greatly improve the efficiency of the reproduction process. However, current leading large language models (LLMs) often generate incomplete or inconsistent protocols, limiting their utility. To address this limitation, we first introduce SciRecipe, a large-scale dataset of over 12K structured protocols spanning 27 biological subfields and encompassing both comprehension and problem-solving tasks. To further improve protocol generation, we propose the "Sketch-and-Fill" paradigm, which separates analysis, structuring, and expression to ensure each step is explicit and verifiable. Complementing this, the structured component-based reward mechanism evaluates step granularity, action order, and semantic fidelity, aligning model optimization with experimental reliability. Building on these components, we develop Thoth, trained through a staged Knowledge-to-Action process that progresses from knowledge acquisition to operational reasoning and ultimately to robust, executable protocol generation. Across multiple benchmarks, Thoth consistently surpasses both proprietary and open-source LLMs, achieving significant improvements in step alignment, logical sequencing, and semantic accuracy. Our approach paves the way for reliable scientific assistants that bridge knowledge with experimental execution. All data, code, and models will be released publicly.
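The abstract names three reward components (step granularity, action order, and semantic fidelity) without giving formulas. The sketch below shows one simple way such a component-based score could be assembled. Every formula here is an assumption made for illustration: the step-count ratio, the `difflib.SequenceMatcher` ordering score, the Jaccard token overlap, the equal weights, and the `action`/`details` dictionary keys are stand-ins, not the paper's reward.

```python
from difflib import SequenceMatcher


def granularity_score(pred_steps, ref_steps):
    """1.0 when the step counts match, shrinking with the relative gap (assumed metric)."""
    if not pred_steps and not ref_steps:
        return 1.0
    gap = abs(len(pred_steps) - len(ref_steps))
    return 1.0 - gap / max(len(pred_steps), len(ref_steps))


def order_score(pred_steps, ref_steps):
    """Order-sensitive similarity of the two action sequences (assumed metric)."""
    pred_actions = [s["action"] for s in pred_steps]
    ref_actions = [s["action"] for s in ref_steps]
    return SequenceMatcher(None, pred_actions, ref_actions).ratio()


def fidelity_score(pred_steps, ref_steps):
    """Jaccard overlap of detail tokens, a crude proxy for semantic fidelity."""
    def tokens(steps):
        return {t.lower() for s in steps for t in s["details"].split()}
    pred_tokens, ref_tokens = tokens(pred_steps), tokens(ref_steps)
    if not pred_tokens and not ref_tokens:
        return 1.0
    return len(pred_tokens & ref_tokens) / len(pred_tokens | ref_tokens)


def component_reward(pred_steps, ref_steps, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three component scores; equal weights are an assumption."""
    scores = (
        granularity_score(pred_steps, ref_steps),
        order_score(pred_steps, ref_steps),
        fidelity_score(pred_steps, ref_steps),
    )
    return sum(w * s for w, s in zip(weights, scores))


# Illustrative steps only; they are not drawn from any benchmark.
reference = [
    {"action": "centrifuge", "details": "cell suspension 300 x g 5 min"},
    {"action": "discard", "details": "supernatant"},
]
swapped = list(reversed(reference))
print(component_reward(reference, reference))
print(component_reward(swapped, reference))
```

Scoring the components separately means a protocol with the right content but the wrong step order is penalized only on the ordering term, which is the kind of targeted signal a structured reward is meant to provide.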