AI Can Learn Scientific Taste

Jingqi Tong, Mingzhe Li, Hangcheng Li, Yongzhuo Yang, Yurong Mou, Weijie Ma, Zhiheng Xi, Hongji Chen, Xiaoran Liu, Qinyuan Cheng, Ming Zhang, Qiguang Chen, Weifeng Ge, Qipeng Guo, Tianlei Ying, Tianxiang Sun, Yining Zheng, Xinchi Chen, Jun Zhao, Ning Ding, Xuanjing Huang, Yugang Jiang

2026-03-17

Summary

This paper explores how to give AI systems a sense of 'scientific taste' – the ability to identify and suggest truly impactful research ideas, similar to what experienced scientists do.

What's the problem?

Current AI research focuses on making AI good at *doing* science, like running experiments or analyzing data. However, AI struggles with the initial, crucial step of *deciding what* research is worth pursuing. It lacks the intuition to judge which ideas have the most potential, and this limits its ability to make significant breakthroughs.

What's the solution?

The researchers developed a two-part system called Reinforcement Learning from Community Feedback (RLCF). First, they trained an 'AI Judge' by showing it many pairs of research papers, some highly cited (successful) and some not, so it could learn to predict which ideas are likely to be impactful. Then, they used this 'AI Judge' to train an 'AI Thinker' to actually *generate* new research ideas, rewarding it when it proposed ideas the Judge deemed promising. This process essentially teaches the AI what good scientific ideas 'look' like.
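The two-stage pipeline above can be sketched as a toy pairwise-preference setup. The snippet below is a minimal illustration, not the paper's implementation: it assumes a Bradley–Terry-style loss for the Judge (the summary only says the Judge is trained on high- vs. low-citation pairs, so the exact loss is an assumption), and the function names `bradley_terry_loss` and `thinker_reward` are hypothetical.

```python
import math

def bradley_terry_loss(score_high: float, score_low: float) -> float:
    """Pairwise preference loss for the Judge (illustrative assumption).

    The Judge assigns a scalar score to each paper; the loss pushes the
    high-citation paper's score above its field- and time-matched
    low-citation counterpart. P(high preferred) = sigmoid(margin).
    """
    margin = score_high - score_low
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def thinker_reward(judge_score: float) -> float:
    """Reward for the Thinker during RL: the trained Judge's score
    for a proposed research idea (identity here for simplicity)."""
    return judge_score

# The loss shrinks as the Judge's margin over the weaker paper grows,
# which is the learning signal in the preference-modeling stage.
assert bradley_terry_loss(2.0, 0.0) < bradley_terry_loss(0.5, 0.0)
```

In the alignment stage, a standard policy-gradient method would then maximize `thinker_reward` over ideas sampled from the Thinker, so the Thinker drifts toward ideas the Judge scores highly.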

Why it matters?

This work is a significant step towards creating AI that can act as a true scientific partner, not just a tool. By giving AI the ability to identify promising research directions, we can accelerate the pace of discovery and potentially solve complex problems more effectively. It shows that AI can learn to evaluate research quality, bringing us closer to AI scientists that operate at a human level.

Abstract

Great scientists have strong judgement and foresight, closely tied to what we call scientific taste. Here, we use the term to refer to the capacity to judge and propose research ideas with high potential impact. However, most related research focuses on improving an AI scientist's executive capability, while enhancing an AI's scientific taste remains underexplored. In this work, we propose Reinforcement Learning from Community Feedback (RLCF), a training paradigm that uses large-scale community signals as supervision, and formulate scientific taste learning as a preference modeling and alignment problem. For preference modeling, we train Scientific Judge on 700K field- and time-matched pairs of high- vs. low-citation papers to judge ideas. For preference alignment, using Scientific Judge as a reward model, we train a policy model, Scientific Thinker, to propose research ideas with high potential impact. Experiments show Scientific Judge outperforms SOTA LLMs (e.g., GPT-5.2, Gemini 3 Pro) and generalizes to future-year tests, unseen fields, and peer-review preferences. Furthermore, Scientific Thinker proposes research ideas with higher potential impact than baselines. Our findings show that AI can learn scientific taste, marking a key step toward reaching human-level AI scientists.