ReviewerToo: Should AI Join The Program Committee? A Look At The Future of Peer Review
Gaurav Sahu, Hugo Larochelle, Laurent Charlin, Christopher Pal
2025-10-13
Summary
This paper introduces ReviewerToo, a system that uses artificial intelligence to assist human reviewers in the peer review process for scientific publishing.
What's the problem?
The current peer review system, which is how scientific papers are checked for quality before publication, has some issues. It can be inconsistent, since different reviewers may reach different conclusions about the same paper; it is subjective, meaning personal biases can play a role; and it is hard to scale to the growing number of papers being submitted. Basically, it's a system struggling to keep up with the demands of modern science.
What's the solution?
The researchers created ReviewerToo, a framework that uses AI, specifically a large language model called gpt-oss-120b, to help review papers. They tested it on a large set of real submissions and found the AI could predict whether a paper should be accepted or rejected with accuracy comparable to that of human reviewers. They also had another AI judge the quality of the AI-generated reviews and found them to be solid, though not quite as good as the best human reviews. The system supports different 'reviewer personas' and focuses each review on specific evaluation criteria to make the process more systematic.
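The paper does not give its exact prompting code, but the idea of persona-conditioned reviews over a fixed rubric can be sketched roughly as follows. All names here (`build_review_prompt`, `Review`, `aggregate`, the criteria list) are illustrative assumptions, not the paper's actual API:

```python
from dataclasses import dataclass, field

# Hypothetical evaluation criteria a structured rubric might cover.
CRITERIA = ["soundness", "novelty", "clarity", "literature coverage"]

@dataclass
class Review:
    """One persona's structured review of a paper (illustrative)."""
    persona: str
    scores: dict = field(default_factory=dict)  # criterion -> 1..5
    verdict: str = "reject"                     # "accept" or "reject"

def build_review_prompt(persona: str, paper_text: str) -> str:
    """Compose a persona-conditioned prompt with an explicit rubric."""
    rubric = "\n".join(f"- Rate {c} on a 1-5 scale" for c in CRITERIA)
    return (
        f"You are a {persona} peer reviewer.\n"
        f"Score the paper on each criterion, then give a verdict.\n"
        f"{rubric}\nPaper:\n{paper_text}"
    )

def aggregate(reviews: list) -> str:
    """Majority vote over per-persona verdicts; ties default to reject."""
    accepts = sum(r.verdict == "accept" for r in reviews)
    return "accept" if accepts > len(reviews) / 2 else "reject"
```

In this sketch, each persona would get its own prompt (and hence its own emphasis, e.g. a "methods-focused" versus a "literature-focused" reviewer), and the final accept/reject decision is a simple vote over the structured verdicts; the actual aggregation used in the paper may differ.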
Why does it matter?
This work is important because it suggests AI can be a valuable tool to improve the peer review process. It can help make reviews more consistent, cover more ground, and potentially be fairer. While AI isn't ready to replace human experts, especially when it comes to judging truly novel ideas, it can assist them by handling tasks like fact-checking and ensuring thorough literature reviews. This could ultimately help scientific publishing keep pace with the rapid growth of research.
Abstract
Peer review is the cornerstone of scientific publishing, yet it suffers from inconsistencies, reviewer subjectivity, and scalability challenges. We introduce ReviewerToo, a modular framework for studying and deploying AI-assisted peer review to complement human judgment with systematic and consistent assessments. ReviewerToo supports systematic experiments with specialized reviewer personas and structured evaluation criteria, and can be partially or fully integrated into real conference workflows. We validate ReviewerToo on a carefully curated dataset of 1,963 paper submissions from ICLR 2025, where our experiments with the gpt-oss-120b model achieve 81.8% accuracy for the task of categorizing a paper as accept/reject, compared to 83.9% for the average human reviewer. Additionally, ReviewerToo-generated reviews are rated as higher quality than the human average by an LLM judge, though still trailing the strongest expert contributions. Our analysis highlights domains where AI reviewers excel (e.g., fact-checking, literature coverage) and where they struggle (e.g., assessing methodological novelty and theoretical contributions), underscoring the continued need for human expertise. Based on these findings, we propose guidelines for integrating AI into peer-review pipelines, showing how AI can enhance consistency, coverage, and fairness while leaving complex evaluative judgments to domain experts. Our work provides a foundation for systematic, hybrid peer-review systems that scale with the growth of scientific publishing.