The paper builds a practical way to plug AI into peer review without replacing humans.
It finds that a carefully designed AI workflow can approach human accuracy while adding consistent, useful feedback at scale.
An AI metareviewer reached 81.8% accept-or-reject accuracy, close to the 83.9% achieved by the average human reviewer.
The system, called ReviewerToo, reads a paper, checks the literature, collects multiple AI reviews, then writes a single verdict.
It uses reviewer personas such as empiricist, theorist, and pedagogical, all following the same conference rubric.
An author agent drafts a rebuttal, and a metareviewer weighs the evidence, filters weak claims, and makes a recommendation.
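A minimal sketch of the review, rebuttal, and metareview stages described above, assuming toy stand-ins for the LLM calls. The persona weights, the `Weakness` and `Review` types, the evidence flag, the 0.5 score penalty, and the 6.0 accept threshold are all hypothetical illustrations, not ReviewerToo's actual prompts or scoring. The literature-check stage is left out.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical per-persona weights over one shared rubric (not ReviewerToo's real prompts).
PERSONA_WEIGHTS = {
    "empiricist":  {"experiments": 0.6, "theory": 0.2, "clarity": 0.2},
    "theorist":    {"experiments": 0.2, "theory": 0.6, "clarity": 0.2},
    "pedagogical": {"experiments": 0.2, "theory": 0.2, "clarity": 0.6},
}

@dataclass
class Weakness:
    text: str
    has_evidence: bool  # hypothetical flag: does the claim cite a section, table, or paper?

@dataclass
class Review:
    persona: str
    score: float  # 1-10 on the shared rubric
    weaknesses: list[Weakness]

def review(persona: str, paper: dict) -> Review:
    """Stand-in for an LLM reviewer: persona-weighted average of rubric scores."""
    weights = PERSONA_WEIGHTS[persona]
    score = sum(w * paper["rubric"][k] for k, w in weights.items())
    return Review(persona, score, paper["weaknesses"].get(persona, []))

def author_rebuttal(reviews: list[Review], paper: dict) -> set[str]:
    """Stand-in author agent: returns the weaknesses the authors can answer."""
    return {wk.text for r in reviews for wk in r.weaknesses if wk.text in paper["answerable"]}

def metareview(reviews, rebutted, threshold=6.0, penalty=0.5):
    """Drop unevidenced or rebutted weaknesses, then penalize the mean score once per kept weakness."""
    kept = [wk for r in reviews for wk in r.weaknesses
            if wk.has_evidence and wk.text not in rebutted]
    final = mean(r.score for r in reviews) - penalty * len(kept)
    return ("accept" if final >= threshold else "reject"), final, kept

# Toy submission (all values invented for illustration).
paper = {
    "rubric": {"experiments": 8, "theory": 6, "clarity": 7},
    "weaknesses": {
        "empiricist": [Weakness("no ablation on dataset size", True)],
        "theorist": [Weakness("proof of Lemma 2 skips a step", True),
                     Weakness("feels incremental", False)],
    },
    "answerable": {"proof of Lemma 2 skips a step"},
}

reviews = [review(p, paper) for p in PERSONA_WEIGHTS]
rebutted = author_rebuttal(reviews, paper)
decision, final, kept = metareview(reviews, rebutted)
print(decision, round(final, 2), [wk.text for wk in kept])
```

Here the unevidenced "feels incremental" complaint is filtered out and the rebutted Lemma 2 point is dropped, so only the ablation concern lowers the mean score of 7.0 to 6.5, which still clears the threshold. Letting a rebuttal erase a weakness this easily is also one way such a pipeline can drift toward over-positive verdicts.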
They tested this on 1,963 real submissions from a major machine learning conference.
Individual personas are biased: the permissive persona leans toward accept and the critical one toward reject, so mixing personas reduces that bias.
Ensembles and the metareviewer agree best with human decisions and give more actionable feedback than many individual reviews.
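A toy simulation of why mixing personas can cancel bias, assuming each persona simply adds a fixed offset to a paper's latent quality score. The quality values, the ±1.0 offsets, and the 5.0 accept threshold are invented for illustration; real persona biases are neither constant nor perfectly symmetric, so real ensembles will not cancel them this cleanly.

```python
from statistics import mean

THRESHOLD = 5.0  # hypothetical accept cutoff on a 1-10 scale

# Hypothetical latent quality per paper; the "human" decision is taken as quality >= THRESHOLD.
quality = [3.0, 4.2, 4.8, 5.3, 5.9, 7.5]

# Hypothetical systematic offsets: permissive inflates every score, critical deflates it.
OFFSETS = {"permissive": +1.0, "critical": -1.0}

def decide(score: float) -> str:
    return "accept" if score >= THRESHOLD else "reject"

human = [decide(q) for q in quality]

def agreement(decisions: list[str]) -> float:
    """Fraction of papers where the simulated AI decision matches the human one."""
    return sum(d == h for d, h in zip(decisions, human)) / len(human)

# Each single persona judges every paper through its own bias.
single = {p: [decide(q + off) for q in quality] for p, off in OFFSETS.items()}

# The ensemble averages the personas' scores before applying the threshold.
ensemble = [decide(mean(q + off for off in OFFSETS.values())) for q in quality]

for persona, decisions in single.items():
    print(persona, "accept rate", round(decisions.count("accept") / len(decisions), 2),
          "agreement", round(agreement(decisions), 2))
print("ensemble agreement", agreement(ensemble))
```

The permissive persona accepts 5 of the 6 papers and the critical one accepts 1, so each matches the human decision on only 4 of 6. Averaging their scores before thresholding recovers all 6 here, but only because the offsets are exactly opposite.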
AI reviewers are strong at fact checking and literature coverage.
AI reviewers are weaker at judging genuine novelty and deep theoretical contributions.
Rebuttals can make agents overly positive, so the authors keep humans in charge of final decisions.
----
Paper: arxiv.org/abs/2510.08867
Paper Title: "ReviewerToo: Should AI Join The Program Committee? A Look At The Future of Peer Review"