GPT reviewer "scores above each paper's top-rated human reviewer" but AI review agents "overlap far more than humans do...and exhibit 16 recurring weaknesses humans do not share..." Results "position current AI reviewers as complements to, not substitutes for, human reviewers."
Seems GPT-5.2 reaches expert level in peer review: 45 scientists took 469 hours evaluating human & AI reviews on 82 papers.
"Surprisingly, current AI reviewers are competitive even with the top-rated reviewers in Nature’s official peer review..." though not without weaknesses.