LLMs have become capable of proving complex mathematical statements. However, the proofs they produce vary significantly in how clear, motivated, and insightful they are.
To measure these differences, we introduce ProofRank, the first benchmark to scalably evaluate aspects of proof quality.