introducing lean-verifier-env, a lean 4 theorem-proving RL environment for verifier-backed proof training.
built on
@PrimeIntellect environments,
@willccbb's verifiers stack, and
@axiommathai's axle for hosted proof verification.
env link:
app.primeintellect.ai/dashbo…
the overall idea is to use lean as a form of training to learn not only general strategies for proving, but also the structured knowledge contained within traditional formal mathematicians, and ultimately to develop more autoresearch-style systems that can generate, decompose and verify proofs autonomously for unsolved problems. a lot of the inspiration for this work came from listening to recent podcasts of
@CarinaLHong and thinking how difficult it is to actually crack this domain, and honestly i love math man.
so the setup is simple like any other env: the model receives a lean theorem ending in := by, returns a proof, and then env checks it through either hosted axle or a local lean backend (mathlib lean4 leanprover-community/repl binary). the reward is 1.0 only when the submitted proof verifies.
for now, the progress i've made is:
-> supported 5 modes: single proof submission, repair with lean diagnostics, agentic lean_check(proof) tool-use, tactic-by-tactic step mode, and rlm-style recursive proof decomposition (iffing on the new RLMEnv type). (more info in the readme)
-> lean4 mathlib are embedded into a prime sandbox image for rlm mode, so the root model can decompose the theorem, call sub-models, verify snippets in-sandbox, and assemble the final proof.
-> axle is used for hosted smoke tests / quick checking (as we get only 20 calls/day), while the local lean backend is the intended training path since it avoids hosted checker limits and scales with cpu workers.
-> added anti-cheat checks around sorry, admit, local axioms, unsafe, and metaprogramming. the local backend also runs a post-verification axiom audit.
-> includes dataset presets around toy tasks, lean workbook, minif2f, proofnet, and putnam. and for the leaderboard pass@1 and pass@k
the next thing i want to run is the full hosted rlm leaderboard and the prime-rl run.