Last week on Happy Path Programming,
@JamesWard asked me whether AI helps with the proving side of Aver.
I said no. In a meaning that AI writes Aver itself, but is not helping you to prove it later in any way. Today I think that answer was wrong - and weirdly, this is one of the most enjoyable times I’ve been wrong in a long time...
In Aver, the same source file is both a program that runs and a set of laws exported to Lean 4 and Dafny/Z3. I’ve been growing the automatic prover by hand: adding strategies, improving induction cases, closing benchmark tasks one evening at a time. That works, but it scales with my evenings I'd rather spend with my kids on the playground. Meanwhile, LLM-driven proof search keeps improving. I was implicitly treating that as competition: better models generating more proofs, while I slowly grow a handwritten prover.
Today, in a long session with Claude Fable 5, I finally saw the thing I had missed up to this point: Aver already has most of the shape of a trustworthy proof-candidate filter. A candidate law can go through cheap deterministic tests, hostile boundary inputs, Dafny/Z3, Lean, axiom counting, `sorry` checks, and fail-closed gates. The model’s confidence does not matter. The candidate either survives the pipeline or it does not.
The only missing feature was almost hiding in plain sight... Aver’s `verify law` blocks are ALREADY A LEMMA LANGUAGE. But today the prover treats each law too much like a separate island. Proved laws should feed the lemma pool for later laws. The machinery is already close: last week I built lemma-pool plumbing for auto-discovered lemmas. The next step is to let already-proved user-written laws enter the same pool, with strict ordering and no cyclic dependency tricks, which opens a much more interesting loop.
An LLM can try to decompose a hard theorem into small helper laws, directly in Aver, never touching generated Lean or Dafny. Bad candidates usually die in milliseconds on fast VM tests. Survivors go through the real, more expensive verifier chain. Only proved lemmas become available to the next proof attempt. So the fancy new Claude Fable 5 did not give me a magic proof, but it made me notice that the useful thing here is the pipeline around proof attempts: cheap generation, brutal filtering, and a final verdict that does not depend on trusting the generator.
So I think my answer to James was wrong, but in a more interesting way than I expected: AI does not really help Aver’s proving side yet, but Aver may already have the right shape for making AI-written proof attempts useful without trusting them.