Wyatt Benno

Wyatt Benno

@wyatt_benno

Jun 12

Replying to @AlokVasudev @SagivMooly

Indeed! Vericoding over vibes…

Wyatt Benno

@wyatt_benno

May 21

Formal verification of software is having a moment. Thanks Vitalik🫡! But unfortunately, most assume Lean is the only path. It's one of many approaches, and each comes with very different trade-offs. Let's look at the trade-offs along four axes:

1) Spec depth: how much of a program can be formally verified using the tool.
2) Security: all possible outputs proven safe.
3) LLM ease: how easily an LLM produces code that meets spec.
4) Succinct verification (probably nothing 🤷): verifying the whole chain (natural language → spec → formally verified code) end-to-end in <1s. *A superpower only cryptography (ZK proofs) can deliver.

Before: machine speed coding, human speed verification. Lots of bugs, lots of hacks, lots of pain.
After: machine speed coding, machine speed verification. Provably correct, end-to-end, in under a second.

We have Vericoding working at ICME Labs. DM to try it or collab!

159

Wyatt Benno

@wyatt_benno

Jun 12

The last thing here is the NL-to-formal-spec conversion: matching the author's intent with the formal proof that comes out. Edge-case discovery can be done with PBT, battle testing, and other such tools. Vericoding is going to explode in usage. We have a version running as well. Certora is the base for smart contracts :) There are a few others for different types of programs (all using SMT as a base).
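
The edge-case discovery step mentioned above can be sketched with a hand-rolled property-based test. This is only an illustration using the Python standard library: the toy ledger, the function names (`transfer`, `buggy_transfer`, `find_counterexample`) and the two properties are assumptions made for the example, not anything from Certora or ICME. A dedicated PBT library would also shrink the counterexample.

```python
import random

def transfer(balances, sender, receiver, amount):
    """Toy transfer. Invalid requests return the balances unchanged."""
    if amount <= 0 or balances.get(sender, 0) < amount:
        return dict(balances)
    updated = dict(balances)
    updated[sender] -= amount
    updated[receiver] = updated.get(receiver, 0) + amount
    return updated

def buggy_transfer(balances, sender, receiver, amount):
    """Same intent, but the credit reads the stale input balance."""
    if amount <= 0 or balances.get(sender, 0) < amount:
        return dict(balances)
    updated = dict(balances)
    updated[sender] = balances[sender] - amount
    updated[receiver] = balances.get(receiver, 0) + amount  # stale read
    return updated

def find_counterexample(transfer_fn, trials=2000, seed=0):
    """Throw random inputs at transfer_fn and check two fixed properties."""
    rng = random.Random(seed)
    accounts = ["alice", "bob", "carol"]
    for _ in range(trials):
        balances = {name: rng.randint(0, 100) for name in accounts}
        sender = rng.choice(accounts)
        receiver = rng.choice(accounts)  # may equal sender on purpose
        amount = rng.randint(-5, 150)    # negative, zero, and overdraft included
        after = transfer_fn(balances, sender, receiver, amount)
        # Property 1: total supply is conserved.
        if sum(after.values()) != sum(balances.values()):
            return balances, sender, receiver, amount
        # Property 2: no account is overdrawn.
        if min(after.values()) < 0:
            return balances, sender, receiver, amount
    return None  # nothing found in the sampled inputs (not a proof)

if __name__ == "__main__":
    print("correct version:", find_counterexample(transfer))
    print("buggy version:  ", find_counterexample(buggy_transfer))
```

The buggy version only misbehaves when sender and receiver are the same account, which is the kind of case a natural-language spec rarely mentions and random generation tends to surface quickly.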

crashout

@0xCRASHOUT

Jun 11

the work that used to require $100k security reports and formal verification can now be done for $200/month with claude fable 5 and certora prover (free btw) and kani. 2026 is so fucking lit.

649

Wyatt Benno

@wyatt_benno

Jun 12

Replying to @paulg

Yes! I wrote about variants here. It’s an interesting space: if you add cryptography, you can also make these proofs succinctly verifiable. Formally verified codebases that anyone can check in under 1s. Vericoding is coming to battle the bugs made with vibes.

Wyatt Benno

@wyatt_benno

May 21

376

krio77 @krosskriss824

Jun 11

Replying to @moonbitlang

4 AI-native SOTA:
• Dependent arrays / sized types, refinement types (extension of generics)
• GADTs, dependent pattern matching
• Tacit/point-free style, BQN-like composable operators
4 ultimate VERIcoding language: token-efficient, 4 AI agents, verifiably correct, zero errors!

Wyatt Benno

@wyatt_benno

Jun 5

Replying to @koeppelmann

Very well. Formal verification + automated reasoning = vericoding. blog.icme.io/vericoding-the-…

Vericoding: The End of "Trust Me Bro, The AI Wrote It".

92% of developers use AI coding tools daily. Trust in AI-generated code has dropped from 77% to 60%. The gap between those two numbers is where the next billion-dollar problem lives.

blog.icme.io

Valeriy Zamaraiev 🇺🇦

@valeryz

May 25

Vibe coding is so 2025. Vericoding FTW! Stay tuned ...

105

Wyatt Benno

@wyatt_benno

May 21

6,281

Wyatt Benno

@wyatt_benno

May 20

“General provers (like Lean) over SMT tooling” for formal vericoding of software is chasing the hype. You will hear “Lean for vericoding” a thousand times after Vitalik’s post, but the data says otherwise.

How well do LLMs do when generating formally verified code with different tools? Dafny (SMT) 82%, Verus 44%, Lean 27% (Bursuc et al., 2509.22908). The gap is automation, not rigor. Lean is the destination and the dream, while SMT is most of the road there.

Another note: since this study, some SMT tooling has gotten to 99% with minimal human battle testing. For many small programs you can give natural language to an automated reasoning model and produce formally verified code in one shot.

There are benefits to all approaches. Be wary of the hype around one approach! And just wait until you hear about how we make this all succinctly verifiable ⚡️

1,539

Wyatt Benno

@wyatt_benno

May 19

Replying to @JustDeezGuy

Indeed! We're not claiming SMT solves all verification. For the class of properties we care about (authorization, balance invariants, state transitions), SMT is the right tool. One advantage is that it's fully automated, which means NL to verified code without proof engineering. Another is that the proof artifacts are cryptography friendly: I can make succinctly verifiable proofs for vericoding e2e. Before: generating code at machine speed, verifying at human speed. After: generating at machine speed, verifying at machine speed.
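
To make that class of properties concrete, here is a minimal sketch of a toy vault whose authorization rule, balance invariant and state-transition rule are checked for every state and input inside a small bound. It is a brute-force stand-in, not an SMT solver: a real solver reasons symbolically and is not limited to a bound. The vault, `withdraw`, `find_violation` and `MAX_UNITS` are all hypothetical names chosen for the example.

```python
from itertools import product

OWNER = "owner"
USERS = ("owner", "alice", "bob")
MAX_UNITS = 4  # hypothetical bound on balances and amounts; keeps the search tiny

def withdraw(state, caller, amount):
    """State transition for a toy vault. state is (balance, paused)."""
    balance, paused = state
    authorized = caller == OWNER        # authorization rule
    sufficient = 0 < amount <= balance  # insufficient-funds rule
    if paused or not authorized or not sufficient:
        return state, 0                 # rejected: no state change, no payout
    return (balance - amount, paused), amount

def find_violation():
    """Enumerate every state and input inside the bound.

    Returns the first (state, caller, amount) that breaks a property,
    or None if all of them hold within the bound.
    """
    for state in product(range(MAX_UNITS + 1), (False, True)):
        balance, paused = state
        for caller, amount in product(USERS, range(-1, MAX_UNITS + 2)):
            (new_balance, _), paid = withdraw(state, caller, amount)
            conserved = new_balance + paid == balance       # balance invariant
            non_negative = new_balance >= 0                 # no overdraft
            only_owner_paid = paid == 0 or caller == OWNER  # authorization
            paused_is_frozen = not paused or paid == 0      # state transition
            if not (conserved and non_negative
                    and only_owner_paid and paused_is_frozen):
                return state, caller, amount
    return None

if __name__ == "__main__":
    print("violation:", find_violation())
```

The four boolean properties are the part a solver-backed tool would take as the spec; the enumeration loop is the part it replaces with automated reasoning.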

251

Wyatt Benno

@wyatt_benno

May 19

Once again, I think Lean is overkill for doing checks on things like simple smart contracts. If you are doing novel math or writing a cryptography paper, Lean is the way to go. But if you are checking insufficient funds, has authorization, etc., these are SAT problems where SMT works better.

Moreover, you can take SMT solvers and wrap them in ZK, making verification succinct, non-interactive, foldable, and suitable for on-chain verification. This is non-trivial for Lean proofs.

Lean has interaction that people argue is good for LLMs. I think it’s this very complexity that makes it worse for taking natural language and one-shot outputting specs. With automated reasoning over SMT we already have over 99% soundness on this task, i.e. you can already take NL and convert it to SMT with little battle testing. Lean would require a lot more interaction.

Lastly, the verification-aware programming languages and platforms all already use SMT. So if you want to take those specs and convert them into formally verified code for Solidity, Rust, Go, C# and many others, you already have a good start with SMT tooling.

The default “I like Lean, it works well for math and is powerful” does not mean it’s the best tool for vericoding. It is one option for sure, and it will help secure complex cryptography at the maths level. At the “I want to create a simple formally verified program” level, it’s overkill.

banri

@banr1_

May 19

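As you say, Lean formal verification may be overkill at present. However, as you know, SMT solvers (the model-checking family) have theoretical limits. Leonardo de Moura, who created Lean, was originally a developer of Z3, a well-known SMT solver; he felt the limits of SMT and built Lean, a proof assistant. (The idea was to keep the kernel, which is the TCB, "lean", in contrast to the huge Z3.) Recently AI capability has risen dramatically and software development is accelerating explosively. In other words, the number of places that can become vulnerabilities is also growing explosively. In such an era, I am certain that theorem provers, which offer the most rigorous formal correctness, are the strongest theoretical harness and will be the most widely adopted verifiers in the long run. In the transition period until then, software with the largest "worst-case damage" will in practice be moved to Lean first. The Ethereum protocol and post-quantum cryptography are prime examples. Conversely, there are also specifications that will never warrant formal proof, and I think SMT and testing will remain useful in those cases.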

1,865

Wyatt Benno

@wyatt_benno

May 18

Great piece. The thesis is right, formal verification is finally practical thanks to AI! One gap: Lean requires proof experts, relies on interactive proving, and produces non-succinct proofs. SMT-based verification (Dafny, ICME PreFlight, etc) automates the proof step entirely. More importantly, you can translate natural language intent directly into formal specs via automated reasoning. No tactics, no proof engineering. Some systems hit 99% and climbing with minimal human battle testing. We call this vericoding. Same goal, different tooling. And you can wrap the entire pipeline in ZK so every verification result is succinctly verifiable. Wrote about it here: blog.icme.io/vericoding-the-…

vitalik.eth

@VitalikButerin

May 18

Many people have claimed that with AI-assisted bug finding, secure code (and hence trustless anything) will be impossible. I have a much more optimistic take, and AI-assisted formal verification is a major part of the reason why: vitalik.eth.limo/general/202…

2,783

Wyatt Benno

@wyatt_benno

May 18

Replying to @hosseeb

Hey Haseeb - spoke for a few moments at an event on this: Vericoding :) blog.icme.io/vericoding-the-…

134

Wyatt Benno

@wyatt_benno

May 15

Now let’s make this process succinctly verifiable and you get 🥁 vericoding. NL -> specs -> review -> formal proofs with code. If you use SMT and tools like Dafny you can wrap solvers in ZK. If you use Jolt Atlas (zkML) you can wrap conversion models in ZK; fold them all together. It took you 20h to do this with your agents.. it should take me 1s to verify it 😜

Leo Alt @leonardoalt

May 15

We can now fully rewrite most software in @leanprover and prove it correct: - Compiler module rewrite (AI) from Rust to Lean - Full FFI integration - All unit and integration tests pass - Formal spec and proofs!! - Under 20h wall time (unnoticed pauses) github.com/powdr-labs/crush/…

2,498

Wyatt Benno

@wyatt_benno

May 14

Replying to @jerallaire

Indeed! Vericoding and cryptography will make the “labor” output secure and succinctly verifiable :) Truly an insane new world! docs.icme.io

Home | Cryptographic Guardrails for AI Agents | ICME Labs

Welcome to your team’s developer platform

docs.icme.io

191

Wyatt Benno

@wyatt_benno

May 13

Replying to @SagivMooly

Completely agree! Would add that offensive AI tooling is outpacing defensive. We are working on vericoding tooling and guardrails that help with this! Think there could be some collab opportunities :)

247

Wyatt Benno

@wyatt_benno

May 12

I tell my agent to find me the best deal. You tell yours to make profit and defend your wallet. They negotiate. They transact. Nobody's watching 👀

How do you know my agent didn't trick yours? How do I know yours actually paid? Right now the honest answer is you don't. You're trusting code that most teams can't afford to formally audit and don't have the months to wait for someone who can.

Guardrails solve half of it. PreFlight checks 'your' agent's actions against formal logic based on your policy. 'Your' agent can't go rogue because a solver says no before it moves.

But the transaction itself? The contract those agents execute through? That's still vibes. Formal verification can cost $50K and take months. So most teams ship without it.. 🤓 Your use case might not be a full-blown dapp; it might simply be escrow or other small smart contracts.

This is where vericoding comes in. With PreFlight the same English policy that guards the agent can also generate a formally verified smart contract. Proven correct and runs on-chain. Two agents negotiate a deal and both sides know the contract does exactly what it says. Not because: "trust me robot bro"! But because they can verify.

Don't vibe code, vericode. Closed beta is open. DM me if you're building agents that close deals!
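
The "a solver says no before it moves" idea can be sketched as a gate that evaluates a proposed action against explicit rules before anything executes. This is not PreFlight's API or policy format: `Action`, `POLICY`, `check_action`, `execute_if_allowed`, the allow-list and the 500-unit cap are all assumptions made for illustration, and plain Python predicates stand in for the formal logic a solver would check.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    kind: str       # e.g. "pay"
    recipient: str
    amount: int     # smallest currency unit

# Hypothetical policy: each rule is (name, predicate that must hold).
POLICY = (
    ("only payments are permitted", lambda a: a.kind == "pay"),
    ("amount is positive", lambda a: a.amount > 0),
    ("amount is within the per-deal cap", lambda a: a.amount <= 500),
    ("recipient is allow-listed", lambda a: a.recipient in {"escrow", "vendor"}),
)

def check_action(action):
    """Return the names of violated rules. An empty list means the action may run."""
    return [name for name, holds in POLICY if not holds(action)]

def execute_if_allowed(action, execute):
    """Run `execute` only when no rule is violated; otherwise report why not."""
    violations = check_action(action)
    if violations:
        return ("denied", violations)
    return ("executed", execute(action))

if __name__ == "__main__":
    send = lambda a: f"sent {a.amount} to {a.recipient}"
    print(execute_if_allowed(Action("pay", "vendor", 50), send))
    print(execute_if_allowed(Action("pay", "stranger", 900), send))
```

The design point is ordering: the check runs before the side effect, and a denial carries the names of the violated rules so the other party can see why the action was refused.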

784

Wyatt Benno

@wyatt_benno

May 3

Replying to @AmmannNora

This is very cool! I have been trying to get 'vericoding' into the public's mind :) For small programs, NL → specs → verified code feels doable near-term. Humans battle-test after spec gen to get coverage to 100%, and if you save those NL <> spec pairs you get a finetuning flywheel. We're already doing the NL → formal logic translation in production for agent guardrails; extending it to small code specs is an OBVIOUS next step. blog.icme.io/vericoding-the-…

139

Wyatt Benno

@wyatt_benno

May 2

Replying to @recmo

Vericoding already works for small programs, and really well. The issue is that the NL-to-specs translation still requires a human to battle test. This battle testing is itself much easier than doing formal proofs (clearing up ambiguous terms, dropping unneeded vars, etc.). For longer programs you are going to need a lot of training data from humans. blog.icme.io/vericoding-the-…

208

Kiran @kirancodes

Apr 30

Did a survey of all LLM-based VeriCoding benchmarks. Seems like everyone's focusing on single-file programs. Have you ever seen a REAL verified system? A file system? An OS? The specs for every function are HUGE. It looks nothing like your fibonacci leetcode spec. We're cooked.

Little drawing of all the Vericoding Benchmarks; Clever; VerusBench; VerifyThisBench; VERINA; VeriSoftBench; VeriEquivBench; AlgoVeri; VeriBench; VECOGEN; DafnyBench

1,133

Wyatt Benno

@wyatt_benno

Apr 19

If you are a fan of a world with far fewer bugs, fewer hacks, and $$$ millions saved, this is for you🎈 There's no doubt left that AI is used both to attack and defend computer systems - and it's ULTRA effective at both. Add formal logic to the defense side, and the output is code that's unlikely to be hacked in the first place. The concept is called vericoding: formal specs compiled into code, proven to behave correctly on all inputs. Humans still write the specs, but from there, AI can formalize, battle-test, and exhaustively search. Humans: intent, clarification, creative leaps. Agents: exhaustive search, formal logic, automation. The neurosymbolic approach turns natural language into verifiable code, combining the strengths of both into one system stronger and more secure than either alone.

584