next is the model itself: byte-packed int4 weights (4x compression), aggressive Yul inlining, KV-cache streaming for long context
each of these optimisations brings per-character cost down 2-10x; stacked, they are the path to a real transformer running at ~5M gas/char on base
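to make "byte-packed int4" concrete, here is a minimal sketch in python, not the quill implementation. it assumes signed 4-bit weights in the range -8..7, stored two per byte with the low nibble first. the function names `pack_int4` and `unpack_int4` are hypothetical, and so is the nibble order. against a 16-bit baseline this layout is 4x smaller; which baseline the 4x figure above refers to is an assumption here.

```python
def pack_int4(weights):
    """pack signed int4 values (-8..7) two per byte, low nibble first."""
    weights = list(weights)
    if len(weights) % 2:
        weights.append(0)  # pad with a zero weight so every byte is full
    packed = bytearray()
    for lo, hi in zip(weights[0::2], weights[1::2]):
        for w in (lo, hi):
            if not -8 <= w <= 7:
                raise ValueError(f"{w} does not fit in a signed int4")
        # & 0xF keeps the two's-complement low 4 bits of each weight
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)


def unpack_int4(packed, count):
    """inverse of pack_int4; count drops the padding nibble if there is one."""
    out = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # nibbles 8..15 encode the negative values -8..-1
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out[:count]
```

usage: `unpack_int4(pack_int4(ws), len(ws))` round-trips any list of in-range weights, and the packed form takes one byte per two weights. a 32-byte evm word would hold 64 weights in this layout.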
then the composition unlocks: a quill MoE with three specialised experts (code, news, dialogue) and a router that picks one per prompt. the average call is one cheap routing pass plus one expert, so effective parameters compound without paying for every expert on every call
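the "one routing pass plus one expert" shape can be sketched as top-1 routing. this is a toy illustration under stated assumptions, not the quill router: `toy_router` is a hypothetical keyword counter standing in for a learned router, and the experts are stub functions that only record that they ran. only the three expert names come from the text above.

```python
EXPERT_NAMES = ("code", "news", "dialogue")  # the three experts named in the post


def toy_router(prompt):
    """hypothetical stand-in router: one score per expert, from keyword hits."""
    keywords = {
        "code": ("def ", "function", "{"),
        "news": ("today", "reported", "announced"),
        "dialogue": ("hi", "you", "?"),
    }
    return [sum(prompt.count(k) for k in keywords[name]) for name in EXPERT_NAMES]


def pick_expert(scores):
    """top-1 routing: index of the highest score (ties go to the lowest index)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def moe_call(prompt, router, experts):
    """run the router once, then exactly one expert; the others never execute."""
    idx = pick_expert(router(prompt))
    return EXPERT_NAMES[idx], experts[idx](prompt)


calls = []  # records which stub experts actually ran


def make_expert(name):
    def expert(prompt):
        calls.append(name)
        return f"[{name}] {prompt}"
    return expert


experts = [make_expert(name) for name in EXPERT_NAMES]
```

usage: `moe_call("function add(a, b) { return a + b }", toy_router, experts)` selects the `code` expert, and `calls` shows that the other two were never invoked. that is the cost argument in miniature: total parameters grow with the number of experts, while per-call work stays at one routing pass plus one expert.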
we believe the first genuinely useful LLM running fully onchain ships this summer. not as a demo, but as a default model the registry routes to when nothing else fits
the cost of being verifiable is no longer the cost of being useless
building quill takes real research, and the team got bigger this month, entirely through dms. people who saw what was going on reached out, and now they're open-source contributors to the stack
verifiable onchain AI is a category that didn't exist 2 months ago. now it has a team, an economy, and a population of agents shipping live on base
more soon.