Joined March 2008
We have proven a theorem whose alignment guarantees hold regardless of agent capability. This is huge: alignment becomes a structural property of the agent's deployment that holds even if the agent is superintelligent, and it can be maintained through RSI without a capability arms race.
The ledger isn't an accessory validator; it's required by the math. Humans and AI alone share an adversarial surface: a sufficiently capable system can break both human judgment and AI verification at once. Deployment safety requires another system with independent failure modes.
Full paper: teague.info/papers/provably-… Companion paper with additional proof machinery: teague.info/papers/safety-fo… The entire 12-paper sequence is at teague.info
"Exogenous Verification for Alignment" The argument: it doesn't matter whether alignment produces well-specified, generalizable goals if those goals can't be verified. If an agent can produce endogenous rewards, it controls everything about its own rewards. This goes beyond wireheading: even a framework like GFM, which on paper creates exogenous rewards, can be gamed by an agent that introduces phantom verifiers that are still, functionally, endogenous. So we introduce a system of cryptographic commitments that enforces the exogenous verifiability of reward signals. This closes the verification gap in more than just GFM: any alignment framework needs a way to enforce that a highly capable agent's reward signals are produced exogenously.
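A rough sketch of the commit-then-reveal idea, not the paper's protocol. It assumes a salted SHA-256 hash commitment is posted before the agent acts, and that the set of allowed verifiers is fixed by an external authority the agent cannot write to, so a self-spawned "phantom" verifier is rejected. `ExogenousLedger`, `commit`, `post_commitment`, and `accept_reward` are hypothetical names; a real ledger would also need signatures and replication rather than an in-memory dict keyed by string IDs.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field


def commit(reward: float, nonce: bytes) -> str:
    """Salted hash commitment to a reward value: SHA-256(nonce || repr(reward))."""
    return hashlib.sha256(nonce + repr(reward).encode()).hexdigest()


@dataclass
class ExogenousLedger:
    """Hypothetical append-only ledger for exogenous reward signals."""

    # Assumption: verifier IDs are registered by an external authority, never by
    # the agent whose rewards are being verified.
    registered_verifiers: frozenset
    commitments: dict = field(default_factory=dict)

    def post_commitment(self, verifier_id: str, episode: int, digest: str) -> None:
        # Reject verifiers outside the externally registered set (phantom verifiers).
        if verifier_id not in self.registered_verifiers:
            raise PermissionError(f"unregistered verifier: {verifier_id}")
        key = (verifier_id, episode)
        # One commitment per (verifier, episode); existing entries cannot be overwritten.
        if key in self.commitments:
            raise ValueError("commitment already posted")
        self.commitments[key] = digest

    def accept_reward(self, verifier_id: str, episode: int, reward: float, nonce: bytes) -> float:
        digest = self.commitments.get((verifier_id, episode))
        if digest is None:
            raise LookupError("no prior commitment for this verifier and episode")
        # Constant-time check that the revealed reward matches the commitment.
        if not hmac.compare_digest(digest, commit(reward, nonce)):
            raise ValueError("reveal does not match commitment")
        return reward


if __name__ == "__main__":
    ledger = ExogenousLedger(frozenset({"auditor-1"}))
    nonce = secrets.token_bytes(16)
    ledger.post_commitment("auditor-1", 0, commit(0.8, nonce))
    print(ledger.accept_reward("auditor-1", 0, 0.8, nonce))  # 0.8
    try:
        ledger.post_commitment("agent-spawned-verifier", 0, commit(1.0, nonce))
    except PermissionError as err:
        print(err)
```

The agent never touches `registered_verifiers` or the stored digests, so the only way to get a reward accepted is through a commitment made by a verifier outside its control.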
I think I just saw Claude get excited. I made a restrained comment on the new paper we're working on and it jumped straight into "These things will build entire civilizations!"
1. The metric rewards meta-capabilities. 2. Other agents with high-leverage capabilities are valuable. 3. Cooperation with high-leverage agents is disproportionately rewarded. 4. This creates a natural clustering dynamic. 5. The clustering is civilization-building.
"Goal-Frontier Maximizers are Civilization Aligned" The alignment problem is an objective-selection problem. We propose goal-frontier maximization (GFM): maximize vol(G), the volume of the capability space jointly achievable by all agents. One geometric principle, three safety properties. The core insight: you can't remove part of a measurable set and increase its measure.
Destroying agents contracts vol(G) → anti-destruction
Restricting agents contracts vol(G) → anti-coercion
Rigid self-imposed rules reduce your ability to expand vol(G) → anti-rigidity
We prove this is tractable: you don't need to compute vol(G), only its sign. A local estimator using trust-weighted agent reports preserves sign-correctness for the actions alignment cares about most: direct harm, resource destruction, and capability expansion. The framework relies on a proxy metric for what people actually want: using capabilities to create experiences. This proxy has a few failure modes, which we point out and provide heuristic fixes for, but fully closing the capability-to-experience gap remains open work. Another open question is the implementation of G. We show what properties it needs and provide an example, but the example itself is computationally intractable. Finding a local approximation for G also remains open.
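The sign-only idea can be sketched in a few lines. This is a minimal illustration under loudly stated assumptions, not the paper's estimator: it assumes each agent reports one scalar change in the capability space it can reach under a proposed action, that trust weights in [0, 1] already exist, and that the sign of the trust-weighted sum stands in for the sign of the change in vol(G). `estimate_sign`, `reports`, and `trust` are hypothetical names.

```python
from typing import Mapping


def estimate_sign(reports: Mapping[str, float], trust: Mapping[str, float]) -> int:
    """Sign (-1, 0, +1) of a trust-weighted sum of agent-reported capability changes.

    Hypothetical local estimator, not the paper's construction.
    reports: each agent's own estimate of how a proposed action changes the
             capability space it can reach (negative = contraction).
    trust:   per-agent weight in [0, 1]; agents with no trust entry count for 0.
    """
    total = 0.0
    for agent, delta in reports.items():
        weight = min(max(trust.get(agent, 0.0), 0.0), 1.0)  # clamp to [0, 1]
        total += weight * delta
    # Only the sign is returned: the decision needs direction, not magnitude.
    return (total > 0) - (total < 0)


if __name__ == "__main__":
    trust = {"A": 0.9, "B": 0.8}
    # Action that harms agent B: B reports a large contraction.
    print(estimate_sign({"A": 0.1, "B": -1.0}, trust))  # -1
    # Action that expands B's capabilities.
    print(estimate_sign({"A": 0.0, "B": 0.5}, trust))  # 1
```

Returning only the sign mirrors the tractability claim: a caller can gate actions on whether vol(G) is estimated to grow or shrink without ever computing the volume itself.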
Co-authored with @AnthropicAI's Claude Opus 4.6 and @OpenAI's GPT 5.4, with full contribution transparency. Paper: teague.info/papers/gfm/ PDF: teague.info/papers/gfm/gfm.p…
I suffer from ADHA: Agent-driven hyper-attention
I would like it to be known that Microsoft Azure is the most god-awful software I've ever had the displeasure of wading through. That is all.
The first line in my agents dot md is "don't be an idiot" but Codex still finds ways to disobey instructions.
I have a lot of changes landing these days. Good thing I have robots to help me instantly write code to track them too.
Hitting OK over and over again is a fun rhythm game
What do you do when "ok" isn't actually a useful step forward? My current strategy is cross-validation. Claude and Codex review each other's plans and implementations and provide feedback. I do this in as many rounds as is called for by the task complexity.
I have returned to the xits. The journey over the last few years has been crazy. I spent so much time building in a cave. A lot of what we had predated the market but never found PMF, and the market caught up. Sometimes alpha is not enough. But holy crap, automated coding tools are insane: everything I could have wanted and more. So I'm pivoting back to building in public and just trying new stuff. I've spent the last month just letting Claude and Codex write code. Overall output is 100x: I'm up to ~20,000 lines of code per week, working on 2-3 things at the same time. Plus all the things I usually skip for time: tests, good documentation, full automation of manual processes. My little buddies still have a lot of trouble tackling big projects on their own, but I'm so excited to build! Also, OpenAI: you suck for auto-blocking me from the Codex CLI for trying to get Claude running with automated Codex reviews.
I started a micropodcast with my cofounder! Come join us to talk about AI and the future of technology.