WSJ bestselling author: Unicorn Project! DevOps researcher/enthusiast. Coauthor: Phoenix Project, Accelerate. Host of The Idealcast. Tripwire founder. Clojure.

Joined January 2009
Pinned Tweet
6 Dec 2019
Holy cow. The Unicorn Project is on the Wall Street Journal bestseller lists!!! #2 in Hardcover Business category! And astonishingly, it’s also #8 across all Non-Fiction E-Books!!! A DevOps book!! 🤯🤯🤯 🙏❤️🦄🌈 Paywall: wsj.com/articles/best-sellin… #UnicornProject
Gene Kim retweeted
Fable feels superhuman at working over long agentic conversations, sometimes to the point where I can't keep up with what it's telling me 😅 This prompt snippet has been the best fix I've found for getting it to write clearly and drop any jargon:
Gene Kim retweeted
Still one of the best papers on language design.
Gene Kim retweeted
Jun 8
It's finally out!!! @METR_Evals found that more than half of SWEBench results are unmergeable slop. FrontierCode represents over 1000 hours of maintainer-validated software engineering work that most frontier models cannot yet solve, much less solve with high quality. Cog had IOI Gold medalists and top code maintainers Look At The Data: FrontierCode includes 3000 rubrics covering code quality and anticheat, addressing the reward hacking plaguing other benchmarks. FC Diamond is so hard that Opus 4.8 scores 13.8%. Three eras of AI coding, three eras of benchmarks: 2021 • Autocomplete: HumanEval; 2023 • Passing Tests: SWEBench, TerminalBench; 2026 • Maintainable Code: FrontierCode. To me the most beautiful chart: when I requested a special historical run across all extant old models, the data showed that the easiest third of FC tasks (in FC Extended) were rapidly and suddenly solved over late 2025. Opus almost doubled from a 41% pass rate to 74% in 4 months. This describes the "WTF happened in Dec 2025" vibe shift that a lot of folks from @dhh to @karpathy have called out: it is the difference between getting 95% success in 2 rerolls vs 6, making it finally feasible to go up the next layer of abstraction in agentic coding, e.g. @GeoffreyHuntley's ralph loops or @bcherny's /goals or @steipete's "loops that prompt your agents", without fearing too much that things go off the rails. My guess: as AI accelerates from here, each FrontierCode tier will saturate in sequence, hopefully ~annually. I've already asked the team to prepare FrontierCode 2027.... The old mountains will be destroyed. Their rubble becomes regolith. And from that regolith, the next model forest grows. Circle of life.
Introducing FrontierCode: a coding eval that raises the bar for difficulty & quality. Each task took 40 hrs of work by leading open-source maintainers. Models write sloppy code that works but isn’t maintainable. Our eval is first to measure: would you actually merge this code?
Gene Kim retweeted
oh ok i get it. imagine you have a staff engineer working on a basic CRUD saas. and they complain year after year you're not working on enough tech debt and nothing is ever prioritized right. and then they propose the most opaque, overwrought example of backend performance issues, and try to convince your ceo that you should stop all product development and put 100% of your engineers on fixing your flaming dumpster fire of a monorepo. and every year you say no, because their proposals are intractable, completely untied to business metrics, and "no customer will care." now pretend you can put that person in a back room and ignore them for a few days, and they come back and secretly ship everything that was wrong in your repo. your app is now blazing fast and you are at bug zero. that is how you should use fable. (just don't put him in a meeting, and don't let anyone read his docs)
Gene Kim retweeted
NEW: malware developers added nuclear & biological weapons text to their spyware. Goal? To trigger LLM safety refusals... so that their spyware wouldn't be analyzed by an AI security scanner. Cleanest practical example I can think of for why over-indexing on first-order safety alignment is risky. When closed (and open) models ship with aggressive refusals, they will be sprinkled with second-order blind spots that attackers will discover... and exploit. We are only in the earliest days of attackers leveraging these features, and it wouldn't surprise me if users whose systems need to handle complex cybersecurity issues demand that models be less safety-blunted. In the weeds: @SocketSecurity's post also shows why intention matters in how you design a malware analysis pipeline to avoid prompt manipulation. H/T to colleagues who shared this with me socket.dev/blog/mini-shai-hu…
Gene Kim retweeted
"Released under an Apache 2.0 license, this 26B Mixture of Experts (MoE) model moves beyond the sequential token-by-token processing ... [generating] entire blocks of text simultaneously, delivering up to 4x faster text generation on GPUs."
Meet DiffusionGemma! An experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license. Moving beyond sequential, token-by-token processes to generate entire blocks of text simultaneously. Here’s what’s new with DiffusionGemma: 👇
Gene Kim retweeted
i hooked my whoop to my work calendar to find which coworker gives me the most stress 🚨 thanks to fable, I reverse engineered whoop to pull per-minute heart rate, and matched spikes with cal events and attendees. I now have a leaderboard and I think about it daily. some info masked for obvious reasons ;)
Gene Kim retweeted
Here's what triggered me to write "The Trimodal Nature of Software Engineering Compensation": At a time when the median salary for senior devs in the Netherlands was ~€65K, I made €240-280K per year, roughly 4x that amount. Most people did not know this tier existed back then:
Gene Kim retweeted
I've had some amazing consulting gigs this year, and loved visiting the companies and meeting everyone. I'm doing more advising and consulting now, and I needed a website. So I made yegge.ai as a place for all my old blogs and stuff. Haiku pretty much one-shotted it. I'll keep tinkering with it, but I welcome early feedback. My site was inspired by paulgraham.com and johnwinsor.com, but with my own style. Fun project.
Gene Kim retweeted
When companies like Meta or Microsoft lay off experienced people or make their lives so miserable that they voluntarily leave, they are unknowingly erasing institutional memory. Every company's infrastructure is chock full of Chesterton's fences and stays up because of unwritten processes. The code bases are typically huge, so nobody can understand them end-to-end. When the people who know the history are gone, the only thing the remaining crew can do is fight fires and make cosmetic changes at the edges. Rewriting from scratch is not an option since, as your favorite AI will say, the software has become load bearing for the company's success. The good news is that I think for green-field code, vibe coding offers a way to avoid this trap by embracing the invariant that all software should be evolvable without any human needing, or even being allowed, to understand its inner workings. Only if you take the human developer completely out of the loop like this can you ensure that software never atrophies into legacy software. This may sound radical to AI coding deniers, but the reality is that code inevitably becomes incomprehensible to humans, so the rational solution is to just assume it is so from the start.
Gene Kim retweeted
I spent this morning with building-products supervisors learning to problem-solve with AI. Not elite knowledge workers. Not side-project entrepreneurs. They are knocking out the root cause of tough problems that have existed for years. AI can develop people across an organization, not just empower a select few at the top. If history teaches us anything about AI and the workforce, it won't be one-size-fits-all. Two different articles this morning helped me think through why. The NYT assembled a strong panel to ask "who actually thrives in a hybrid AI workforce?" - @DAcemworku (MIT, Nobel laureate), @emollick (Wharton, author of "Co-Intelligence"), @clarashih (former AI exec at Salesforce and Meta), and @DeanWBall (Foundation for American Innovation). It was a good, albeit somewhat academic, discussion, but the answers mostly land on curious generalists, side projects, and learning to manage AI agents. The panelists may be right - in pure cognitive work there will be a class of extremely bright people who think AI-native and extract the most returns. The other piece, an FT article by @VivienneMing, a theoretical neuroscientist, sent to me by Prof. Dan Jones, gets at something the NYT panel didn't touch. She put EEG headsets on students using AI. Most brains went quiet within minutes - the gamma waves that mark real cognitive effort collapsed. A few lit up. The difference? They were arguing with the machine, pushing back on its answers instead of accepting them. That's a learnable skill, not an innate trait. It is similar to how we were problem-solving this morning. I think we'll see both futures. Some companies will have a small AI-native elite running agents while everyone else follows instructions. Others - like Toyota and Denso have done with problem-solving for decades - will teach their people to use AI effectively at every level. That's what I call Lean AI. The question isn't who's smart enough. It's which organizations develop everyone. Humans + AI > Problems
Gene Kim retweeted
"One of Spotify’s oldest engineering principles is: 'The fewer technologies we are world-leading in, the faster we go.'" Good wisdom from @SpotifyEng. Standardize on a few things, build expertise, and eliminate unnecessary decision-making. engineering.atspotify.com/20…
Gene Kim retweeted
The latest biannual Benedict Evans presentation on "AI eats the world" is up, and it's great as ever. For me the most salient slides are: ben-evans.com/presentations
Gene Kim retweeted
I expected Duo Agent Platform to beat Cursor, Claude, Copilot and Devin on price with our $0.25/review fixed price. I did not expect us to beat all of the above on precision and recall. Check out this site to learn more, including how we rank against all the code review solutions on the market. duo-review-bench-6f7260.gitl… This is the power of repo-side agents, and it’s just the beginning of more powerful, higher quality and lower cost agentic engineering. This is our structural advantage in action.
Gene Kim retweeted
For software companies building AI apps and agents, GTM is part of the product work too. Microsoft launched an AI Agent Go-to-Market Playbook that walks software teams through turning their agents into real products and publishing them where Microsoft customers are already looking. The playbook covers: ✅ validating the idea before overbuilding ✅ designing the MVP with real cloud architecture ✅ deciding how to package it ✅ publishing through Microsoft Marketplace (with 6M monthly visitors) If your team is trying to ship agentic systems that customers can actually use, this is worth a look: fandf.co/4wE59RT In collaboration with @msdev @Microsoft

Gene Kim retweeted
I missed this brilliant piece from @unmeshjoshi on the nature of coding and the power of abstractions and vocabulary in the LLM age. martinfowler.com/articles/wh…
Gene Kim retweeted
People who are building real things are all coming to this conclusion. You could argue that it’s because software engineers care about code quality more than they should, but it’s really because if you don’t, you will end up with software that does not work well.
You might believe you should spend less time thinking about code because of AI. I strongly disagree! We’re watching this play out live, where tons of AI-generated code becomes a liability. At the end of the day, an engineer needs to be responsible / on call for code that gets shipped to production. If you don’t understand the system you’re trying to debug, you’re probably going to have a bad time. Yes, AI can help with all of this, if you set up the proper systems. You can have agents triage prod logs, look at errors, etc. You can speed up parts of the investigation, but an engineer needs to make the call. There might be serious customer or financial implications from that change. I expect the trend to continue toward trimming dependencies, vendoring code so you can modify it directly, preferring simpler systems with fewer abstractions, and spending waaaay more time thinking about system design and code maintenance. I’ve said this before, but it’s a great time to get familiar with CS fundamentals and some of the history behind what great software looks like. Many parts will be different in the coming years as AI progresses, but a lot more than people realize will stay the same.
Gene Kim retweeted
Replying to @crvvdev
Hey Ricardo, have you seen the Airbus analysis of this topic? I believe they were the first to talk about it in a public talk, but I'm not sure. And some time ago some Warbird material got leaked on reddit as well. github.com/airbus-seclab/war…
Gene Kim retweeted
Did you know that Windows has something called Warbird that literally executes encrypted shellcode on your computer? And that not all of its functionality is really known; we just know that it exists and is actively running on everyone's computers?