Gene Kim

Tweets

Pinned Tweet

Gene Kim

@RealGeneKim

6 Dec 2019

Holy cow. The Unicorn Project is on the Wall Street Journal bestseller lists!!! #2 in Hardcover Business category! And astonishingly, it’s also #8 across all Non-Fiction E-Books!!! A DevOps book!! 🤯🤯🤯 🙏❤️🦄🌈 Paywall: wsj.com/articles/best-sellin… #UnicornProject

670

Gene Kim retweeted

Alex Albert

@alexalbert__

Jun 12

Fable feels superhuman at working over long agentic conversations, sometimes to the point where I can't keep up with what it's telling me 😅 This prompt snippet has been the best fix I've found for getting it to write clearly and drop any jargon:

1,022

62,571

Gene Kim retweeted

Erik Meijer

@headinthebox

Jun 12

Still one of the best papers on language design.

193

12,494

Gene Kim retweeted

swyx

@swyx

Jun 8

It's finally out!!! @METR_Evals found that more than half of SWEBench results are unmergeable slop. FrontierCode represents over 1000 hours of maintainer-validated software engineering work that most frontier models cannot yet solve, much less solve with high quality. Cog had IOI Gold medalists and top code maintainers Look At The Data: FrontierCode includes 3000 rubrics covering code quality and anticheat reward hacking plaguing other benchmarks. FC Diamond is so hard that Opus 4.8 scores 13.8%.
Three eras of AI coding : Three eras of benchmarks
2021 • Autocomplete: HumanEval
2023 • Passing Tests: SWEBench, TerminalBench
2026 • Maintainable Code: FrontierCode
To me the most beautiful chart: when I requested a special historical run across all extant old models, the data showed that the easiest third of FC tasks (in FC Extended) were rapidly and suddenly solved over late 2025 - Opus almost doubled from a 41% pass rate to 74% in 4 months. This describes the "WTF happened in Dec 2025" vibe shift that a lot of folks from @dhh to @karpathy have called out: it is the difference between getting 95% success in 2 rerolls vs 6, making it finally feasible to go up the next layer of abstraction in agentic coding, eg @GeoffreyHuntley's ralph loops or @bcherny's /goals or @steipete's "loops that prompt your agents" without fearing too much that things go off the rails.
My guess: as AI accelerates from here, each FrontierCode tier will saturate in sequence, hopefully ~annually. I've already asked the team to prepare FrontierCode 2027.... The old mountains will be destroyed. Their rubble becomes regolith. And from that regolith, the next model forest grows. Circle of life.

Cognition

@cognition

Jun 8

Introducing FrontierCode: a coding eval that raises the bar for difficulty & quality. Each task took 40 hrs of work by leading open-source maintainers. Models write sloppy code that works but isn’t maintainable. Our eval is first to measure: would you actually merge this code?

785

187,766
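
A rough check of the "2 rerolls vs 6" arithmetic in swyx's post above can be sketched as follows. This is only an illustration under stated assumptions: each attempt is treated as an independent trial with the quoted pass rate, and "rerolls" is read as the total number of attempts. The function name `success_within` is hypothetical and not from the post.

```python
def success_within(pass_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` tries succeeds.

    Assumes attempts are independent and share the same pass rate,
    which is a simplification: real reruns of an agent may be correlated.
    """
    return 1.0 - (1.0 - pass_rate) ** attempts


if __name__ == "__main__":
    # Pass rates quoted in the post: 41% before, 74% four months later.
    for rate, tries in [(0.74, 2), (0.41, 6)]:
        print(f"pass rate {rate:.0%}, {tries} attempts: "
              f"{success_within(rate, tries):.1%} chance of at least one success")
```

Under these assumptions, two attempts at a 74% pass rate give roughly 93%, and six attempts at 41% give roughly 96%, so both land near the 95% figure the post mentions.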

Gene Kim retweeted

claire vo 🖤

@clairevo

Jun 10

oh ok i get it. imagine you have a staff engineer working on a basic CRUD saas. and they complain year after year you're not working on enough tech debt and nothing is ever prioritized right. and then they propose the most opaque, overwrought example of backend performance issues, and try to convince your ceo that you should stop all product development and put 100% of your engineers on fixing your flaming dumpster fire of a monorepo. and every year you say no, because their proposals are intractable, completely untied to business metrics, and "no customer will care." now pretend you can put that person in a back room and ignore them for a few days, and they come back and secretly ship everything that was wrong in your repo. your app is now blazing fast and you are at bug zero. that is how you should use fable. (just don't put him in a meeting, and don't let anyone read his docs)

291

25,997

Gene Kim retweeted

John Scott-Railton

@jsrailton

Jun 10

NEW: malware developers added nuclear & biological weapons text to their spyware. Goal? To trigger LLM safety refusals... so that their spyware wouldn't be analyzed by an AI security scanner. Cleanest practical example I can think of for why over-indexing on first order safety alignment is risky. When closed (and open) models ship with aggressive refusals, they will be sprinkled with second-order blindspots that attackers will discover...and exploit. We are only in the earliest days of attackers leveraging these features, and it wouldn't surprise me if users systems that need to handle complex cybersecurity issues demand that models be less safety-blunted. In the weeds: @SocketSecurity's post also shows why intention matters in how you design a malware analysis pipeline to avoid prompt manipulation. H/T to colleagues that shared this with me socket.dev/blog/mini-shai-hu…

226

2,153

12,636

1,542,935

Gene Kim retweeted

Richard Seroter

@rseroter

Jun 10

"Released under an Apache 2.0 license, this 26B Mixture of Experts (MoE) model moves beyond the sequential token-by-token processing ... [generating] entire blocks of text simultaneously, delivering up to 4x faster text generation on GPUs."

Google Gemma

@googlegemma

Jun 10

Meet DiffusionGemma! An experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license. Moving beyond sequential, token-by-token processes to generate entire blocks of text simultaneously. Here’s what’s new with DiffusionGemma: 👇

1,653

Gene Kim retweeted

Pankaj

@the2ndfloorguy

Jun 10

i hooked my whoop to my work calendar to find which coworker gives me the most stress 🚨 thanks to fable, I reverse engineered whoop to pull per minute heart rate. and matched spikes with cal events and attendees I now have a leaderboard and I think about it daily. few info masked for obvious reasons ;)

1,007

2,840

44,943

11,004,516

Gene Kim retweeted

Gergely Orosz

@GergelyOrosz

Jun 8

Here's what triggered me to write the Trimodal nature of software engineering compensation: At a time when the median salary for senior devs in the Netherlands was ~€65K, I made €240-280K per year, about 4x that amount. Most people did not know this tier existed back then:

544

110,275

Gene Kim retweeted

Steve Yegge

@Steve_Yegge

Jun 9

I've had some amazing consulting gigs this year, and loved visiting the companies and meeting everyone. I'm doing more advising and consulting now, and I needed a website. So I made yegge.ai as a place for all my old blogs and stuff. Haiku pretty much one-shotted it. I'll keep tinkering with it, but I welcome early feedback. My site was inspired by paulgraham.com and johnwinsor.com, but with my own style. Fun project.

Steve Yegge — Programmer

Engineering-culture expertise from forty years across Amazon, Google, and Grab. Independent advisor on AI-native transformation. Co-author of Vibe Coding (with Gene Kim, IT Revolution 2025).

yegge.ai

118

14,510

Gene Kim retweeted

Erik Meijer

@headinthebox

Jun 9

When companies like Meta or Microsoft lay off experienced people or make their lives so miserable that they voluntarily leave, they are unknowingly erasing institutional memory. Every company's infrastructure is chock full of Chesterton's fences and stays up because of unwritten processes. The code bases are typically huge, so nobody can understand them end-to-end. When the people that know the history are gone, the only thing the remaining crew can do is fight fires and make cosmetic changes at the edges. Rewriting from scratch is not an option since, as your favorite AI will say, the software has become load bearing for the company's success. The good news is that I think for green-field code, vibe coding offers a way to avoid this trap by embracing the invariant that all software should be evolvable without any human needing, or even being allowed, to understand the inner workings of it. Only if you take the human developer completely out of the loop like this can you ensure that software never atrophies into legacy software. This may sound radical to AI coding deniers, but the reality is that code inevitably becomes incomprehensible for humans, so the rational solution is to just assume it is like so from the start.

118

15,182

Gene Kim retweeted

Art Smalley

@Art_Smalley

Jun 9

I spent this morning with building-products supervisors learning to problem-solve with AI. Not elite knowledge workers. Not side-project entrepreneurs. They are knocking out the root cause of tough problems that have existed for years. AI can develop people across an organization, not just empower a select few at the top. If history teaches us anything about AI and the workforce, it won't be one-size-fits-all. Two different articles this morning helped me think through why. The NYT assembled a strong panel to ask "who actually thrives in a hybrid AI workforce?" - @DAcemworku (MIT, Nobel laureate), @emollick (Wharton, author of "Co-Intelligence"), @clarashih (former AI exec at Salesforce and Meta), and @DeanWBall (Foundation for American Innovation). It was a good albeit somewhat academic discussion. But the answers mostly land on curious generalists, side projects, and learning to manage AI agents. The panelists may be right - in pure cognitive work there will be a class of extremely bright people who think AI-native and extract the most returns. Another FT piece sent to me by Prof. Dan Jones by @VivienneMing, a theoretical neuroscientist, gets at something the NYT panel didn't touch. She put EEG headsets on students using AI. Most brains went quiet within minutes - the gamma waves that mark real cognitive effort collapsed. A few lit up. The difference? They were arguing with the machine, pushing back on its answers instead of accepting them. That's a learnable skill, not an innate trait. It is similar to how we were problem solving this morning. I think we'll see both futures. Some companies will have a small AI-native elite running agents while everyone else follows instructions. Others - like Toyota and Denso have done with problem-solving for decades - will teach their people to use AI effectively at every level. That's what I call Lean AI. The question isn't who's smart enough. It's which organizations develop everyone. Humans AI > Problems

680

Gene Kim retweeted

Richard Seroter

@rseroter

Jun 8

"One of Spotify’s oldest engineering principles is: 'The fewer technologies we are world-leading in, the faster we go.'" Good wisdom from @SpotifyEng. Standardize on a few things, build expertise, and eliminate unnecessary decision-making. engineering.atspotify.com/20…

3,408

Gene Kim retweeted

Angie Jones

@techgirl1908

Jun 8

Karpathy's LLM wiki has been surprisingly effective as a memory layer for my agents. I wrote up how I'm using it for 5 types of memory. aaif.io/blog/karparthys-llm-…

Karparthy's LLM Wiki as Agent Memory - Agentic AI Foundation (AAIF)

At work, I’m building agents to handle various operational tasks and have found Karparthy’s LLM Wiki design to be an excellent solution for implementing most types of memory for my agents. The LLM...

aaif.io

141

14,816

Gene Kim retweeted

Phil Venables

@philvenables

Jun 7

The latest biannual Benedict Evans presentation on "AI eats the world" is up, and it's great as ever. For me the most salient slides are: ben-evans.com/presentations

5,827

Gene Kim retweeted

Bill Staples

@bstaples

Jun 4

I expected Duo Agent Platform to beat Cursor, Claude, Copilot and Devin on price with our $0.25/review fixed price, I did not expect us to win all of the above on precision and recall, but check out this site to learn more, including how we rank against all the code review solutions in market. duo-review-bench-6f7260.gitl… This is the power of repo-side agents, and it’s just the beginning of more powerful, higher quality and lower cost agentic engineering. This is our structural advantage in action.

Review every MR for $0.25 | GitLab Duo Code Review

AI code review on every merge request for a flat $0.25. Top 3 on a public benchmark, validated on a second.

duo-review-bench-6f7260.gitlab.io

7,920

Gene Kim retweeted

Angie Jones

@techgirl1908

May 28

For software companies building AI apps and agents, GTM is part of the product work too. Microsoft launched an AI Agent Go-to-Market Playbook that walks software teams through turning their agents into real products and publishing them where Microsoft customers are already looking. The playbook covers: ✅validating the idea before overbuilding ✅designing the MVP with real cloud architecture ✅deciding how to package it ✅publishing through Microsoft Marketplace (with 6M monthly visitors) If your team is trying to ship agentic systems that customers can actually use, this is worth a look: fandf.co/4wE59RT In collaboration with @msdev @Microsoft

4,982

Gene Kim retweeted

Devin Dickerson

@Dev_TheAnalyst

May 27

I missed this brilliant piece from @unmeshjoshi on the nature of coding and the power of abstractions and vocabulary in the LLM age. martinfowler.com/articles/wh…

What Is Code?

What is code actually for in the era of LLMs.

martinfowler.com

15,495

Gene Kim retweeted

David Fowler

@davidfowl

May 24

People that are building real things are all coming to this conclusion. You could argue that it’s because software engineers care about the code quality more than they should, but it’s really because if you don’t, you will end up with software that does not work well.

Lee Robinson

@leerob

May 24

You might believe you should spend less time thinking about code because of AI. I strongly disagree! We’re watching this play out live where tons of AI generated code becomes a liability. At the end of the day, an engineer needs to be responsible / on call for code that gets shipped to production. If you don’t understand the system you’re trying to debug, you’re probably going to have a bad time. Yes, AI can help with all of this, if you set up the proper systems. You can have agents triage prod logs, look at errors, etc. You can speed up parts of the investigation, but an engineer needs to make the call. There might be serious customer or financial implications from that change. I expect the trend to continue for trimming dependencies, vendoring code so you can modify it directly, preferring simpler systems with fewer abstractions, and spending waaaay more time thinking about system design and code maintenance. I’ve said this before, but it’s a great time to get familiar with CS fundamentals and some of the history behind what great software looks like. Many parts will be different in the coming years as AI progresses, but also a lot more than people realize will stay the same.

358

83,167

Gene Kim retweeted

Asukiko @asukiko_f

May 25

Replying to @crvvdev

Hey Ricardo, have you seen the Airbus analysis of this topic? I believe they were the first to talk about it in a public talk, but I'm not sure. And some time ago some Warbird stuff got leaked on Reddit as well. github.com/airbus-seclab/war…

GitHub - airbus-seclab/warbirdvm: An analysis of the Warbird virtual-machine protection for the...

An analysis of the Warbird virtual-machine protection for the CI!g_pStore - airbus-seclab/warbirdvm

github.com

2,340

Gene Kim retweeted

Ricardo Carvalho @crvvdev

May 23

Did you literally know that Windows has something called Warbird that literally executes encrypted shellcode on your computer? And that all of its functionality is not really known, we just know that it exists and is actively running on everyone's computers?

959

88,129