The buried lede in this post is the October-to-December timeline.
In October 2025, Karpathy publicly said AI agents “just don’t work.” Eight weeks later he’s 80% agent-coded and calling it the biggest workflow change in two decades. That’s an unusually fast reversal from one of the most credible voices in AI, and it maps to something measurable.
Stack Overflow’s 2025 developer survey tells the other side of this story. Only 16% of developers reported “great” productivity gains from AI tools. 45% said debugging AI code takes longer than writing it themselves. Meanwhile, the Claude Code team is shipping 20-27 PRs per day, 100% AI-written.
That’s a bimodal distribution forming in real time. A small group is getting 10x leverage. The majority is getting modest autocomplete improvements and spending extra time fixing hallucinated code. Same tools, completely different outcomes.
Karpathy names the variable: “ascending the layers of abstraction to set up long-running orchestrator Claws with all of the right tools, memory and instructions.” That’s a skill that looks nothing like traditional programming. You’re managing agents the way a senior engineer manages junior devs: scoping work, reviewing output, catching failure modes, and maintaining system-level context.
The 10x engineer ratio he’s worried about is already here. 25% of YC’s Winter 2025 batch shipped codebases that were 95% AI-generated, and every founder was fully technical. They weren’t replacing skill with AI. They were compounding skill through AI. The gap between someone who can decompose a weekend project into agent-executable chunks and someone still typing code line by line is widening by the month.
What makes this post different from the usual AI hype: Karpathy is explicitly naming the failure modes. Needs high-level direction. Needs taste. Needs judgment. Works better for well-specified tasks. This is an honest field report from someone who publicly reversed his own position 90 days ago, and that combination of enthusiasm and specificity is what makes the signal real.
It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn’t work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow.
Just to give an example, over the weekend I was building a local video analysis dashboard for the cameras of my home so I wrote: “Here is the local IP and username/password of my DGX Spark. Log in, set up ssh keys, set up vLLM, download and bench Qwen3-VL, set up a server endpoint to inference videos, a basic web ui dashboard, test everything, set it up with systemd, record memory notes for yourself and write up a markdown report for me”. The agent went off for ~30 minutes, ran into multiple issues, researched solutions online, resolved them one by one, wrote the code, tested it, debugged it, set up the services, and came back with the report and it was just done. I didn’t touch anything. All of this could easily have been a weekend project just 3 months ago but today it’s something you kick off and forget about for 30 minutes.
As a result, programming is becoming unrecognizable. You’re not typing computer code into an editor like the way things were since computers were invented, that era is over. You're spinning up AI agents, giving them tasks *in English* and managing and reviewing their work in parallel. The biggest prize is in figuring out how you can keep ascending the layers of abstraction to set up long-running orchestrator Claws with all of the right tools, memory and instructions that productively manage multiple parallel Code instances for you. The leverage achievable via top tier "agentic engineering" feels very high right now.
It’s not perfect, it needs high-level direction, judgement, taste, oversight, iteration and hints and ideas. It works a lot better in some scenarios than others (e.g. especially for tasks that are well-specified and where you can verify/test functionality). The key is to build intuition to decompose the task just right to hand off the parts that work and help out around the edges. But imo, this is nowhere near "business as usual" time in software.