Joined December 2007
Pinned Tweet
📢 NEW essay: the narrative that AI is replacing software engineers seems to be based on AI-washing of layoffs. Among the many lines of evidence: New York State requires firms to disclose which layoffs were due to AI. When there are legal consequences to lying, almost no company blames AI. The best data suggests that software engineer employment in the U.S. is still growing, though slightly more slowly compared to a no-AI counterfactual world. But even this doesn’t account for increased entry into entrepreneurship. Why haven’t AI agents replaced software engineers? Many kinds of knowledge work, including software development, can be seen as a “decide-execute-deliver sandwich”. AI compresses the middle but the other two layers resist automation in a way that will not be overcome by capability improvements alone. In 2026, the middle has already been squeezed and there isn’t much room for further compression, so there isn’t a future capability leap that will cause discontinuous impacts, either. The essay also has a detailed breakdown of the difference between vibe coding and agentic engineering. Conflating the two has unfortunately led to a lot of confusion. We acknowledge that there’s a lot of uncertainty about the future. But we think the unknowns are more about how we will define and carve up roles and who can adapt to the changing needs, and less about whether software engineering skills and judgment will remain in demand (they will). Given that the AI-capabilities-cause-mass-layoffs narrative seems to be false even in a sector with very few regulatory barriers, we think most other professions are likely to be even more cushioned. (The claim isn’t that automation never displaces jobs, but that there are many downstream links in the causal chain, and that we have collective agency to act on those downstream decision points to ensure better outcomes for workers.) This essay by me and @sayashk is the first in a series. Feedback is welcome! 
We hope that our unique vantage point (leading an AI agent evals research group; fluency with the labor economics literature; the "normal technology" framework for understanding AI impacts; being coders ourselves and well connected to the software engineering community) gives us a way to bring complementary perspectives together for a deep dive into how AI is transforming the profession, in a way that hasn't been done before. Read the essay here for lots of other details: normaltech.ai/p/why-ai-hasnt…
There’s a big, under-appreciated reason why people may have very different experiences and opinions about using AI for work — are they using it for tasks they’re already an expert at, or tasks they can’t do themselves? The former leads to a *growth cycle* and the latter leads to a *dependence spiral*. When I use AI to do something I’m an expert at, like coding, I treat it as a tool. I can build quickly, maintaining an understanding of the code, knowing that if necessary, I can fix the code myself. It feels empowering. It frees up my time to think about the complex, judgment-oriented parts of software engineering that I can’t or won’t delegate to AI. That means my own skills improve rapidly, and I get to climb the ladder of complexity and develop higher-level skills, much more so than when I write the code myself. I feel in control. I can lock in and achieve a flow state — when AI is working, I’m reviewing, building understanding, and planning the next steps. I never get the feeling that the tool is about to replace me. This is the growth cycle. (Of course, the growth cycle is not automatic. I still need to exercise agency to use AI responsibly. But it’s the same challenge with any productivity-enhancing technology, and those who’ve navigated such transitions before are well-equipped to navigate it with AI as well.) On the other hand, if I use it for tasks I don’t understand and haven’t learned to perform myself, I have no choice but to treat it as a superintelligence. If something breaks, the best I can do is ask AI to fix it and hope for the best. I generally can’t evaluate the quality of the output myself. The only way to find out if it's any good is if and when the work is ultimately reviewed by an actual expert. The experience is confusing, unsettling and disempowering. And forget about flow state. By over-relying on AI, I risk losing whatever skill I had at the task in the first place, even if it boosts productivity in the short term. 
This is the dependence spiral. It’s no wonder that entry-level workers and students preparing to enter the workforce find themselves in a bind. To compete with the AI-enabled productivity of more seasoned workers, they must adopt AI themselves, but doing so risks the dependence spiral. I have some thoughts on solutions that I will share in later posts, but I think having a clear diagnosis of the problem is a useful first step.
There are many simple things we can do to make the AI discourse less confusing and more productive, like not using "vibe coding" as an umbrella term to refer to all AI-assisted software development. normaltech.ai/p/why-ai-hasnt…
Highly recommended. Many people have an overly narrow view of knowledge work jobs and thus overstate the disemployment effects that AI is likely to create.
Periodic reminder that the AI-as-Normal-Technology thesis acknowledges rapid progress in AI capabilities. Its main contribution is to show why the speed of impacts is not gated by the speed of capability progress. We are keenly interested in assessing how the core ideas stand up to the evidence. So far we think the thesis is overwhelmingly supported. Here's our latest deep dive: normaltech.ai/p/why-ai-hasnt… Of course, we are probably biased, so we'd love to see how others assess the evidence. Remember: simply gesturing at how weird it all feels doesn't count. (We, too, agree it all feels profoundly weird to live through! That's not the crux of disagreement.) We have an essay laying out our areas of common ground with the AI 2027 authors. Point #1 is that *so far* AI is behaving like a normal technology: asteriskmag.substack.com/p/c… It is totally reasonable to believe that there will be some discontinuity *in the future*, maybe due to recursive self-improvement. But there is no discernible discontinuity of impact so far. And if you think rapid capability progress by itself shows that AI is not normal technology, I encourage you to (re)read past the title of the essay :) knightcolumbia.org/content/a…
I still can’t believe that after all this progress, many people still earnestly believe AI is going to be a normal technology.
Arvind Narayanan retweeted
Spot on and likely true across industries (including/especially legal services): “The fact that aggregate labor demand in software is likely to remain strong doesn’t mean that most individual workers won’t be affected. We will argue that AI will create massive structural shifts in how software is produced, which will have big impacts on which software engineers stand to gain or lose — based on the types of firms they work in, their geography, their seniority, the pace at which they can adapt.”
Arvind Narayanan retweeted
New preprint! We introduce a new benchmark, SciConBench, with 9.11k scientific questions derived from Cochrane Systematic Reviews. We find evidence that frontier AI agents **cannot** synthesize scientific conclusions well. A thread 🧵 w/ @hayounggjung, @korolova & others
The fact that I remain committed to the "normal technology" perspective for understanding AI's economic impacts doesn't mean I can't appreciate how profoundly weird it is to use AI on a day-to-day basis. Agents are designed to behave and interact in a humanlike way, yet "happily" accept endless amounts of grunt work, which often reminds me of Douglas Adams' "cow that wants to be eaten".
Nice to see "churnalism" get automated. An agent cranked this out based on @sayashk's tweet. I hope journalists will leave behind this low-value stuff to AI and focus on the hard parts — digging up non-public info; verification and provenance; supplying unique analysis.
Arvind Narayanan retweeted
There is a lot of justified anger at Anthropic for sandbagging Fable 5 for AI development tasks. But an unanticipated side effect is that third-party evaluators can no longer credibly use the model for evaluations. Case in point: we are in the middle of running *really hard* AI R&D evaluations. Fable 5 would be a perfect test candidate. But because of Anthropic's guardrails, we can't know if the model failed or if their classifiers blocked the capability. By the way, this is not just true for AI R&D. Since Anthropic doesn't make it clear when they are sandbagging, this could seep into any number of technical tasks, and the evaluators wouldn't have any way to know. So they can't credibly claim to evaluate state-of-the-art accuracy using the model.
Arvind Narayanan retweeted
I’ve been meaning to write a tweet thread about a new paper— Barriers to Adopting Predictive Algorithms: A Criminal Justice Field Experiment. We built fancy ML sentence prediction software for public defenders and they mostly didn’t use it. We then pivoted to asking why not. 1/
Things look good for AI as a “normal technology” so far. LLMs have rapidly impacted coding, but few other professions; and even in coding it’s hard to see value generation. Bottlenecks, regulation, other frictions keeping humans complementary and relevant
This is a really excellent data analysis. One question it raises is how to square all the polling showing majority-negative sentiment toward AI with the finding in this post that there's a 3:1 ratio of AI-embracing to AI-resisting content (weighted by reach). A few possible ways:
* Those who embrace AI are more likely to create / consume content about it than those who resist it (perhaps because AI itself makes it easier to make engaging content).
* Polling reflects stated preferences, content creation/consumption reflects a mix of revealed and stated preferences, and usage reflects revealed preferences, so the three won't necessarily align.
* There is actually no contradiction between adopting AI and resisting it. Many people are anxious or angry about AI's impacts, wish it didn't exist, and resent AI being shoved into everything, but at the same time see the benefits of using AI for entertainment or productivity.
The popular conversation around AI in America looks nothing like the narratives the elites are driving. For our new research, we analyzed 25,000 TikTok and YouTube videos about AI---and watched thousands of them ourselves---to understand how Americans are encountering AI in their everyday lives. Despite an elite conversation focused largely on backlash, AI videos embracing AI outnumber videos about resisting AI 3 to 1. These "adopter" videos don't focus on the things elites talk about: they talk about funny memes and effects AI can help make and ways you can use AI to help you with your job search. There is a significant and organized social media community focused on resisting AI, but surprisingly, it's not mainly about job loss, data centers, or existential risk. Instead, it's about creative theft and the erosion of human-made art. This has all the hallmarks of a genuine movement---with organized efforts to support human artists, to report AI-generated content, and to oppose the technology in the real world. All in all, when we look past the efforts of the labs and the media to impose a top-down narrative around job loss and existential risk, we find everyday Americans having a far different and in many ways more "normal" conversation (@random_walker)---one in which AI offers immediate and personal opportunities and challenges all at the same time. Check out the full research piece, which is loaded with interesting real example videos, here: freesystems.substack.com/p/m…
This is how so many data-driven companies screw their users and shoot themselves in the foot. My guess is every time Uber runs a 2-week A/B test where they give some users accurate time estimates and other users deceptive time estimates, the deceptive arm wins — many people probably compare Uber and Lyft time estimates and pick one, so lying about wait times increases the number of trips booked in the short run. But what the data won't show is that people gradually catch on to the deception over a period of months or years and lose trust in the company or the sector altogether, and use the app less or quit entirely.
Standard experience booking an Uber “5 minutes away” I order it “Finding your driver…” “Pickup in 7 minutes” I open my timer 9 minutes and 21 seconds later, it arrives This is so tedious and I am tired of being deceived at the margin
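The short-run versus long-run argument above can be illustrated with a toy model. This is only a sketch under invented assumptions: the booking rates, the weekly trust-decay factor, and the function names are all hypothetical and are not drawn from any real Uber or Lyft data.

```python
def cumulative_trips(weeks, base_rate, trust_decay):
    """Total trips per user over `weeks`.

    The weekly booking rate starts at `base_rate` and shrinks by a factor
    of (1 - trust_decay) each week, modeling users gradually losing trust.
    """
    total = 0.0
    rate = base_rate
    for _ in range(weeks):
        total += rate
        rate *= 1.0 - trust_decay
    return total


# Hypothetical parameters: the deceptive arm books 10% more trips at first,
# but loses 1% of its weekly booking rate every week as users catch on.
HONEST = {"base_rate": 1.00, "trust_decay": 0.00}
DECEPTIVE = {"base_rate": 1.10, "trust_decay": 0.01}


def winner(weeks):
    """Name the arm with more cumulative trips after `weeks`."""
    honest = cumulative_trips(weeks, **HONEST)
    deceptive = cumulative_trips(weeks, **DECEPTIVE)
    return "deceptive" if deceptive > honest else "honest"


if __name__ == "__main__":
    for horizon in (2, 104):
        print(horizon, "weeks:", winner(horizon))
```

With these made-up numbers, the deceptive arm wins over a 2-week window and the honest arm wins over a 2-year window, which is the pattern a short experiment cannot reveal.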
Arvind Narayanan retweeted
For all the attention AI gets, data shows that the parties are still not adopting it as a major issue. I've been playing with @derekwillis's awesome data on party fundraising emails. As my plot here shows, AI is just starting to tick up as a Democratic talking point, but it's still very, very modest. The Republicans' 2020-era freakout about social media remains much, much larger than the current AI freakout in terms of email focus...and that never amounted to much in terms of policy. At the same time, policy proposals around AI are getting bolder, like Sanders and Trump floating national ownership---yet surprisingly, the party rank and file are actually moving pretty slowly to adopt AI as an issue, maybe because it's not very salient with the American public. Will be very curious how this changes in the coming months. Seems likely Democrats may start messaging around it more...but maybe not, if we don't start to see meaningful employment effects.
President Trump on Air Force One talking to the press: Reporter: 'Sir, on AI companies, potentially taking these equity stakes, have you spoken to Sam Altman or any of the-' President Trump: 'No, there's a concept out there, there's so much money, and it's so big, that there are concepts where pieces could be given to the American public. Where the American public essentially becomes a partner with the companies and, I will tell you, yeah, I have, I have spoken with all of them. There's something very interesting about it where it almost becomes a partnership with the American public. And we'll look into that. I actually have a meeting scheduled in the very short, in the very near future with, did you know that? With all of the companies, and, we're talking about it. Where the American people can benefit from the success of AI. And by doing that they're going to like it better. Because we're leading China. We're leading everybody in the world with AI. And we want to keep it that way. It's probably the biggest industry maybe that we've ever seen.' Reporter: 'Which companies-' President Trump: 'All of them.' Reporter: 'Anthropic, SpaceX-' President Trump: 'All the big ones yeah, all of them. They're all coming to the White House. Probably next week.' Reporter: 'Is the idea that there would be dividends for the American-' President Trump: 'I don't know. We're going to see. I mean, we're going to see. Sort of, it's like, you make them a partnership in this revolution. It would be a beautiful thing. It would make them rich.'
Arvind Narayanan retweeted
Massive output uptick due to agentic AI. Complete flat adoption.
Arvind Narayanan retweeted
Yes, although it's not just the pace of change. The *nature* of change is very different. The main things bottlenecking human change are rarely an absence of raw intelligence or knowledge. We already have the intelligence and knowledge necessary for bringing about unambiguous human progress in many domains, but still can't do so because of countless all-too-human factors: incentives, conflicting interests, collective action problems, human irrationality, etc. For example, it's relatively easy to understand why the UK is so messed up - Claude can already give you an accurate, accessible explanation - and yet we lack the collective ability to translate this intelligence into action. It's also noteworthy that, because many human factors often push against desirable change and progress, unleashing "super-intelligence" into the human world could even slow some things down or lead to various kinds of deterioration (even setting aside dramatic examples of "AI takeover", "AI-enhanced coups", etc.). E.g., Science as an institution is completely messed up not primarily due to a lack of raw intelligence among scientists but due to bad incentives that reward intellectual activity that doesn't do much to advance knowledge. Adding more and faster raw intelligence to this system might just exacerbate the problem, producing more studies, paper submissions, grant applications, etc., without producing more insights, and in fact making it even harder to find signal amidst the avalanche of noise. Sometimes people respond, "Ok, but we'll just use superintelligence to figure out how to get around all these bottlenecks," or "But by definition, superintelligence wouldn't be subject to these bottlenecks", but the first response simply doesn't grasp the issue and the second is basically using the term "superintelligence" to mean something like "magic". Obviously these kinds of points have been made many times before (e.g. by @random_walker) but they're still under-rated IMO.
One thing that discussions of AI don't seem to have internalized yet is that AI lives on a different time-scale than human beings. Sci-fi predicted this, but we never talk about it now that AI is real. But it has big implications.
Arvind Narayanan retweeted
Very excited to share that our paper "Towards a Science of AI Agent Reliability" was accepted at ICML 2026! See you in Seoul! 🎉 We just released our camera-ready version with three important updates (details below). We also recorded a short video on the paper's contributions. Main changes (full discussion at hal.cs.princeton.edu/reliabi…):
1️⃣ We have added the latest set of frontier models to our evaluation (GPT 5.5, Gemini 3.1 Pro and 3.5 Flash, and Claude Opus 4.7) and find that they are not meaningfully more reliable than previously released models. Agent reliability is still far from being solved.
2️⃣ We have updated the definition and measurement of our outcome consistency metric, which contained a typo in the pre-print we initially released. This caused us to under-estimate outcome consistency in our initial set of results. We have updated the paper and our codebase to the corrected metric. Despite this change, our new results show that outcome consistency is still surprisingly low across many reported models.
3️⃣ We discovered multiple issues in our HAL Generalist Agent scaffold that we used for our experiments on GAIA. Notably, we discovered multiple instances of answer leakage and agents cheating on our evaluation. This caused us to slightly over-estimate both accuracy and reliability. At the same time, we noticed that the scaffold was overly constrained in terms of permissible software library imports. This caused us to slightly under-estimate both accuracy and reliability. We have done a rigorous audit of the scaffold and have fixed those issues. Overall, we saw that our resulting accuracy and reliability numbers are not meaningfully impacted by this change when compared to our original numbers.
📄 Our paper: arxiv.org/abs/2602.16666
📊 Our dashboard: hal.cs.princeton.edu/reliabi…
🎥 Short video: youtu.be/qftDfEft7U0
Joint work w/ @sayashk, @PKirgis, @khl53182440, @SaitejaUtpala, and @random_walker.
An unexpected benefit of coding agents has been the ease of customizing my desktop for productivity in extremely niche ways. For example, I need my main browser window to jut out of the screen by 10 pixels (for obscure reasons no one will care about). The problem is that macOS hates this and "helpfully" brings it back into the screen at every opportunity. But after a few minutes with an agent, I've been able to use a desktop automation tool called Hammerspoon to create a Lua script that auto-detects this shift and moves it right back, fast enough that I don't even notice. It's not always smooth sailing: the score so far is something like Claude Code: 8; macOS: 3. But the wins so far have collectively made a big difference to my sanity! And before coding agents it never made sense to even try, because it would take hours to get something like this to work and the chances of success were far from guaranteed.
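The detect-and-restore idea described above can be sketched as follows. This is not the Lua script from the post and does not use the Hammerspoon API; it is a Python illustration of the geometry only. The `Rect` type, the function names, and the choice of the right edge as the overhanging edge are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """A hypothetical frame: top-left corner plus width and height, in pixels."""
    x: int
    y: int
    w: int
    h: int


def desired_frame(window: Rect, screen: Rect, overhang: int = 10) -> Rect:
    """Frame whose right edge sits `overhang` pixels past the screen's right edge.

    Assumption: the window juts out on the right; the post does not say which edge.
    """
    target_x = screen.x + screen.w + overhang - window.w
    return Rect(target_x, window.y, window.w, window.h)


def correct_if_shifted(window: Rect, screen: Rect, overhang: int = 10) -> Optional[Rect]:
    """Return the frame to restore, or None if the window is already in place."""
    target = desired_frame(window, screen, overhang)
    return None if window == target else target


if __name__ == "__main__":
    screen = Rect(0, 0, 1440, 900)
    pulled_back = Rect(240, 25, 1200, 800)  # the OS moved it fully on-screen
    print(correct_if_shifted(pulled_back, screen))
```

In a real setup, a window-move event handler would call something like `correct_if_shifted` and apply the returned frame through the automation tool.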
Arvind Narayanan retweeted
It's easy to show that an AI agent will scheme if you nudge it to. It's harder to tell if it would scheme naturally. We introduce realistic honeypot evaluations that put Gemini in internal deployment situations where it has an opportunity for sabotage, to see how it behaves.