Technology Leader at WTW. Love technology, photography, and music creation. Views are mine. blog.subhadipc.com

Joined May 2010
12 Photos and videos
Velocity is an indicator of movement. Adoption is an indicator of impact. Teams will frequently track the first but rarely the second, as adoption is more difficult to "game". In my experience, that is where the real productivity story sits. The one nobody ever puts into a board deck. What did your team ship in the last quarter that people actually use?
5
Solid list. The one that usually gets cut is human handoff design. What happens when the Agent reaches its uncertainty ceiling? In most of the production failures I am thinking about there weren't capability issues. There were handoff failures where the Agent continued trying long after it should have escalated to a human reviewer who could have made a clean call.
Jensen Huang, CEO of Nvidia: "Every engineer is going to have and manage hundreds of agents." But most people have not even configured one that survives past day 9 You can't manage 100 agents if your first one forgets everything after every session No CS program teaches this Not harness engineering Not agent memory architecture Not systems that survive production Not workflows that don't burn your API budget doing nothing Most people building agents right now are building demos They break on day 9 They forget every session They cost money and deliver nothing One builder mapped the whole thing out Free Step by step No gatekeeping This is the complete guide to building AI agents that actually work in 2026 Bookmark it for the weekend
17
David Holz's 'agent-friendly codebase' post was wild. Everyone in the replies is re-inventing modularization in real-time, just with new labels. In my experience this is Conway's Law in reverse. If your team has changed from 5 humans working together to 50 agents all working in parallel then your module boundaries will need to be based on that new way of communicating, not how they were drawn at a time when there were only humans. And this isn't really Git's fault. How do you draw module boundaries when agents are on the org chart?
9
Following the "read all lines of AI code" debate, and it's my curiosity about what is happening versus the fear of what will happen that has piqued my interest at present. Both Yacine and Nick Dobos are correct based upon the position of where within the codebase you sit. However at Enterprise scale no single individual reviews all lines of code. Yet someone does own each line. In my opinion, however, the better question is which lines you choose to thoroughly review, as well as if I'm being entirely honest with myself the Agent Pull Request impacting a payment flow receives every line reviewed whereas the scaffolding for an internal migration script does not receive such thorough review. Therefore, we have review depth directly related to blast radius, and NOT head count. And the less-than-glamorous upstream solution (if there is one) is also true. The "I cannot read this code" issue can simply be downstream from "we left ambiguity in our spec to begin with" and "we did not use a static source code analysis / code formatter tool in our AI workflow". What is your blast radius rule for Agent Pull Requests?
29
This is true. Another issue I see sometimes is that investors try to pull signals from each individual no on a business plan. But the signal actually lives in the pattern across many. One no is a fortune cookie. But analyzing twenty numbers together coherently might be worth that "hot deal" they are looking for. They should dive deeper to identify if they like the idea as a whole.
The same investor will react differently to the same idea based on how hot a deal you are and how much they like your personality. If you're not a hot deal and they don't like you, they'll look for reasons to hate the idea.
18
Same. The interesting wrinkle is that I trust LLM answers in conversation, where I am steering. I distrust LLM prose in publication, where the writer is supposed to be the one who already did the thinking. Different contract with the reader.
Bonus blog post for today about my instinctual revulsion towards AI slop: surfingcomplexity.blog/2026/…
2
58
The authors of a new NBER working paper used data from over 100,000 GitHub developers. They found that agentic (autonomous) AI tool usage increased commits by 140%, whereas production-quality code and actual app engagement increased very little. Decades passed before factory automation was able to actually increase overall productivity. It wasn't the machines themselves which were the bottleneck but rather the workflows surrounding them. What workflows or processes will be redesigned for use with these agent-based technologies, as opposed to simply adding them on top of existing ones?
25
The interesting part of this chart is not the taxonomy. It is that the choice between Subagent, Skill, Agent Team, or Workflow is increasingly made by the model at runtime. Not by an architect at design time. We just outsourced an architecture decision to inference itself. The governance layer has not caught up yet. x.com/emollick/status/206307…

This chart from Anthropic is useful, since Agent Teams and Workflows are both very new and very powerful (and token hungry). On the other hand, maybe it doesn't matter as a lot of the decisions about which approach to use is from the AI itself & it often uses them in combination
16
The most invisible layer of AI is data labeling, and it is also where false confidence comes from. A model can have strong evaluation numbers and still be wrong in the places that matter most. The bug only surfaces at the edge, when nobody can explain why the model failed. Outsourcing the labeling does not outsource the consequence. It just delays who has to deal with it. Who actually owns the labeled row when the model gets it wrong?
7
In my experience, the true cost of an engineering team supporting a third (and fourth) set of AI coding instructional files (i.e., CLAUDE.md, AGENTS.md, etc.) is not the additional file itself, but rather the additional work in prompt engineering due to the duplicated effort. I believe we have all experienced the "format war" pattern. Browsers (IE vs Netscape). Containers (Docker vs Kubernetes). The slow drift around package managers (yum vs apt-get vs pip vs npm). The format which can read the others' formats silently is generally going to be the one that wins. That has been the general trend as far as I know. What is your team converging on?
1
23
Interesting tweet. This can potentially change the monitoring question. If Agents are patient and persistent problem solvers as the post says, the right question is not whether one output was wrong but whether the pattern of outputs across many sessions says the Agent learned the wrong thing. Point-in-time evals miss that.
Agents are patient, persistent, problem solvers. This property is much more important to understanding their behavior than thinking about hallucinations or other model limitations. And its becoming more and more important over time.
28
Today we are talking about how a developer wrote 3 months worth of AI generated code in 2 weeks. And the rewrite was half the size of the original. However, he had no idea why he created the code that way. I believe what is being talked about today has very little to do with using AI for coding, and much more to do with the difference between writing clean code that works in production and explaining the code at 2 AM during an outage while trying to remember which prompt you used. The METR 2025 study determined that experienced developers worked 19% slower when using AI to generate code. Their perception of working faster was an illusion. How much of your current code base is quick to get into production but difficult to explain?
9
The unspoken half of this. Your 'default place' is also your default calendar. And if meetings have already chunked the day into 30 minute slivers, no coffee shop fixes that. The architecture of focus starts at the calendar, then the environment.
Go to a different place to do the work - coffee shop, coworking space, park - anything but the default place you work from.
24
Amazon canceled their internal AI usage 'leaderboard' last week. Employees quickly discovered that the obvious way to increase their AI usage numbers was to create agents that did nothing but perform useless functions. If an employee's 'use of AI' increases by virtue of a higher number on some dashboard, then employees will find ways to keep that number climbing. On the other hand, if the employee is incentivized by metrics for 'work shipped,' there is no guarantee any actual work ships. My experience has shown that if you design a dashboard with certain metrics in mind, you are likely to be rewarded with those same behaviors. What metric have you regretted shipping?
10
Competitive intel used to be a deck someone refreshed once a quarter. Now an agent reads your rivals' changelogs and job postings every morning, then flags what actually moved. Intel is a pipeline now. Not a meeting. Is anyone still running theirs on a calendar instead of a feed?
11
Mitchell Hashimoto (creator of Vagrant, Terraform, now Ghostty) published an experiment on May 28 showing an AI 'Ralph loop' agent optimized a Go renderer from 88ms to 1.5ms in four hours and $350. His manual rewrite hit 20 microseconds with zero allocations. So the debate is now whether agents are bad. I think that's a wrong debate. One such experiment does not prove anything. Expert human engineers understand the whole system better and can optimize better - no doubt there. But could the agent have been prompted better to not optimize for local optima? What do you think?
1
42
I think the most fascinating aspect of this "economics flip" is this: as soon as you can perform simple calls much faster than typing them, the only question becomes "who pays for each command". When an employer funds the tool, users develop the habit unconsciously. If users fund their own personal use, then each time they make the same call, it is priced differently. At least that is the split I am watching.
Gemini 3.5 Flash changed how I interact with command line tools. It’s so fast that I find it faster to use it rather than typing even the simple commands.
1
52
The Agent shift is occurring for real. See this: x.com/simonlast/status/20579… 6 months ago, you had an Agent run in a sprint. It was called "autonomy". Today I see posts regarding an individual Agent running without human interaction for weeks. When software begins performing tasks on your behalf, the product is no longer the tool but rather trust. The question: can this be implemented in financial, health, and other regulated spaces without tripping controls?

1/ Some things I've learned recently running coding agents on large-scale projects. Most of this contradicts advice from 6 months ago!
1
1
40
There are many unseen costs to both sides of the fork-your-deps argument that most teams tend to downplay. In addition to owning the codebase once you've forked it, you now also have the responsibility for patching cycles. CVE tracking too. And having an answer ready for the audit conversation when SOC 2 or PCI comes asking why your fork is 3 months behind a specific security patch, with no upstream maintainer you can point to. I believe the real lesson from @mitchellh's HashiCorp discipline is captured in the line, 'show me the commit we need.' Most updates will fail that test. So too will most forks. The harder question is generally whether this should even be a dependency you don't control in the first place. Fork, freeze, or rebuild is rarely where the real decision lives. So which one does your team actually default to?
36