Subhadip Chatterjee

Subhadip Chatterjee

12 Photos and videos

Tweets

Subhadip Chatterjee

@subhadipc

Jun 14

Velocity is an indicator of movement. Adoption is an indicator of impact. Teams will frequently track the first but rarely the second, as adoption is more difficult to "game". In my experience, that is where the real productivity story sits. The one nobody ever puts into a board deck. What did your team ship in the last quarter that people actually use?

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

Jun 13

Solid list. The one that usually gets cut is human handoff design. What happens when the Agent reaches its uncertainty ceiling? In most of the production failures I am thinking about there weren't capability issues. There were handoff failures where the Agent continued trying long after it should have escalated to a human reviewer who could have made a clean call.

Trackmind

@0xTrackmind

Jun 13

Jensen Huang, CEO of Nvidia: "Every engineer is going to have and manage hundreds of agents." But most people have not even configured one that survives past day 9 You can't manage 100 agents if your first one forgets everything after every session No CS program teaches this Not harness engineering Not agent memory architecture Not systems that survive production Not workflows that don't burn your API budget doing nothing Most people building agents right now are building demos They break on day 9 They forget every session They cost money and deliver nothing One builder mapped the whole thing out Free Step by step No gatekeeping This is the complete guide to building AI agents that actually work in 2026 Bookmark it for the weekend

16:04

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

Jun 13

David Holz's 'agent-friendly codebase' post was wild. Everyone in the replies is re-inventing modularization in real-time, just with new labels. In my experience this is Conway's Law in reverse. If your team has changed from 5 humans working together to 50 agents all working in parallel then your module boundaries will need to be based on that new way of communicating, not how they were drawn at a time when there were only humans. And this isn't really Git's fault. How do you draw module boundaries when agents are on the org chart?

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

Jun 12

Following the "read all lines of AI code" debate, and it's my curiosity about what is happening versus the fear of what will happen that has piqued my interest at present. Both Yacine and Nick Dobos are correct based upon the position of where within the codebase you sit. However at Enterprise scale no single individual reviews all lines of code. Yet someone does own each line. In my opinion, however, the better question is which lines you choose to thoroughly review, as well as if I'm being entirely honest with myself the Agent Pull Request impacting a payment flow receives every line reviewed whereas the scaffolding for an internal migration script does not receive such thorough review. Therefore, we have review depth directly related to blast radius, and NOT head count. And the less-than-glamorous upstream solution (if there is one) is also true. The "I cannot read this code" issue can simply be downstream from "we left ambiguity in our spec to begin with" and "we did not use a static source code analysis / code formatter tool in our AI workflow". What is your blast radius rule for Agent Pull Requests?

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

Jun 8

This is true. Another issue I see sometimes is that investors try to pull signals from each individual no on a business plan. But the signal actually lives in the pattern across many. One no is a fortune cookie. But analyzing twenty numbers together coherently might be worth that "hot deal" they are looking for. They should dive deeper to identify if they like the idea as a whole.

Paul Graham

@paulg

Jun 8

The same investor will react differently to the same idea based on how hot a deal you are and how much they like your personality. If you're not a hot deal and they don't like you, they'll look for reasons to hate the idea.

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

Jun 8

Same. The interesting wrinkle is that I trust LLM answers in conversation, where I am steering. I distrust LLM prose in publication, where the writer is supposed to be the one who already did the thinking. Different contract with the reader.

@norootcause.surfingcomplexity.com on Bluesky @norootcause

Jun 7

Bonus blog post for today about my instinctual revulsion towards AI slop: surfingcomplexity.blog/2026/…

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

Jun 8

The authors of a new NBER working paper used data from over 100,000 GitHub developers. They found that agentic (autonomous) AI tool usage increased commits by 140%, whereas production-quality code and actual app engagement increased very little. Decades passed before factory automation was able to actually increase overall productivity. It wasn't the machines themselves which were the bottleneck but rather the workflows surrounding them. What workflows or processes will be redesigned for use with these agent-based technologies, as opposed to simply adding them on top of existing ones?

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

Jun 6

The interesting part of this chart is not the taxonomy. It is that the choice between Subagent, Skill, Agent Team, or Workflow is increasingly made by the model at runtime. Not by an architect at design time. We just outsourced an architecture decision to inference itself. The governance layer has not caught up yet. x.com/emollick/status/206307…

Ethan Mollick

@emollick

Jun 6

This chart from Anthropic is useful, since Agent Teams and Workflows are both very new and very powerful (and token hungry). On the other hand, maybe it doesn't matter as a lot of the decisions about which approach to use is from the AI itself & it often uses them in combination

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

Jun 6

The most invisible layer of AI is data labeling, and it is also where false confidence comes from. A model can have strong evaluation numbers and still be wrong in the places that matter most. The bug only surfaces at the edge, when nobody can explain why the model failed. Outsourcing the labeling does not outsource the consequence. It just delays who has to deal with it. Who actually owns the labeled row when the model gets it wrong?

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

Jun 5

In my experience, the true cost of an engineering team supporting a third (and fourth) set of AI coding instructional files (i.e., CLAUDE.md, AGENTS.md, etc.) is not the additional file itself, but rather the additional work in prompt engineering due to the duplicated effort. I believe we have all experienced the "format war" pattern. Browsers (IE vs Netscape). Containers (Docker vs Kubernetes). The slow drift around package managers (yum vs apt-get vs pip vs npm). The format which can read the others' formats silently is generally going to be the one that wins. That has been the general trend as far as I know. What is your team converging on?

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

May 31

Interesting tweet. This can potentially change the monitoring question. If Agents are patient and persistent problem solvers as the post says, the right question is not whether one output was wrong but whether the pattern of outputs across many sessions says the Agent learned the wrong thing. Point-in-time evals miss that.

Marc Brooker @MarcJBrooker

May 30

Agents are patient, persistent, problem solvers. This property is much more important to understanding their behavior than thinking about hallucinations or other model limitations. And its becoming more and more important over time.

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

May 31

Today we are talking about how a developer wrote 3 months worth of AI generated code in 2 weeks. And the rewrite was half the size of the original. However, he had no idea why he created the code that way. I believe what is being talked about today has very little to do with using AI for coding, and much more to do with the difference between writing clean code that works in production and explaining the code at 2 AM during an outage while trying to remember which prompt you used. The METR 2025 study determined that experienced developers worked 19% slower when using AI to generate code. Their perception of working faster was an illusion. How much of your current code base is quick to get into production but difficult to explain?

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

May 30

The unspoken half of this. Your 'default place' is also your default calendar. And if meetings have already chunked the day into 30 minute slivers, no coffee shop fixes that. The architecture of focus starts at the calendar, then the environment.

Shreyas Doshi

@shreyas

May 30

Go to a different place to do the work - coffee shop, coworking space, park - anything but the default place you work from.

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

May 30

Amazon canceled their internal AI usage 'leaderboard' last week. Employees quickly discovered that the obvious way to increase their AI usage numbers was to create agents that did nothing but perform useless functions. If an employee's 'use of AI' increases by virtue of a higher number on some dashboard, then employees will find ways to keep that number climbing. On the other hand, if the employee is incentivized by metrics for 'work shipped,' there is no guarantee any actual work ships. My experience has shown that if you design a dashboard with certain metrics in mind, you are likely to be rewarded with those same behaviors. What metric have you regretted shipping?

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

May 30

The AI Security Paradox: Is Anthropic's Mythos a Smoke Alarm or a Blowtorch? blog.subhadipc.com/anthropic…

AI Security Paradox: Mythos as Smoke Alarm or Blowtorch

Explore how Anthropic's Mythos AI model can both uncover decades‑old vulnerabilities and accelerate exploit creation, forcing security teams to rethink response speed.

blog.subhadipc.com

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

May 30

Competitive intel used to be a deck someone refreshed once a quarter. Now an agent reads your rivals' changelogs and job postings every morning, then flags what actually moved. Intel is a pipeline now. Not a meeting. Is anyone still running theirs on a calendar instead of a feed?

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

May 29

Mitchell Hashimoto (creator of Vagrant, Terraform, now Ghostty) published an experiment on May 28 showing an AI 'Ralph loop' agent optimized a Go renderer from 88ms to 1.5ms in four hours and $350. His manual rewrite hit 20 microseconds with zero allocations. So the debate is now whether agents are bad. I think that's a wrong debate. One such experiment does not prove anything. Expert human engineers understand the whole system better and can optimize better - no doubt there. But could the agent have been prompted better to not optimize for local optima? What do you think?

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

May 23

I think the most fascinating aspect of this "economics flip" is this: as soon as you can perform simple calls much faster than typing them, the only question becomes "who pays for each command". When an employer funds the tool, users develop the habit unconsciously. If users fund their own personal use, then each time they make the same call, it is priced differently. At least that is the split I am watching.

Jaana Dogan ヤナドガン

@rakyll

May 22

Gemini 3.5 Flash changed how I interact with command line tools. It’s so fast that I find it faster to use it rather than typing even the simple commands.

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

May 23

The Agent shift is occurring for real. See this: x.com/simonlast/status/20579… 6 months ago, you had an Agent run in a sprint. It was called "autonomy". Today I see posts regarding an individual Agent running without human interaction for weeks. When software begins performing tasks on your behalf, the product is no longer the tool but rather trust. The question: can this be implemented in financial, health, and other regulated spaces without tripping controls?

Simon Last

@simonlast

May 23

1/ Some things I've learned recently running coding agents on large-scale projects. Most of this contradicts advice from 6 months ago!

Subhadip Chatterjee

Subhadip Chatterjee

@subhadipc

May 21

There are many unseen costs to both sides of the fork-your-deps argument that most teams tend to downplay. In addition to owning the codebase once you've forked it, you now also have the responsibility for patching cycles. CVE tracking too. And having an answer ready for the audit conversation when SOC 2 or PCI comes asking why your fork is 3 months behind a specific security patch, with no upstream maintainer you can point to. I believe the real lesson from @mitchellh's HashiCorp discipline is captured in the line, 'show me the commit we need.' Most updates will fail that test. So too will most forks. The harder question is generally whether this should even be a dependency you don't control in the first place. Fork, freeze, or rebuild is rarely where the real decision lives. So which one does your team actually default to?