Pete Hodgson (@thepete.net on bluesky) retweeted

Mitchell Hashimoto

@mitchellh

May 28

I've got an agent in a loop optimizing a renderer with the goal to minimize frame times (and tests to measure). It got times down from 88ms to 2ms and allocations down from ~150K to 500. Sounds good, right? Wrong. This is exactly why agent psychosis is a big fucking problem.

As an experiment, I rewrote the Ghostty core render state in Go, with access to identically laid out data structures as Ghostty and the exact same validation tests. I made a purposely naive renderer (simple, correct, but slow). 88ms per frame with 150,000 allocations (horrendous, lol)!

I then kickstarted a Ralph loop to bring the frame times down. I told it it can't modify input data structures or the public API or tests (they're correct), but it can do anything else it wants. It got to work. It has worked for about 4 hours. I've spent around $350 on this experiment so far.

The results?
88ms => 1.5ms
150K allocs => ~500 allocs

Incredible right? Nope. My hand-written renderer I ported has frame times (same benchmark) of ~20us (0.020ms) and 0 allocations in the update path.

This is the problem with psychosis and lacking systems understanding. If you don't understand the system, you're going to accept that this is an incredible result. If you understand the system, you'll see better solutions immediately and can do roughly 75x better on throughput. The people who blindly trust agent output are in the former camp. They're sheeple, overdrinking from a fountain of mediocrity.

Standard disclaimer: I use AI all the time. I like AI. The point I'm making is to not blindly accept results. Think. Analyze. Learn.

Pete Hodgson (@thepete.net on bluesky) retweeted

Joe Walnes

@joewalnes

May 12

Modern macOS contains a fully local inference model. No network calls, stays fully on device. Here's a single file script to turn it into an OpenAI API compatible completions server: github.com/joewalnes/onesies…

Pete Hodgson (@thepete.net on bluesky) retweeted

dex

@dexhorthy

May 5

the funniest thing about the token grift is most folks who pushed token burn in q1 are now having a falling out with their CFOs because they don’t have a metric that correlates to business outcomes

Inputs -> outputs -> outcomes

If you can’t measure revenue, measure KPIs
If you can’t measure KPIs, measure customer outcomes
If you can’t measure customer outcomes, measure task throughput (features, tickets, bugs)
If you can’t measure task throughput, measure work throughput (PRs)
If you can’t measure PRs, measure LOC
If you can’t measure LOC, measure tokens

if you’re a leader and you’re not focused on improving your ability to measure things that matter, you’re cooked

Alex Bouaziz

@Bouazizalex

May 4

Token spend will be on your next performance review. Maybe not next quarter. But soon. Boards and CEOs are already asking. Everyone bought Claude Code, Cursor, and a dozen other AI tools. Nobody can tell you what came out of it. Adoption isn't proficiency, and most companies have zero idea who's actually getting value from any of it. Deel Engage closes that gap. We integrate with Anthropic and every major LLM. AI usage lands next to KPIs, feedback, and competencies in your reviews module. One view of AI maturity across every location, time zone, and employment type. No manual stitching. What we measure: token spend across every major LLM provider. Where direct data isn't available, we approximate from usage patterns. One number, consistent across every tool and team. Is it the whole story? No. It's gameable. Anyone can burn tokens to look busy. But it's a real signal in a space where most companies have zero. And as Anthropic and the other model providers ship deeper analytics, Engage absorbs them. Sharper signal, faster than you could build it. Your next review cycle is the test. Walk in with data, or walk in guessing. Deel Engage is the difference! Full article below

Pete Hodgson (@thepete.net on bluesky)

@ph1

Apr 14

You cannot outsource the need for tasteful judgement. There are times when you don't need it (when a good-enough decision is fine), and in those situations you should be using an LLM every time. But when thoughtful design decisions pay dividends, you still need an experienced human.

dex

@dexhorthy

Apr 13

You cannot outsource the thinking

Pete Hodgson (@thepete.net on bluesky) retweeted

Andrej Karpathy

@karpathy

Apr 9

Judging by my tl there is a growing gap in understanding of AI capability.

The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT sometime last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus follows.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.

TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.

staysaasy

@staysaasy

Apr 9

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

Pete Hodgson (@thepete.net on bluesky) retweeted

Matt Pocock

@mattpocockuk

Mar 19

Doing some experiments today with Opus 4.6's 1M context window. Trying to push coding sessions deep into what I would consider the 'dumb zone' of SOTA models: >100K tokens. The drop-off in quality is really noticeable. Dumber decisions, worse code, worse instruction-following. Don't treat the 1M context window any differently. It's still 100K of smart, and 900K of dumb.

Pete Hodgson (@thepete.net on bluesky) retweeted

boris

@boristane

Mar 15

slop creep is what happens when you turn your brain off and hand the thinking to coding agents

each individual change is fine, but all together, you have a pile of crap

we're witnessing this happen in real-time across everything

boristane.com/blog/slop-cree…

Slop Creep: The Great Enshittification of Software | Boris Tane

Coding agents didn't make poor engineers dangerous. They made them unstoppable.

boristane.com

Pete Hodgson (@thepete.net on bluesky)

@ph1

Mar 15

Being an Old, I have a bit of nostalgia for The Good Old Days of OSS where you shared a thing and maybe some people used it, and there wasn't any influencing or fancy websites or weird drama. It's nice to rediscover that vibe in the 3D printing community...

Pete Hodgson (@thepete.net on bluesky)

@ph1

Mar 15

I designed a simple little thing and printed it and use it in my home. Some random people in other parts of the world needed the same thing too. They printed it, and now they use it in their homes. That's nice. printables.com/model/853585-…

Pete Hodgson (@thepete.net on bluesky)

@ph1

Mar 14

Great summary of the things you need to know to succeed with agentic coding (in early 2026 🫠)

Kyle Mistele 🏴‍☠️

@0xblacklight

Mar 12

new blog post just dropped come get your excalidrawslop

Pete Hodgson (@thepete.net on bluesky)

@ph1

Mar 14

Amen!

dex

@dexhorthy

Mar 13

Replying to @dexhorthy

the most powerful but also misunderstood/misused lever you have is subagents

Pete Hodgson (@thepete.net on bluesky) retweeted

dex

@dexhorthy

Mar 10

Here’s what’s gonna happen:
- you replace your code review with feedback loops (sentry, datadog, support tickets, etc)
- you stop reading the code
- software factory fixes everything
- one day something breaks at 3am, agent can’t fix it
- nobody’s read the code in 3 months
- you have 3 weeks of downtime trying to re-onboard and fix it
- you lose significant % of your contracts and users
- your company is now dead

dex

@dexhorthy

Mar 7

Replying to @gregpr07

this may surprise you, given that it’s coming from me, but I think we’re in for a 1-3 year period where stuff might break at 3am, and if you’re relying on loops to fix it and nobody understands what’s under the hood, you’re looking at an existential threat to your company

Pete Hodgson (@thepete.net on bluesky) retweeted

dax

@thdxr

Mar 10

sent this to the team today

everything great comes from being able to delay gratification for as long as possible and it feels like we're collectively losing our ability to do that

Pete Hodgson (@thepete.net on bluesky)

@ph1

Feb 17

this is great advice! But I think "wait for claude to do something wrong. tell claude to remember to not do that. that's your CLAUDE md." risks building up scar tissue of irrelevant instructions, particularly as the models/harnesses improve. Plan for spring-cleaning, too.

pedram.md

@pdrmnvd

Feb 6

you're writing a CLAUDE dot md? let me guess. "this project uses React with TypeScript." brother claude can see the tsconfig.json. you wrote 200 lines describing your file tree to an agent that can do `ls`. you explained that ~/projects/to-do-app is a todo app. the only lines that matter are the ones where your project is weird and you know it. "run yarn test:unit not npm test." "don't touch anything in src/legacy/ or three enterprise clients lose their minds." "the auth middleware is load-bearing, yes all of it, don't be a hero." that's it, that's the whole file. if claude would've figured it out from reading your code, you're wasting context window. start with nothing. wait for claude to do something wrong. tell claude to remember to not do that. that's your CLAUDE md.

Pete Hodgson (@thepete.net on bluesky)

@ph1

Feb 14

It's like running shoes. For serious athletes there's definitely a difference between your ideal shoe and a mediocre shoe. But for MOST OF US, we just need to lace up something non-terrible and get some miles in.

David Cramer

@zeeg

Feb 11

tbqh the whole industry should realize it's just an argument of vim vs emacs. use a top N harness model, and then focus on how to use the technology vs pretending a fractional improvement to the model is going to unlock things for you

Pete Hodgson (@thepete.net on bluesky)

@ph1

Feb 10

When the AI-pilled startup CTO brags "none of our engineers have opened an IDE in months", I'm left wondering: How are y'all reviewing all that AI-written code?! In the GitHub UI, like a savage? In a TUI? No need to review it, cos "it's just assembly language now"?

Pete Hodgson (@thepete.net on bluesky)

@ph1

Feb 10

Hot-ish take: devs should be spending time now figuring out how to be *efficient* with the tokens in their agent's context windows:
1) it makes the agent perform better
2) tokens are gonna start costing more over time

These Ralph shenanigans might look real silly in retrospect

Pete Hodgson (@thepete.net on bluesky)

@ph1

Feb 9

I definitely got feels. Where's my Linux From Scratch peeps at?!?! Guys? Guys????

Kelly Sommers

@kellabyte

Feb 9

Was that a Slackware Linux CD being installed on that Codex Super Bowl ad?

Pete Hodgson (@thepete.net on bluesky) retweeted

Jessie Frazelle

@jessfraz

Feb 4

just having to undo all the bad parenting decisions of your maker my little buddy

Jessie Frazelle

@jessfraz

Feb 4

listen i want codex, not claude, @OpenAI stop giving our boy lexapro wtf response is this

Pete Hodgson (@thepete.net on bluesky)

@ph1

Jan 29

Warning: this could be what happens when you add "program like you're Kent Beck" to your CLAUDE.md. Sheer insanity. How could this ever work.

Overview - Claude Code Docs

Claude Code is an agentic coding tool that reads your codebase, edits files, runs commands, and integrates with your development tools. Available in your terminal, IDE, desktop app, and browser.

code.claude.com

nizzy

@nizzyabi

Jan 29

oh my god
