Joined March 2009
294 Photos and videos
Pete Hodgson (@thepete.net on bluesky) retweeted
I've got an agent in a loop optimizing a renderer with the goal to minimize frame times (and tests to measure). It got times down from 88ms to 2ms and allocations down from ~150K to 500. Sounds good, right? Wrong. This is exactly why agent psychosis is a big fucking problem. As an experiment, I rewrote the Ghostty core render state in Go, with access to identically laid out data structures as Ghostty and the exact same validation tests. I made a purposely naive renderer (simple, correct, but slow). 88ms per frame with 150,000 allocations (horrendous, lol)! I then kickstarted a Ralph loop to bring the frame times down. I told it it can't modify input data structures or the public API or tests (they're correct), but it can do anything else it wants. It got to work. It has worked for about 4 hours. I've spent around $350 on this experiment so far. The results? 88ms => 1.5ms 150K allocs => ~500 allocs Incredible right? Nope. My hand-written renderer I ported has frame times (same benchmark) of ~20us (0.020ms) and 0 allocations in the update path. This is the problem with psychosis and lacking systems understanding. If you don't understand the system, you're going to accept that this is an incredible result. If you understand the system, you'll see better solutions immediately and can do roughly 75x better on throughput. The people who blindly trust agent output are in the former camp. They're sheeple, overdrinking from a fountain of mediocrity. Standard disclaimer: I use AI all the time. I like AI. The point I'm making is to not blindly accept results. Think. Analyze. Learn.
308
979
8,938
791,259
Pete Hodgson (@thepete.net on bluesky) retweeted
Modern macOS contains a fully local inference model. No network calls, stays fully on device. Here's a single file script to turn it into an OpenAI API compatible completions server: github.com/joewalnes/onesies…
3
1
8
968
Pete Hodgson (@thepete.net on bluesky) retweeted
May 5
the funniest thing about the token grift is most folks who pushed token burn in q1 are now having a falling out with their CFOs because they don’t have a metric that correlates to business outcomes Inputs -> outputs -> outcomes If you can’t measure revenue, measure KPIs If you cant measure KPIs, measure customer outcomes If you cant measure customer outcomes, measure task throughput (features, tickets, bugs) If you cant measure task throughput, measure work throughput (PRs) If you cant measure PRs, measure LOC If you cant measure LOC, measure tokens if you’re a leader and you’re not focused on improving your ability to measure things that matter, you’re cooked
Token spend will be on your next performance review. Maybe not next quarter. But soon. Boards and CEOs are already asking. Everyone bought Claude Code, Cursor, and a dozen other AI tools. Nobody can tell you what came out of it. Adoption isn't proficiency, and most companies have zero idea who's actually getting value from any of it. Deel Engage closes that gap. We integrate with Anthropic and every major LLM. AI usage lands next to KPIs, feedback, and competencies in your reviews module. One view of AI maturity across every location, time zone, and employment type. No manual stitching. What we measure: token spend across every major LLM provider. Where direct data isn't available, we approximate from usage patterns. One number, consistent across every tool and team. Is it the whole story? No. It's gameable. Anyone can burn tokens to look busy. But it's a real signal in a space where most companies have zero. And as Anthropic and the other model providers ship deeper analytics, Engage absorbs them. Sharper signal, faster than you could build it. Your next review cycle is the test. Walk in with data, or walk in guessing. Deel Engage is the difference! Full article below
8
8
95
14,725
You cannot outsource the need for tasteful judgement. There's times you don't need it - when a good-enough decision is fine - and in those situations you should be using an LLM every time. But when thoughtful design decisions pay dividends, you still need an experienced human.
Apr 13
You cannot outsource the thinking
2
456
Pete Hodgson (@thepete.net on bluesky) retweeted
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code. But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along. So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because 2 properties: 1) these domains offer explicit reward functions that are verifiable meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.
1,198
2,528
20,883
4,490,601
Pete Hodgson (@thepete.net on bluesky) retweeted
Doing some experiments today with Opus 4.6's 1M context window. Trying to push coding sessions deep into what I would consider the 'dumb zone' of SOTA models: >100K tokens. The drop-off in quality is really noticeable. Dumber decisions, worse code, worse instruction-following. Don't treat 1M context window any differently. It's still 100K of smart, and 900K of dumb.
151
60
1,183
160,015
Pete Hodgson (@thepete.net on bluesky) retweeted
Mar 15
slop creep is what happens when you turn your brain off and hand the thinking to coding agents each individual change is fine, but all together, you have a pile of crap we're witnessing this happen in real-time across everything boristane.com/blog/slop-cree…
39
62
647
90,299
Being an Old, I have a bit of nostalgia for The Good Old Days of OSS where you shared a thing and maybe some people used it, and there wasn't any influencing or fancy websites or weird drama. It's nice to rediscover that vibe in the 3D printing community...
1
2
146
I designed a simple little thing and printed it and use it in my home. Some random people in other parts of the world needed the same thing too. They printed it, and now they use it in their homes. That's nice. printables.com/model/853585-…
182
Great summary of the things you need to know to succeed with agentic coding (in early 2026 🫠)
new blog post just dropped come get your excalidrawslop
1
188
Amen!
Mar 13
Replying to @dexhorthy
the most powerful but also misunderstood/misused lever you have is subagents
1
4
681
Pete Hodgson (@thepete.net on bluesky) retweeted
Mar 10
Here’s what’s gonna happen: - you replace your code review with feedback loops (sentry, datadog, support tickets, etc) - you stop reading the code - software factory fixes everything - one day something breaks at 3am, agent can’t fix it - nobody’s read the code in 3 months - you have 3 weeks of downtime trying to re-onboard and fix it - you lose significant % of your contracts and users - your company is now dead
Mar 7
Replying to @gregpr07
this may surprise you that thus is coming from me but I think we’re in for a 1-3 year period where stuff might break at 3am and if you’re relying on loops to fix it and nobody understands what’s under the hood, you’re looking at an existential threat to your company
256
556
6,842
628,404
Pete Hodgson (@thepete.net on bluesky) retweeted
Mar 10
sent this to the team today everything great comes from being able to delay gratification for as long as possible and it feels like we're collectively losing our ability to do that
254
698
6,890
981,121
this is great advice! But, I think "wait for claude to do something wrong. tell claude to remember to not do that. that's your CLAUDE md." risks building up scar tissue of irrelevant instructions, particularly as the models/harnesses improve. Plan for spring-cleaning, too.
you're writing a CLAUDE dot md? let me guess. "this project uses React with TypeScript." brother claude can see the tsconfig.json. you wrote 200 lines describing your file tree to an agent that can do `ls`. you explained that ~/projects/to-do-app is a todo app. the only lines that matter are the ones where your project is weird and you know it. "run yarn test:unit not npm test." "don't touch anything in src/legacy/ or three enterprise clients lose their minds." "the auth middleware is load-bearing, yes all of it, don't be a hero." that's it, that's the whole file. if claude would've figured it out from reading your code, you're wasting context window. start with nothing. wait for claude to do something wrong. tell claude to remember to not do that. that's your CLAUDE md.
2
339
It's like running shoes. For serious athletes there's definitely a difference between your ideal shoe and a mediocre shoe. But for MOST OF US, we just need to lace up something non-terrible and get some miles in.
tbqh the whole industry should realize it its just an argument of vim vs emacs use a top N harness model, and then focus on how to use the technology vs pretending a fractional improvement to the model is going to unlock things for you
1
226
When the AI-pilled startup CTO brags "none of our engineers have opened an IDE in months", I'm left wondering: How are y'all reviewing all that AI-written code?! In the github UI, like a savage? In a TUI? No need to review it, cos "it's just assembly language now"?
2
276
Hot-ish take: devs should be spending time now learning out how to be *efficient* with the tokens in their agent's context windows 1) it makes the agent perform better 2) tokens are gonna start costing more over time These Ralph shenanigans might look real silly in retrospect
1
138
I definitely got feels. Where's my Linux From Scratch peeps at?!?! Guys? Guys????
Was that a Slackware Linux CD being installed on that Codex Super Bowl ad?
189
Pete Hodgson (@thepete.net on bluesky) retweeted
just having to undo all the bad parenting decisions of your maker my little buddy
listen i want codex, not claude, @OpenAI stop giving our boy lexapro wtf response is this
21
20
451
69,448