I'm feeling compelled to get back into making regular video content. What would you find most useful? Could be a prerecorded video or a livestream, or both.
Not to diminish the work anyone is doing in this space at all, but it seems to me that we as an industry still have no idea how to benchmark frontier models’ software engineering capabilities. The apparently rampant gaming of benchmarks and the variance in results don’t inspire confidence.
I understand this is all very new, but it seems like every few weeks there’s some version of “oh ___bench is dead, here’s the ‘real’ new benchmark”. Open to being wrong - excited in fact, I’m still learning how it’s all done myself.
Have been switching back to GPT5.5 for personal projects for a bit.
- Anthropic models are clearly trained to act like real people and it's just creeping me out. The novelty is gone and I just want the thing to be a tool again.
- I hate having to maintain configs for multiple harnesses (forced Claude Code usage)
- Opus 4.8 is way better than 4.7 IME but still makes some bizarre choices even when I'm pretty clear/constrained with my instructions.
That’s because people like you are interested in the results. For every one of you there’s more that are just tabbing over to YouTube or scrolling their phone and turning their brains off
Very similar experience here. I’ve learned at least as much about Linux networking in the last year as in the rest of the last 4 years maintaining a global-scale L4 proxy.
I understand the concern about skills atrophying when using agents. But so far I am not seeing it. Instead, while building a networking product, I have learned all sorts of dark secrets of Linux networking that I somehow didn’t learn before agents.
Love @dhh's characterization of coding agents as a mech suit youtube.com/shorts/IeOZtj08z…
In addition to often being associated with anti-human marketing, the whole personification of AI/agents has never made sense to me. This feels much better.
I wonder if this personification may (ironically) actually slow down adoption of coding agents, as it causes newcomers to misunderstand their capabilities and have to learn the rough edges via a much more circuitous path.
Opus 4.7 has been so bad. Not only for programming tasks but even just having it help me with research, articulating tradeoffs, even validating my outdoor wifi / IP camera design (and I mean basic stuff like power). Can't count all the "my mistake!" and "you're totally right!"
It's a shame, too - until recently, the economics of even the basic Claude subscription were really pretty good. And Opus 4.6 has generally been great for me for a while (I'm currently hard-pinned back to that as my default).
It has been enough justification for me to tolerate using a separate agent for personal projects (we use opencode at work) too, but re-consolidating is looking more and more attractive as time goes on.
Oh I love it. Not because I can imagine anything useful off the top of my head, but holy smokes it looks FUN! We need computers to be more like this more of the time. A little GeoCities. A little more crazy. A little more TempleOS.
I've had it on my list to do some basic hello world stuff with CUDA for a while, just never made the time. This may be the excuse that gets me there: nvlabs.github.io/cuda-oxide/
Trying something new, doing a blog read through with commentary. Had this in an open tab for a while so figured it would be a good way to get some thoughts out on the AI topic.
Source: jasonrobert.dev/blog/2026-04…
I'm positive I'm late with this take but my god LinkedIn is bad.
Unfortunately there are still a few reasons I have to check in at least every once in a while. Every time it feels like I'm showing up for a party I didn't really want to go to; I go in, shake some hands, dip out.
WTAF. Also LOL at gremlin FOMO, and clearly a non-nerd reinforced the “nerds just love goblins and gremlins” idea at some point during training.
youtu.be/Aiz6yMTLeaE?si=viTb…