Testing, yet again, whether frontier models can execute meaningful analysis of rather complex codebases. Something was bugging me. Thought "maybe I'm missing something here" and had a go. But, but...
It took a stance. I said "can't be, it does not make sense." Doubled down: "you are wrong, I am right." Put pen to paper: "this can't be, but let's try, maybe I'm missing something." Tested empirically: the proposed solution was wrong, as I predicted. What I got back was a "you are wrong, if it worked before it was by accident, and my fix stands" type of response.
Then I blew a gasket, on purpose. The answer that came back is telling.
But why? Frontier AI models tend to get lost (fail to see the wider coding context) beyond, say, 200-300 lines of code on either side of whatever you're looking at. But also, and more importantly, you need to swear at them at full strength (see screen cap) to avoid post-rationalization and bullshitting. That's even more dangerous.
If you know what you're doing and why, that gives you a leg to stand on, but if you do not (like most vibe-coders, by definition) you swallow whatever bullshit is thrown at you.
This is why vibe-coding and "the death of software engineers" are just gaslighting. That's why vibe-coded stuff will invariably break things. The more vibe-coded it is, the more complex it will be to backtrack. That's also why it'll likely become a compounding problem.
Granted, frontier AI can help you with certain things, but you'll end up in a deep, dark rabbit hole if you let it loose on even mildly complex stuff.
Receipts provided. Things to note:
- I did bully Claude to shake it out of a post-rationalization loop. That's nearly always necessary, but it does not guarantee you'll land in a better place.
- Bullying has side effects: frontier AI models tend to capitulate easily to avoid alienating the user (you), even when the user is wrong (without knowing it).
- Consequently, if you really don't know what you're doing, you end up between a rock and a hard place. You need to swear at them to break their CoT out of self-confirmatory loops, but if you don't know your stuff, shouting will lead to appeasement (seemingly the default response when you swear at them, though not uniformly so), which may end up confirming your own biases.
- That's why, once again, vibe-coded stuff is potentially lethal in production environments.
@GaryMarcus @rohanpaul_ai @Scobleizer @rauchg @wsosaescudero