What I am about to describe ain’t AGI; it’s a sign of a trillion-dollar trainwreck.
If I had told you in 2022 that the 2026 version of GPT (which, by the way, would only be GPT-5.5, not the GPT-6 or GPT-7 that many people fantasized about) would still have strange quirks, like inserting the word “goblins” in random places, y’all would have called me “crazy,” “a hater,” or both.
“Scaling!” you would have shouted. “Deep learning is conquering walls!” you would have said.
And yet here we are.
OpenAI can’t even align their systems well enough to get them to stop talking about goblins without stuffing a bunch of utterly hacky, goblin-specific crud into their system prompts, like (and I am not making this up) “never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.”
Meanwhile, this nonsense varies by “persona.” An actual quasi-scientific report on their website states, without a trace of humor: “Across all datasets in the audit, the Nerdy personality reward showed a clear tendency to score outputs to the same problem with ‘goblin’ or ‘gremlin’ higher than outputs without, with positive uplift in 76.2% of datasets.”
Instead of actual computer science, we are left with alchemy.
Might as well be chanting magic incantations.
Good luck solving AI safety with this tech. 🙄