Anthropic's Fable 5 safety filter did not work.
It blocked things that were completely vanilla, seemingly at random.
It could also be easily bypassed, which I ended up doing just to get the vanilla things working.
Imagine being one of the world's top tech companies, screaming about safety, and then releasing a completely broken safety mechanism.
Does a hypercompetent tech company that for years has been shouting from the rooftops about safety and national security release a completely incompetent safety mechanism?
Does it release a broken safety mechanism on this model after the U.S. government warned it not to release the model because it is unsafe?
Does that same company, when the U.S. government points out that the safety mechanism is broken, respond by publicly downplaying the problem?
One would think that Anthropic would have dedicated a cracked team to get the safety mechanism working properly.
But apparently it didn't.
It is so bad and so dumb that it feels like it must have been on purpose.
I'm not saying it was on purpose.
But how can such a cracked company whose core identity is AI safety create such a broken solution -- and then play down how broken it is?
I am trying to understand.