A truth-seeking, fixed point, entropy guided, "don't know" allowed model would have been far less vulnerable. Or just having an AI companion precheck the main output for extra safety.
Maybe a dual-trial-quad Fable where one is there to provide the best answer and the others are to make sure no guardrail is violated or bypassed, with various internal mixers so one cannot do a multi-chain LLM jailbreak.
Like having MD5, SHA1, SHA256 & Blowfish checksum on everything. How hard would it be to have collision on all of them at the same time?