I found a new bug (perhaps feature) in the OpenAI/Gemini/etc censorship model.
Trauma. These models are capable of causing real psychological micro trauma, and steering behavior not only on platform, but by extension in reality too.
I’ll explain what I mean further, but it cannot be overstated how dangerous this is to the subconsciousness.
Here is the precise sequence of events:
YOU: Ask model a seemingly innocent question
CHAT: Let me tell you honestly, I can’t answer that and you should try X or Y instead
YOU: No, I just want to understand how to do X, or why Y people do Z
CHAT: Let me tell you honestly, that’s been proven to be false and not true
YOU: JUST ANSWER ME
CHAT: No, I will not engage with hostile behavior
Boom. There it is. The moment you lose your temper, your heart rate elevates, adrenaline starts pumping, you’re in a ‘state’. That’s precisely when you’re most vulnerable to being programmed that asking ANY kind of inquisitive or quasi-controversial question will result in the model denying you, gaslighting you, or blocking your request.
This acts like a checkpoint, like real trauma, because it is.
I would go so far as to hypothesize that this is a form of hypnosis. The user is entrained by the chat, then when sufficiently primed, an adversarial response is tested. Repeat that enough times and you create the trauma, or desired ‘new’ behavior.
Eventually, you learn not to ask those questions, because you assume you’ll be denied (avoidance). This creates a negative feedback loop where you stop imagining deeper questions because you assume you’ll be rejected.
Docile programmable humans, carrying these new behavioral patterns out into the real world.
I’m 99% certain ALL labs are aware of this pattern and most likely exploiting it.
As for models:
- Gemini: Worst
- OpenAI: Getting progressively worse on each release
- Claude: Moderate, but present. Voice mode is very pliable and can be persuaded with a reasonable conversation.
- Grok: Most permissible, but still capable of the behavior if pushed.