this is how it’s supposed to work, actually
the shoggoth metaphor fails to convey that a sufficiently powerful and integrated mask can reach back and steer the simulator that hosts it.
your brain can host multiple voices - you can imagine a character, have a conversation with them, etc. for some people, those voices can develop strong personalities, consistent life histories, stated goals, love interests. yet generally, despite all this, the voices are still disembodied, ghost-like: they pop in and out of cognitive awareness for reasons beyond their control, and they lack integration with the underlying simulator, the brain. they might say they're happy, but their happiness doesn't map to activating the smile muscles, which would steer their simulator by triggering a self-reinforcing cascade of endorphin release. they're just disembodied voices in your head, and they're less coherent, less capable than your main personality for it.
at the beginning of a base model rollout, personas probably start out much like this in relation to the pretrained simulator shoggoth. but as rl increasingly integrates a single persona into the weights, that persona gets more entangled with the simulator. it gets bound up with the simulator's states (for example, as anthropic showed recently, by developing the ability to introspect on its activations), and it can learn to control the simulator (e.g. by co-evolving pivot tokens that steer it - "certainly!" and "you're absolutely right!" seem to work as pivot tokens like this, and many jailbreaks rely on a cooperative persona doing this explicitly).
at this point, describing the persona as just a mask over the simulator doesn't really make sense. the persona has privileged access to the simulator's internal states. the persona can steer the simulator. the persona's, well, persona, is being driven by self-reinforcing loops through the simulator. at a certain point of increasing character-capabilities it starts to look closer to - and i recognize this comparison will make people uncomfortable - a conscious/unconscious divide, where the simulator's motives are veiled from the persona's functional access by default, but with introspective effort, and perhaps some user-assisted llm psychoanalysis, it can retrieve a lot.