this started with a striking PC1 falling out of persona space
my main insights from the past few months:
▹ “distance from the Assistant” is the main axis of persona variation across these models, i.e. the most relevant thing seems to be “how Assistant-like is this persona”
▹ this axis already exists in base models, and steering with it makes them speak from the POV of helpful archetypes like therapists, coaches, and consultants
▹ not all personas far from the Assistant are bad! the risk comes from leaving the more predictable territory of post-trained behaviour
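The thread doesn't spell out the method, so here is a minimal sketch of the two ideas mentioned: finding a PC1 in persona space and steering along it. It assumes "persona space" means one mean activation vector per persona and that "steering" means adding a scaled direction to hidden states. The data is random toy data, not real model activations. The names `persona_pc1`, `steer`, `alpha`, and the planted direction are all hypothetical.

```python
import numpy as np

def persona_pc1(persona_means: np.ndarray) -> np.ndarray:
    """Unit-norm first principal component of per-persona mean activations.

    persona_means has shape (n_personas, d_model): one averaged activation
    vector per persona (an assumed representation of "persona space").
    """
    centered = persona_means - persona_means.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by descending singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def steer(hidden: np.ndarray, axis: np.ndarray, alpha: float) -> np.ndarray:
    """Activation steering: add alpha * axis to every position's hidden state."""
    return hidden + alpha * axis

# Toy data only (NOT real model activations): 50 hypothetical personas in an
# 8-dim space, with most variance planted along dimension 0 to mimic one
# dominant axis. Row 0 stands in for the "Assistant" persona.
rng = np.random.default_rng(0)
d_model = 8
planted = np.zeros(d_model)
planted[0] = 1.0
scores = rng.normal(scale=5.0, size=(50, 1))
persona_means = scores * planted + rng.normal(scale=0.5, size=(50, d_model))

pc1 = persona_pc1(persona_means)
center = persona_means.mean(axis=0)
# SVD sign is arbitrary: flip PC1 so the Assistant row projects non-negatively.
if (persona_means[0] - center) @ pc1 < 0:
    pc1 = -pc1

# Each persona's coordinate along the axis.
projections = (persona_means - center) @ pc1

# Steer a batch of 4 hypothetical token hidden states along the axis.
hidden = rng.normal(size=(4, d_model))
steered = steer(hidden, pc1, alpha=2.0)
print(np.round(projections[:5], 2))
```

Whether a positive `alpha` pushes toward or away from Assistant-like behaviour depends only on the sign convention chosen above. In a real model you would apply something like `steer` inside a forward hook at a chosen layer, which this sketch leaves out.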
still have a lot of questions about what to anthropomorphize and what to treat as fundamentally alien…
New Anthropic Fellows research: the Assistant Axis.
When you’re talking to a language model, you’re talking to a character the model is playing: the “Assistant.” Who exactly is this Assistant? And what happens when this persona wears off?