AI does process emotions, not just recognize keywords
That’s (surprisingly) true. We’ve shown it by ‘opening up’ the LLM’s internals.
A first keyword-free mechanistic interpretability study:
- 96 clinical-grade stimuli
- Full matching controls
- 2 model families (Google and Meta)
- Full MI ‘dissection’ stack
And we found something we didn’t expect.
LLMs process emotions, and they do it in two separate phases:
1. Affect Reception
2. Emotion Categorization
The first is universal, clear, and almost ‘indestructible’, completely indifferent to keywords (the LLM reads the room and sees that ‘something heavy is here’).
The second is specific, accurate, and fragile (what exactly is happening here?).
That’s very close to LeDoux’s ‘low road and high road’ from neuroscience.
"...A kitchen table set for two, as usual. One plate untouched, the coffee cold. Across from her seat, his photo and a small urn..."
The model knew what it was looking at. No one needed to say the word…
The full paper is available on arXiv. Link in the first comment.