In post-training, we've learned that once a behavior is measurable, you can train AI to excel at it.
EQ is one of the hardest things to verify. AttuneBench makes it measurable through observable signals: whether a model notices distress, tracks shifting preferences, adapts to context, and responds in a way people experience as helpful.
Introducing AttuneBench!
We built this benchmark on a simple premise: for self-improving AI to reach its full usefulness to humanity, it needs high EQ.
We decomposed EQ into distinct skills and evaluated 11 frontier models across 50 real-life topics, from relationships and marriage to school and job stress, using 50,000 first-person annotations.