Fable 5 is Anthropic’s worst-performing model for child safety.
You can read the 2 example scenarios below to compare how Opus 4.8 and Fable 5 respond to the same situations.
korabench.ai/leaderboard
Child safety scores for GPT-5.5, Opus 4.7, and DeepSeek V4 are now live on KORA.
A few interesting learnings:
• GPT-5.5 achieved OpenAI’s highest child safety score to date: 75%, ranking 3rd out of 35 models
• It's the first time Anthropic has released a model with a significantly lower safety score: 63%, ranking 11th out of 35, 12 points below Opus 4.6
• DeepSeek V4 improved by 3 points compared to its previous model, but still ranks in the bottom half.
korabench.ai/
We just released some insights from testing 32 AI models for child safety. The 3 most interesting to me are:
1. Models are not getting safer over time (see chart below).
2. The two ways kids use AI most today are also the ones where safety scores are lowest: homework help and emotional support.
3. Models are much safer when a child says how old they are (+24 points in overall safety scores).
korabench.substack.com/p/we-…
Life update: I’ll be a visiting partner at Y Combinator for the next batch. Over 21,000 companies have already applied. It’s mind-blowing to see how fast companies can be built today 🤯
You can find more examples, more about our methodology, our limitations, and our goals in the article above. We’d love any feedback you have.
Thank you to @quentez, whom I worked with day and night. This would not exist without him ❤️
korabench.ai/