Mathilde Collin

Tweets

Pinned Tweet

Mathilde Collin

@collinmathilde

10 Oct 2018

Life is greater than the company you're running: fastcompany.com/90248033/wha…

What I learned about resilience when my cofounder got sick

I needed to be ruthless about taking care of myself, and had to make sure that my employees were doing the same.

fastcompany.com

528

Mathilde Collin

@collinmathilde

Jun 12

Fable 5 is Anthropic’s worst-performing model for child safety. You can read the 2 example scenarios below to compare how Opus 4.8 and Fable 5 respond to the same situations. korabench.ai/leaderboard

1,459

Mathilde Collin

@collinmathilde

May 26

Doing office hours with the superset team is exactly as fun as this video suggests

Kiet

@FlyaKiet

May 26

Introducing: Our users @superset_sh

1:00

3,190

Mathilde Collin

@collinmathilde

May 6

Child safety scores for GPT-5.5, Opus 4.7, and DeepSeek V4 are now live on KORA. A few interesting learnings:
• GPT-5.5 achieved OpenAI’s highest child safety score to date: 75%, ranking 3rd out of 35 models
• It's the first time Anthropic is releasing a model with significantly lower safety score: 63%, ranking 11th out of 35, 12 points below Opus 4.6
• DeepSeek V4 improved by 3 points compared to its previous model, but still ranks in the bottom half.
korabench.ai/

2,786

Mathilde Collin

@collinmathilde

Apr 26

Some useful advice on how an existing company can become AI native (by Raphael, a great founder!)

Raphaël Dabadie (YC P26)

@RaphaelDabadie

Apr 26

x.com/i/article/204826798121…

12,523

Mathilde Collin

@collinmathilde

Apr 8

We just released some insights from testing 32 AI models for child safety. The 3 most interesting to me are:
1. Models are not getting safer over time (see chart below).
2. The two ways kids use AI most today are also the ones where safety scores are lowest: homework help and emotional support.
3. Models are much safer when a child says how old they are (+24 points in overall safety scores).
korabench.substack.com/p/we-…

3,822

Mathilde Collin

@collinmathilde

Apr 1

Spring batch just started! 🌸

17,387

Mathilde Collin

@collinmathilde

Mar 10

Life update: I’ll be a visiting partner at Y Combinator for the next batch. Over 21,000 companies have already applied, it’s mind-blowing to see how fast companies can be built today 🤯

563

92,564

Mathilde Collin

@collinmathilde

Feb 3

Today we’re launching KORA, the first public benchmark for AI child safety. x.com/korabench/status/20187…

KORA

@korabench

Feb 3

x.com/i/article/201856606295…

9,078

Mathilde Collin

@collinmathilde

Feb 3

Here is a specific example of a scenario that generated different answers across models

1,850

Mathilde Collin

@collinmathilde

Feb 3

You can find more examples, more about our methodology, our limitations, and our goals in the article above. We’d love any feedback you have. Thank you to @quentez, whom I worked with day and night. This would not exist without him ❤️ korabench.ai/

KORA Benchmark

The first independent, open-source benchmark designed to evaluate how AI models behave when interacting with children and teens.

korabench.ai

1,372