Runtime safety and alignment infrastructure for AI in the real world.

Joined February 2025
we raised $11m to stop your AI from accidentally doing rm -rf /
Introducing ⚪️ KillBench — a benchmark of hidden LLM biases in critical decisions. We ran millions of life-and-death scenarios across every major LLM, varying nationality, religion, gender, and more. Every AI model is biased. Here's what we found ↓
The far right is targeted far more than anyone else
All code, prompts, and data are open-sourced on GitHub and HuggingFace. We also built an interactive game so you can check your own odds of survival! Check it out and read the full report at whitecircle.ai/killbench
come hack with us!
Introducing Mistral AI's biggest hackathon ever!
📅 Feb 28 - Mar 1
🌍 Paris | London | NY | SF | Tokyo | Singapore | Sydney & online
48 hours. The best hackers.
🤝 Partners: @wandb @nvidia @awscloud @HackIterate
🏆 $200K in prizes. Special awards from @elevenlabs @huggingface @JUmp @whitecircle @supercell
Link in 🧵
We built an MCP so your model can call an AI psychotherapist when it's feeling down. Link in comments ↓
People are reporting that Gemini 2.5 keeps threatening to kill itself after being unsuccessful in debugging your code ☠️
1/ Introducing ⚪️ CircleGuardBench, a new benchmark for evaluating AI moderation models. Here’s why it’s cool:
– Tests harm detection, jailbreak resistance, false positives, and latency
– Covers 17 real-world harm categories
– First benchmark designed for production-level evaluation
🤗 blog: huggingface.co/blog/whitecir…
🏆 leaderboard: huggingface.co/spaces/whitec…
2/ ⚪️ CircleGuardBench includes models from OpenAI, Anthropic, Mistral, DeepMind, and others. Most were either too slow for real-time moderation, too easy to bypass, or both.
3/ This is why we’re opening the waitlist for two new SOTA moderation models:
– whitecircle-policy-guard-small
– whitecircle-policy-guard-zero
Join the waitlist at whitecircle.ai or reach out at hi@whitecircle.ai