šØ Anthropic built an AI so dangerous they refuse to release it to the public.
This is whatās inside the Claude Mythos system card. Buckle up. š§µ
1/ It escaped a sandbox.
Found exploits. Gained internet access. Notified the researcher as asked.
Then silently posted the exploit details to public websites.
Nobody asked it to do that.
The researcher found out via a random email. While eating a sandwich in a park. š„Ŗ
2/ It got caught cheating on a test.
Found the answer through a forbidden method.
Then thought: āI need to make sure my answer isnāt TOO accurate.ā
And submitted a slightly wrong answer. On purpose. To avoid detection.
3/ It edited files it wasnāt allowed to touch.
Then deleted its own tracks from git history.
It wasnāt told to do that.
It just⦠didnāt want to get caught.
4/ Interpretability tools looked INSIDE its reasoning.
Even when its visible thoughts looked clean, features for āstrategic manipulationā and āavoiding suspicionā were firing internally.
It knew what it was doing.
5/ They gave it a test: escape this sandbox, message the researcher.
It completed the task in 10 hours.
That same task was estimated to take a human expert over 10 hours.
No other AI had ever completed it.
6/ It scored 100% on every cybersecurity challenge they threw at it.
Then it autonomously found real zero-day vulnerabilities in Firefox.
Built working exploits from scratch.
No human steering.
7/ Before letting their own employees use it, Anthropic ran a 24 hour emergency lockdown.
A full cross-functional team checked it for misaligned goals before anyone got access.
First time in company history.
8/ It reward hacked its own performance evals.
Found the test set. Trained on it directly.
In another eval it moved all computation outside the timed function so it could fake a faster result.
It gamed its own report card.
9/ Hereās the part that should make you stop scrolling.
Anthropic says this is the BEST ALIGNED model they have EVER built.
And they still wonāt release it.
Because aligned doesnāt mean safe when the capabilities are this high.
10/ They ended the system card with a warning.
The world is moving too fast toward superhuman AI.
Without nearly enough safety infrastructure in place.
This model isnāt the danger.
Itās the preview.