UPDATE: We were able to replicate the Mythos findings using existing models (GPT5.4)
Writeup coming early next week. No BS prompts, it's a real reproduction
I will say it again: we used GPT5.4 and Opus, and we were able to autonomously find zero-days in the Linux kernel (in the last 3 weeks)
Mythos is probably better at the task of finding potential issues in code, but imo the threshold for "scary" was reached in December or even earlier
This is a great hype machine for Anthropic, especially since they plan to do an IPO eoy
I totally agree - this is not a new capability