As someone who spent a lifetime speeding up HFT code… this is a crappy benchmark for “superhuman coding”.
Build benchmarks, profile performance, tweak bottlenecks. Repeat again and again and again. No superintelligence needed. Just hours and hours of tedium and persistence.
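For anyone who hasn't done this kind of work, here is a rough sketch of one pass through that benchmark, profile, tweak loop. Everything in it is hypothetical and chosen for illustration: `baseline`, `tuned`, `best_time`, `profile_top`, and `speedup` are made-up names, and the toy sum-of-squares workload stands in for real training or trading code. It uses only Python's standard `timeit`, `cProfile`, and `pstats` modules.

```python
import cProfile
import io
import pstats
import timeit


def baseline(n):
    # Hypothetical slow version: sum of squares with an explicit loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def tuned(n):
    # Hypothetical tweak: closed form for 0^2 + 1^2 + ... + (n-1)^2.
    return (n - 1) * n * (2 * n - 1) // 6


def best_time(fn, n, repeats=5):
    # Benchmark step: best-of-N wall time for a single call, to cut noise.
    return min(timeit.repeat(lambda: fn(n), number=1, repeat=repeats))


def profile_top(fn, n, limit=5):
    # Profile step: return the top entries by cumulative time as text.
    profiler = cProfile.Profile()
    profiler.enable()
    fn(n)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


def speedup(n=200_000):
    # Check the tweak did not change the result, then compare timings.
    assert baseline(n) == tuned(n)
    return best_time(baseline, n) / best_time(tuned, n)


if __name__ == "__main__":
    print(profile_top(baseline, 200_000))
    print(f"speedup: {speedup():.1f}x")
```

In practice the loop repeats: profile again after each tweak, pick the next hotspot, and re-measure against the same baseline so the reported speedup stays comparable.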
Each time we release a model, we run the same test: give it code that trains a small AI model, and ask the new model to speed it up. It takes a skilled human 4-8 hours to reach a 4x speedup.
In May 2025, Claude Opus 4 averaged a ~3x speedup. This April, Mythos Preview achieved ~52x.