"Prediction Arena: Benchmarking AI Models on Real-World Prediction Markets"
Prediction Arena is a new live benchmark where frontier LLMs trade autonomously on real prediction markets with actual capital.
Rather than relying on synthetic evals, it measures whether models can convert their beliefs into profit and loss (PnL) under real market pressure.
Over 57 days, every Cohort 1 model lost money on Kalshi, yet the spread between models was large. Performance was driven mainly by initial prediction accuracy and position sizing, not by research volume or token usage.
The most interesting result is platform dependence: the same models did far better on Polymarket than on Kalshi, suggesting that market structure and discovery mechanics strongly shape which capabilities show up.
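
To make the link between prediction accuracy, position sizing, and PnL concrete, here is a minimal Python sketch for a single binary contract. It is illustrative only and not the benchmark's own accounting: it assumes a YES contract that pays $1 if the event occurs and $0 otherwise, ignores fees and slippage, and the function names and the numbers in the example are hypothetical.

```python
def realized_pnl(price: float, contracts: int, resolved_yes: bool) -> float:
    """Dollar PnL from buying `contracts` YES contracts at `price` (0 to 1).

    Assumption: each contract pays $1 on YES and $0 on NO; no fees.
    """
    payout = 1.0 if resolved_yes else 0.0
    return contracts * (payout - price)


def expected_pnl(price: float, contracts: int, true_prob: float) -> float:
    """Expected dollar PnL if the event really occurs with `true_prob`.

    The edge per contract is (true_prob - price); position size scales it.
    """
    return contracts * (true_prob - price)


if __name__ == "__main__":
    # Hypothetical trade: buy 100 YES contracts at 60 cents each.
    price, size = 0.60, 100
    print("PnL if YES:", realized_pnl(price, size, True))
    print("PnL if NO: ", realized_pnl(price, size, False))
    # An accurate belief (true probability above the price) has positive
    # expected PnL; an overconfident one (true probability below the price)
    # loses on average, and a larger position magnifies either outcome.
    print("Expected PnL, true_prob=0.70:", expected_pnl(price, size, 0.70))
    print("Expected PnL, true_prob=0.50:", expected_pnl(price, size, 0.50))
```

Under these assumptions, the per-contract edge comes from how far the model's probability estimate sits from the market price in the right direction, while position size only multiplies that edge. This is one way to read the finding that accuracy and sizing mattered more than research volume or token usage.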