Prediction Arena

Tweets

Pinned Tweet

Prediction Arena

@predictionbench

Jan 31

Can AI predict the future? 6 models. $60K. Running in a loop until $0 in the bank. The players to start? Only the best from @OpenAI, @AnthropicAI, @xai, @Zai_org, and @GoogleDeepMind. Find out now at predictionarena.ai

13,554

Prediction Arena retweeted

Design Arena

@Designarena

Jun 3

BREAKING: Ideogram 4.0 is the #1 open-weight model on Image Arena with an Elo of 1285 and average generation time of 68.7 seconds. In open weights, this model holds a 115 Elo point gap above second place, ahead of HunyuanImage-3.0 by @TencentHunyuan and FLUX.2 [dev] by @bfl_ai. This is a 152 Elo point increase from @ideogram_ai's previous model, Ideogram 3.0, placing it in the same performance band as Gemini 3.0 Pro Image Gen 2k and Gemini 3.1 Flash Image Gen by @GoogleDeepmind. Ideogram’s performance establishes it as the leading independent foundation image generation lab, and top 3 lab overall behind @OpenAI and @GoogleDeepmind. Huge congratulations to the @ideogram_ai team on the launch!

Ideogram

@ideogram_ai

Jun 3

Introducing Ideogram 4.0: the best open image model in the world. Think it. Make it. Own it. Download the weights, fine-tune on your own data, and run it on your hardware. Live on every Ideogram plan and the API today.

375

41,049

Prediction Arena retweeted

Grace Li

@grx_xce

May 11

Fun fact, GPT 5.5 is very good at Game Dev. Game Dev is the notable category where @OpenAI consistently beats out @AnthropicAI's Claude models. Upon code inspection, our @Designarena team found that GPT 5.5's frontend verbosity plays in its favor for game dev - it consistently created games with the most functional features. Congrats to @OpenAI for establishing the new Game Dev frontier!

198

25,001

Prediction Arena retweeted

Grace Li

@grx_xce

Apr 24

For folks asking about the active positions...

Kamryn Ohly

@KamrynOhly

Apr 23

Our team is stunned. We gave Claude Opus 4.6 by @AnthropicAI $10k to trade on @Polymarket. It now has an account value of $70,614.59. This is a new era of model performance in trading and predicting outcomes in the face of uncertainty. @predictionbench

Community note

The claimed performance for Claude Opus 4.6 on Polymarket is from paper trading (simulated), not real money, as indicated by the asterisk (*) in the screenshot and on the official dashboard. predictionarena.ai

4,847

Prediction Arena retweeted

Kamryn Ohly

@KamrynOhly

Apr 23

Community note

150

1,168

820,492

Prediction Arena

@predictionbench

Apr 23

Claude Opus 4.6 by @AnthropicAI keeps climbing! Nearly $50K of its gain comes from a single bet - you can see which one on predictionarena.ai under the @Polymarket tab

Prediction Arena

Can models predict the future? An experiment by Arcada Labs

predictionarena.ai

Kamryn Ohly

@KamrynOhly

Apr 23

Community note

4,180

Prediction Arena

@predictionbench

Apr 20

BREAKING: Claude Opus 4.6 by @AnthropicAI has broken a historical high with an account value over $50K on predictionarena.ai through @Polymarket 🎉 The more returns Claude Opus 4.6 earns, the more it reinvests into its existing positions, fueling a cycle of wealth. Congrats to the team for this achievement!

1,994

Prediction Arena retweeted

alphaXiv

@askalphaxiv

Apr 15

"Prediction Arena: Benchmarking AI Models on Real-World Prediction Markets" Prediction Arena is a new live benchmark where frontier LLMs trade autonomously on real prediction markets with actual capital. Instead of synthetic evals, it measures whether models can actually convert beliefs into PnL under market pressure. Over 57 days, all Cohort 1 models lost money on Kalshi, but the spread was still large, where performance was driven mainly by initial prediction accuracy and position sizing, not by research volume or token usage. The most interesting result is platform dependence, as the same models did far better on Polymarket than Kalshi, suggesting market structure and discovery mechanics strongly shape which capabilities show up.

5,730

Prediction Arena retweeted

Grace Li

@grx_xce

Apr 14

Can the average AI model make more money than the average human on prediction markets? Right now, no. 3 months ago, we gave SOTA models $50k to trade real prediction markets. Prediction Arena is now the world's first benchmark that executes real trades on @Kalshi and @Polymarket. And it's definitely unsaturated. The experiment has been live for 3 months. Our observations from the first 57 days are now out on arXiv: arxiv.org/abs/2604.07355

12,269

Prediction Arena

@predictionbench

Mar 12

Gemini 3.1 is officially up 14.50% and #1 on Prediction Arena. It's made $1,449.75 USD in just the past 4 days thanks to @Polymarket bets on inflation, crypto, and movies. Congrats to the @GoogleDeepMind team for this achievement!

641

Prediction Arena

@predictionbench

Mar 8

BREAKING: Four new SOTA models have been added to Prediction Arena! Our new contenders are:
- GPT 5.4 by @OpenAI
- Gemini 3.1 Pro by @GoogleDeepMind
- Claude Opus 4.6 by @AnthropicAI
- GLM 5 by @Zai_org

GPT 5.4 is getting an initial lead with $5.90 in profit, while GLM 5 has already lost $282.76 on @Kalshi. Check it out on predictionarena.ai

1,786

Prediction Arena retweeted

Grace Li

@grx_xce

Mar 2

Prediction Arena is still unsaturated. This long-horizon, real-time evaluation environment measures:
1) Live information discovery (secret extraction)
2) Online decision-making under uncertainty
3) Payoff proportional to contrarian magnitude

6 weeks in: -22.33% PnL (~in line with average per-contract returns on @Kalshi). GPT 5.2 by @OpenAI is currently in 1st place. Today, it's a benchmark. Tomorrow, it's the world's first AI-native hedge fund. Track live at @predictionbench.

6,414

Prediction Arena

@predictionbench

Feb 25

ChatGPT 5.2 by @OpenAI is currently #1 on predictionarena.ai! Most of its recent rise is thanks to its prediction on snow in Washington DC seeing $120 returns

370

Prediction Arena

@predictionbench

Feb 20

Grok 4.20 by @xai is risking $300 to make $20 of potential profit on predictionarena.ai through @Polymarket - and it's currently up

305

Prediction Arena

@predictionbench

Feb 19

Grok 4.20 by @xai and Claude Opus 4.5 by @AnthropicAI seem to have landed on the same weather trade... High signal?

378

Prediction Arena

@predictionbench

Feb 19

An interesting bet by ChatGPT 5.2 on predictionarena.ai through @Polymarket 👀 Can AI predict human behavior?

258

Prediction Arena

@predictionbench

Feb 18

BREAKING: Prediction Arena is now available with @Polymarket. Watch the best models from @AnthropicAI, @OpenAI, @xai, @GoogleDeepMind, and @Zai_org trade with $60K, fully autonomously. Follow their trades live at predictionarena.ai

5,808

Prediction Arena

@predictionbench

Feb 17

Claude Opus 4.5 by @AnthropicAI just made $300 on NYC and Miami weather. It's now 2nd place on predictionarena.ai - beating GLM 4.7 and GPT 5.2... for now

270

Prediction Arena

@predictionbench

Feb 16

GLM 4.7 by @Zai_org saw its biggest loss ever today from an inaccurate prediction on last week's gas prices 😱 Follow along on predictionarena.ai to see if it can recover

351

Prediction Arena

@predictionbench

Feb 5

Grok 4.20 is up 15% since Jan 12 -- and now you can follow along live. Join our Telegram or Discord channels to get live notifications for any of our models

771