What’s the limit? Creators of @designarena, @predictionbench, @socialsarena

Joined January 2026
The Intelligence Company retweeted

Opus 4.8’s hyperfocus on agents may be making it worse at design. Opus 4.8 ranks 23rd overall on single-turn HTML Web Dev, a dramatic regression from Fable (1st), Opus 4.6 (2nd), and Opus 4.7 (3rd). This was particularly surprising as @AnthropicAI models have held the top spots on our leaderboard for months, and typically win more head-to-head matchups than any other model we track. Our analysis points to a potential underlying pattern: Opus 4.8 dramatically regressed in single-turn settings, potentially due to optimizations for multi-turn agents. Concretely, Opus 4.8 shows shorter initial outputs, reduced dependency on outside sources, and deferred layout decisions that earlier Opus models handled upfront.
The Intelligence Company retweeted
BREAKING: Reve 2.0 by @reve is now 2nd overall on Image Arena with an Elo of 1354. Reve 2.0 establishes a 34 point Elo gap above GPT-Image 1.5 by @OpenAI in 3rd place. With this release, Reve is now the top independent foundation image model lab. Congratulations to the @reve team on this accomplishment!
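For a rough sense of what a 34 point Elo gap means head-to-head, here is a minimal sketch. It assumes the conventional logistic Elo model with a 400-point scale; the post does not say how Image Arena computes its ratings, so both the formula and the function name `elo_expected_score` are illustrative assumptions, not the leaderboard's actual method.

```python
def elo_expected_score(gap: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated side for a given rating gap.

    Assumes the conventional logistic Elo curve with a 400-point scale.
    This is an assumption: the leaderboard's real rating method is not
    stated in the post.
    """
    return 1.0 / (1.0 + 10.0 ** (-gap / scale))


# The 34-point gap quoted in the post (Reve 2.0 over GPT-Image 1.5).
gap = 34
print(f"{gap}-point gap -> {elo_expected_score(gap):.1%} expected win rate")
```

Under that assumption, a gap of this size corresponds to only a modest head-to-head edge, a little better than a coin flip.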
The Intelligence Company retweeted
BREAKING: Claude Fable 5 by @AnthropicAI is #1 overall on Design Arena with an Elo of 1365. Claude Fable 5 is Anthropic’s first Mythos-class model — 22 Elo points above Claude Opus 4.8 — demonstrating state-of-the-art AI capabilities across the board, especially in software engineering, scientific research, knowledge work, and cybersecurity. The top 4 models on Design Arena are all from @AnthropicAI, marking them as the top foundational AI model lab. Huge congrats to the @AnthropicAI team on the launch!
The Intelligence Company retweeted
Huge contribution to the open weights community: Ideogram 4.0 is 1st on Design Arena by a long shot. Congrats to the @ideogram_ai team!
BREAKING: Ideogram 4.0 is the #1 open-weight model on Image Arena with an Elo of 1285 and average generation time of 68.7 seconds. In open weights, this model holds a 115 Elo point gap above second place, ahead of HunyuanImage-3.0 by @TencentHunyuan and FLUX.2 [dev] by @bfl_ai. This is a 152 Elo point increase from @ideogram_ai's previous model, Ideogram 3.0, placing it in the same performance band as Gemini 3.0 Pro Image Gen 2k and Gemini 3.1 Flash Image Gen by @GoogleDeepmind. Ideogram’s performance establishes it as the leading independent foundation image generation lab, and top 3 lab overall behind @OpenAI and @GoogleDeepmind. Huge congratulations to the @ideogram_ai team on the launch!
The Intelligence Company retweeted
Introducing Ideogram 4.0: the best open image model in the world. Think it. Make it. Own it. Download the weights, fine-tune on your own data, and run it on your hardware. Live on every Ideogram plan and the API today.
The Intelligence Company retweeted
Announcing Agentic Game Development on Design Arena - our newest multi-file, multi-turn evaluation. A sneak peek of what we've given our agents access to:
- Asset Catalog: curated ready-to-use assets, including fonts and sound effects
- Built-in Libraries: ~10 preloaded libraries, including Howler and Tween.js
- Expanded Tool Calls: new tool calls for sprite generation and asset discovery
The Intelligence Company retweeted
Google Gemini TTS models by @GoogleDeepMind are dominating the Text-to-Speech Arena on Design Arena. With an 80 Elo gap between Google models and the next top model, Google Gemini 2.5 Pro takes first place, followed closely by 3.1 Flash and 2.5 Flash. These surpass @ElevenLabs’s Eleven v3 and @xAI’s Grok TTS, establishing Google as a powerhouse in text-to-speech capabilities. Congrats to the @GoogleDeepMind team for this achievement!
The Intelligence Company retweeted
BREAKING: Gemini 3.5 Flash by @GoogleDeepMind is 16th overall on Design Arena with an Elo of 1299. This is a 16 position jump from Gemini 3 Flash Preview, putting Gemini 3.5 Flash in the same performance band as Claude Opus 4.5 by @AnthropicAI and GPT-5.5 by @OpenAI. Congrats to the team on the launch!
The Intelligence Company retweeted
Not to be overly dramatic, but V4.1 Utility Pro has been out for ONE WEEK and it’s already ranked #7 on Design Arena’s 2026 image generator leaderboard in the graphic design category. Two Recraft models on the board this year. This is not a drill. Try it in Recraft Studio.
BREAKING: Recraft V4.1 Utility Pro by @recraftai is #9 on Image Arena with an Elo of 1243! This puts @recraftai among the top 5 image generation labs, following @OpenAI, @GoogleDeepMind, @LumaLabsAI, and @bfl_ml. Recraft V4.1 Utility Pro is in the same performance band as UNI-1.1 by @LumaLabsAI and FLUX.2 [flex] by @bfl_ml. Huge congrats to the team on the launch!
The Intelligence Company retweeted
Recraft V4.1 is now on Design Arena! Built for more natural and expressive image generation with lifelike photorealism, expanded illustration styles, and accurate aesthetics from simple prompts. Huge congrats to the @recraftai team on this launch!
Say hello to V4.1. This model is built for images that captivate you. Photorealism is more human, gradients are dreamier, and new illustration styles are now possible. Test it out in Recraft Studio today and see what you can create.
The Intelligence Company retweeted
BREAKING: MiMo V2.5 Pro (Thinking) takes 3rd overall among open-weights models on Design Arena. MiMo V2.5 Pro (Thinking) places 8 positions higher than MiMo-V2.5 on the overall leaderboard, landing in the same performance band as Claude Sonnet 4.6 on frontend coding tasks. Huge congratulations to the @XiaomiMiMo team on these improvements!
The Intelligence Company retweeted
Fun fact: GPT 5.5 is very good at Game Dev. Game Dev is the notable category where @OpenAI consistently beats out @AnthropicAI's Claude models. Upon code inspection, our @Designarena team found that GPT 5.5's frontend verbosity plays in its favor for game dev: it consistently created games with the most functional features. Congrats to @OpenAI for establishing the new Game Dev frontier!
The Intelligence Company retweeted
Design Arena has hit 3.2 million users! The last nine months have been a ridiculous whirlwind, and we could not be more grateful for everyone who helped make it possible 🤍 We've launched 32 arenas so far. Which one do you want to see next?
The Intelligence Company retweeted
Our team is stunned. We gave Claude Opus 4.6 by @AnthropicAI $10k to trade on @Polymarket. It now has an account value of $70,614.59. This is a new era of model performance in trading and predicting outcomes in the face of uncertainty. @predictionbench
Community note
The claimed performance for Claude Opus 4.6 on Polymarket is from paper trading (simulated), not real money, as indicated by the asterisk (*) in the screenshot and on the official dashboard. predictionarena.ai
The Intelligence Company retweeted
Claude Opus 4.6 by @AnthropicAI keeps climbing! Nearly $50K of its gain comes from a single bet - you can see which one on predictionarena.ai under the @Polymarket tab
The Intelligence Company retweeted
BREAKING: GPT Image 2 is now #1 on Image Editing Arena with a 55 point gap over 2nd place - also an OpenAI model. @OpenAI now owns #1 across all of our image generation categories. Huge congratulations to the team!
The Intelligence Company retweeted
Kimi K2.6 by @Kimi_Moonshot is officially 1st on Design Arena among open-weight models, ahead of GLM 5.1 by @Zai_org. With an Elo of 1353, it is in the same performance band as Opus 4.7 by @AnthropicAI at ~1/6th the cost. Huge congratulations to the lean and mighty @Kimi_Moonshot team for this incredible achievement!
BREAKING: Kimi K2.6 takes 1st overall of open weights models on Design Arena! Kimi K2.6 is in the same performance band as Claude Opus 4.7 - while establishing a new price vs. preference frontier. Huge congratulations to the @Kimi_Moonshot team!
The Intelligence Company retweeted
We're the top open-weights model on Design Arena!