Tweets

Leonard retweeted

Artificial Analysis

@ArtificialAnlys

Jun 12

We've updated the Artificial Analysis Coding Agent Index, replacing SWE-Bench Pro with Datacurve's DeepSWE benchmark - the swap lifts Codex with GPT-5.5 (xhigh) above Claude Code with Opus 4.8 (max), while the newly released Claude Fable 5 (max) in Claude Code debuts at the top.

DeepSWE, built by @datacurve, writes its tasks from scratch rather than adapting them from public GitHub issues or pull requests, so no model has seen the solutions during training. That matters because SWE-Bench Pro, the benchmark it replaces in our Coding Agent Index, had grown gameable, with some models recovering the fix from the repository's commit history instead of solving the task.

The swap reorders the index: Codex with GPT-5.5 (xhigh) rises from 65 to 76, overtaking Claude Code with Opus 4.8 (max) at 73. Claude Code with Fable 5 (max), which enters directly on the refreshed index, leads at 77. SWE-Bench Pro had been flattering some combinations and penalizing others.

More below.

Leonard retweeted

brayden petersen ⁂

@bmptrsn

May 30

WEEEEEEEEEEE

Datacurve

@datacurve

May 30

Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task.

Leonard retweeted

Datacurve

@datacurve

May 30

Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task.

Leonard retweeted

Neesh 🥭

@Neesh774

May 27

The new @datacurve site is insane

Leonard retweeted

Theo - t3.gg

@theo

May 26

This is the first code bench that actually aligns with how it feels to use these models coding.

Serena Ge (Datacurve)

@serenaa_ge

May 26

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.

Leonard

@LeonardMainnet

May 26

Worked on acquiring and processing these tasks! Hope this provides unique and valuable insights on how to better evaluate models in a sea of benchmarks.

Serena Ge (Datacurve)

@serenaa_ge

May 26

Leonard retweeted

Serena Ge (Datacurve)

@serenaa_ge

9 Oct 2025

Today we’re announcing we’ve raised $17.7 million in funding across a $15M Series A led by Chemistry and a $2.7M Seed to accelerate foundation model progress through providing frontier training data for LLMs.

When we first started Datacurve, it came from a simple realization: foundation model progress is limited not just by compute, but by data quality and complexity. The right data unlocks new capabilities, especially in coding, where accuracy and reasoning matter most.

We’re now proud to partner with the world’s leading foundation-model labs, providing them with high-quality, complex training data that helps push the boundaries of what AI can do.

This is still just the start. Come build the future of technology with us in San Francisco: datacurve.ai/careers

Huge thanks to our incredible team and investors who’ve believed in us since day one and beyond: @garrytan at @ycombinator, @1vnzh from @cohere, @Mark_Goldberg_ from @chemistry_fund, @TheDerrickLi from @AforeVC, @forwarddeploy, @SoheilK, and @shyamalanadkat.

Leonard retweeted

caps

@capsjpeg

22 May 2025

think i need more degods

Leonard retweeted

Pasta

@pastagotsauce

22 May 2025

launched v1 of Profiles. get in discord and let us know what features to add next :)

Leonard retweeted

we5lie

@We5lie

22 May 2025

DeGod Profiles V1 is live. Designed alongside with @budaz__ — excited to share what we’ve built for the community.

Leonard retweeted

adil

@adilcreates

22 May 2025

Profiles v1 is live. Designed by @We5lie & @budaz__ — animated by me. Massive props to @LeonardMainnet, @0x_chill & @thebasedbob for engineering magic. Grateful to help bring this to life for the community.

Leonard retweeted

BeByDay

@BeAlterEgos

12 May 2025

Topped it up to 50 DeGods Fuck it, we ball @DeGodsNFT

BeByDay

@BeAlterEgos

12 May 2025

Swept 20 DeGods Art slaps, always will. And Pasta cracks me up. @DeGodsNFT

Leonard retweeted

kramrogNL (33.3%)

@kramrogNL

5 Apr 2025

I actually dropped a y00ts coloring book on Amazon recently! Please help spread the word 🙏 a.co/d/axYbcJD

Leonard retweeted

de_doug.sol

@de_dougdotsol

22 Mar 2025

It has always been @DeGodsNFT.

Leonard retweeted

dream (outlaw) 🏴‍☠️

@dreamoutlaw888

22 Mar 2025

degods

Leonard retweeted

we5lie

@We5lie

21 Mar 2025

red cap

Leonard retweeted

we5lie

@We5lie

16 Mar 2025

degods

Leonard retweeted

we5lie

@We5lie

13 Mar 2025

most typed word since 2021

Leonard retweeted

we5lie

@We5lie

4 Mar 2025

Leonard retweeted

Pada @PadawanGng

4 Mar 2025

I am now a @DeGodsNFT owner. Do DeGods follow DeGods?
