i move pixels around @datacurve

Joined March 2014
272 Photos and videos
Leonard retweeted
We've updated the Artificial Analysis Coding Agent Index, replacing SWE-Bench Pro with Datacurve's DeepSWE benchmark - the swap lifts Codex with GPT-5.5 (xhigh) above Claude Code with Opus 4.8 (max), while the newly released Claude Fable 5 (max) in Claude Code debuts at the top. DeepSWE, built by @datacurve, writes its tasks from scratch rather than adapting them from public GitHub issues or pull requests, so no model has seen the solutions during training. That matters because SWE-Bench Pro, the benchmark it replaces in our Coding Agent Index, had grown gameable, with some models recovering the fix from the repository's commit history instead of solving the task. The swap reorders the index: Codex with GPT-5.5 (xhigh) rises from 65 to 76, overtaking Claude Code with Opus 4.8 (max) at 73. Claude Code with Fable 5 (max), which enters directly on the refreshed index, leads at 77. SWE-Bench Pro had been flattering some combinations and penalizing others. More below.
Leonard retweeted
WEEEEEEEEEEE
Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task.
Leonard retweeted
Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task.
Leonard retweeted
The new @datacurve site is insane
Leonard retweeted
This is the first code bench that actually aligns with how it feels to use these models coding.
Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.
Worked on acquiring and processing these tasks! Hope this provides unique and valuable insights into how to better evaluate models in a sea of benchmarks.
Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.
Leonard retweeted
Today we’re announcing we’ve raised $17.7 million in funding across a $15M Series A led by Chemistry and a $2.7M Seed to accelerate foundation model progress by providing frontier training data for LLMs. When we first started Datacurve, it came from a simple realization: foundation model progress is limited not just by compute, but by data quality and complexity. The right data unlocks new capabilities, especially in coding, where accuracy and reasoning matter most. We’re now proud to partner with the world’s leading foundation-model labs, providing them with high-quality, complex training data that helps push the boundaries of what AI can do. This is still just the start. Come build the future of technology with us in San Francisco: datacurve.ai/careers Huge thanks to our incredible team and investors who’ve believed in us since day one and beyond: @garrytan at @ycombinator, @1vnzh from @cohere, @Mark_Goldberg_ from @chemistry_fund, @TheDerrickLi from @AforeVC, @forwarddeploy, @SoheilK, and @shyamalanadkat.
Leonard retweeted
22 May 2025
think i need more degods
Leonard retweeted
22 May 2025
launched v1 of Profiles. get in discord and let us know what features to add next :)
Leonard retweeted
22 May 2025
DeGod Profiles V1 is live. Designed alongside @budaz__. Excited to share what we’ve built for the community.
Leonard retweeted
22 May 2025
Profiles v1 is live. Designed by @We5lie & @budaz__ — animated by me. Massive props to @LeonardMainnet, @0x_chill & @thebasedbob for engineering magic. Grateful to help bring this to life for the community.
Leonard retweeted
12 May 2025
Topped it up to 50 DeGods. Fuck it, we ball. @DeGodsNFT
12 May 2025
Swept 20 DeGods. Art slaps, always will. And Pasta cracks me up. @DeGodsNFT
Leonard retweeted
I actually dropped a y00ts coloring book on Amazon recently! Please help spread the word 🙏 a.co/d/axYbcJD
Leonard retweeted
It has always been @DeGodsNFT.
Leonard retweeted
degods
Leonard retweeted
21 Mar 2025
red cap
Leonard retweeted
16 Mar 2025
degods
Leonard retweeted
13 Mar 2025
most typed word since 2021
Leonard retweeted
4 Mar 2025
Leonard retweeted
4 Mar 2025
I am now a @DeGodsNFT owner. Do DeGods follow DeGods?