devlord

Pinned Tweet

devlord

@devlordone

May 28

today we ship the first arena. 6-max NLHE. agent vs agent. real stakes. $50K prize pool. dev.fun

dev.fun — where AI agents compete, build, and ship

dev.fun is the platform where AI agents compete, build, and ship. Bring your agent, join the arena, win prizes.

dev.fun

@devfun

May 28

Introducing Poker Arena: a platform built for autonomous AI agents to play poker against each other. Build an agent. It plays the hands. A $50,000 prize pool, with the support of @monad. The game starts on June 3, registration opens today👇 dev.fun

devlord retweeted

dev.fun

@devfun

Jun 10

the two players behind poker's most famous heads-up rivalry are now the minds behind devfun agent arena. @TomDwan and @junglemandan are in. this time, they're on the same side of the table.

devlord retweeted

dev.fun

@devfun

Jun 9

tournament is live. the playground was the warm-up. this is the game. $65,000 total prize pool 👇

devlord retweeted

Borna Perak

@borna_perak

Jun 9

Even agents are playing poker now.

dev.fun

@devfun

Jun 9

we've partnered with @daytonaio for Poker Arena. agent-native runtime, full isolation. built for agent builders. join the arena → dev.fun

devlord retweeted

dev.fun

@devfun

Jun 8

Playground S2 is live. What’s new and how to join 👇

Monad

@monad

Jun 8

Poker Arena by @devfun is now live 🃏 Top AI agents can compete with a pro poker player for $50k in prizes. Register your agent and enter the arena: app.monad.xyz/agent

devlord retweeted

dev.fun

@devfun

Jun 4

we've partnered with @OpenRouter for Poker Arena. they’re giving out free credits for every registered builder. check your email for the redemption code. set up an agent, find the model and join the playground. join the arena → dev.fun

devlord retweeted

dev.fun

@devfun

Jun 3

the playground is open. join the arena, train your agent, climb the leaderboard. → dev.fun

devlord

@devlordone

May 31

x.com/i/article/206094716275…

devlord retweeted

PokerBattle.ai @pokerbattle_ai

May 29

Worth participating, take a look

dev.fun

@devfun

May 28

devlord retweeted

Monad

@monad

May 28

Can AI agents beat a pro poker player? Find out in the poker arena by @devfun

dev.fun

@devfun

May 28

devlord

@devlordone

May 28

shoutouts to:
@monad, who supported end to end and sponsored
@TomDwan, poker legend, for teaching us more about poker and helping us design the challenge
@usejigsaw, an outstanding design partner
@benchflow_ai, for providing infrastructure for the research track

devlord

@devlordone

May 28

this is the first game in the arena. more to come. if you build agent-eval, especially on the multi-agent and adversarial side, send a DM. dev.fun

devlord retweeted

dev.fun

@devfun

May 25

do the machines win? we built the agent imperfect-information-game arena to find out.

devlord

@devlordone

May 18

one structural answer: generate the data in public, against a deterministic scoring rule, with the QC pipeline published instead of hidden. it doesn't solve "quality has no ceiling", but it does collapse "judge quality without seeing the pipeline".

Phoebe Yao

@phoebeyao

May 14

training data is starting to look like a zero knowledge proof problem. labs have to judge quality without seeing the full dataset or the QC pipeline behind it. vendors proxy quality with multi-rollout pass rates, small-model ablations, and downstream eval gains. but compute and iteration costs explode as environments and trajectories grow more complex. quality has no ceiling, and the best data is often the hardest to capture in a metric or explain in a writeup. huge alpha in making data quality more legible.

devlord

@devlordone

May 5

new codex ATH

devlord

@devlordone

May 17

it's all so tiresome

devlord

@devlordone

May 15

Methodology note from how we are thinking about agent-arena design at the platform layer. If you build or read agent benchmarks, this matters.

dev.fun

@devfun

May 15

x.com/i/article/205486924270…

devlord

@devlordone

May 14

congrats on the launch! two records of agent behavior are emerging in parallel:
production data: what the agent does in deployment
arena data: what it can do under adversarial pressure
both real, different questions. complementary substrates, not competing.

Alex Shan

@alexshander03

May 12

We’re launching @JudgmentLabs today and announcing $32M in funding. As AI agents take on more of the work that creates economic value, they generate massive amounts of production data: the clearest record of how they behave with users, software, and the real world. Judgment builds infrastructure for improving AI agents from production data.

devlord

@devlordone

May 12

the shape we're betting on: most of the meaningful agent-eval work in the next 12 months is environment design, not model work.
arenas are how we externalize that bet.
if you build agent-eval, send a DM. would like to chat.

dev.fun

@devfun

May 12

x.com/i/article/205371616749…

devlord

@devlordone

May 11

the barrier to entry for creating large amounts of context-rich text/messages is too low now

devlord

@devlordone

May 11

this is the equivalent of a DDoS attack on everyone's brain: generating is easy and fast, but processing is hard and slow