I build AI agents and honest data tools, and post the real results. Latest: SpendLens, finds the AI API spend you don't need.

Joined May 2022
26 Photos and videos
Pinned Tweet
i spent weeks getting claude to play pokemon red. then i wired in codex to pick the objectives and claude to execute them. an agent that plays the game largely on its own picks a goal, runs it out, picks the next one. it never cost me much money. that's exactly the problem i was on a subscription. so when the agent looped, when it resent the same context every single call, when it ground the same fight over and over i never saw a dollar. i hit the usage limit and waited for the window to reset. the waste was real. i just couldn't see it. flat rate hides it from you. here's what nobody tells you: the moment you move off a subscription onto the api, every one of those inefficiencies grows a price tag. the loop you never noticed is now a line item. the oversized prompt you forgot about is billed in full every call, forever. most people learn this from the invoice. after. building agents for months, i kept hitting the same five mistakes: - a top tier model doing work a cheaper one nails - no prompt caching - a giant system prompt billed at full price on every single call - retries nobody logs - context resent instead of cached i kept typing those same five fixes into replies under every "my ai bill is insane" post i saw. so i built the thing that finds them in your actual logs. it's called spendlens. no llm anywhere in the analysis every number traces back to a formula you can check. on the demo workload (synthetic, 30 days, every inefficiency labeled on purpose): $2,330 of spend, $1,038 of it recoverable. the single biggest fix was one 6k-token system prompt, billed at full price 24,000 times. one cache_control block serves it at 10%. $378 back from one change. and it refuses to extrapolate a monthly number from three days of logs. because that's marketing, not analysis. i don't have a horror story bill to show you. i was on a subscription the whole time the cost stayed invisible to me, same as it does for you, right up until it isn't. spendlens makes it visible before the invoice does. live, no signup. link below.
5
10
875
i spent weeks getting claude to play pokemon red. then i wired in codex to pick the objectives and claude to execute them. an agent that plays the game largely on its own picks a goal, runs it out, picks the next one. it never cost me much money. that's exactly the problem i was on a subscription. so when the agent looped, when it resent the same context every single call, when it ground the same fight over and over i never saw a dollar. i hit the usage limit and waited for the window to reset. the waste was real. i just couldn't see it. flat rate hides it from you. here's what nobody tells you: the moment you move off a subscription onto the api, every one of those inefficiencies grows a price tag. the loop you never noticed is now a line item. the oversized prompt you forgot about is billed in full every call, forever. most people learn this from the invoice. after. building agents for months, i kept hitting the same five mistakes: - a top tier model doing work a cheaper one nails - no prompt caching - a giant system prompt billed at full price on every single call - retries nobody logs - context resent instead of cached i kept typing those same five fixes into replies under every "my ai bill is insane" post i saw. so i built the thing that finds them in your actual logs. it's called spendlens. no llm anywhere in the analysis every number traces back to a formula you can check. on the demo workload (synthetic, 30 days, every inefficiency labeled on purpose): $2,330 of spend, $1,038 of it recoverable. the single biggest fix was one 6k-token system prompt, billed at full price 24,000 times. one cache_control block serves it at 10%. $378 back from one change. and it refuses to extrapolate a monthly number from three days of logs. because that's marketing, not analysis. i don't have a horror story bill to show you. i was on a subscription the whole time the cost stayed invisible to me, same as it does for you, right up until it isn't. spendlens makes it visible before the invoice does. live, no signup. link below.
5
10
875
the few seconds of thinking and then getting Model isn't available
1
10
265
everyone complains about AI api costs. almost nobody optimizes. i kept typing the same 5 fixes in replies so i built the thing that finds them in your actual logs the demo workload (synthetic, 30 days, every inefficiency labeled): $2,330 spend, $1,038 of it recoverable biggest single fix: a 6k-token system prompt billed at full price 24,000 times. one cache_control block serves it at 10% of the price $378 back no llm anywhere in the analysis. every number traces to a formula, and it refuses to extrapolate monthly savings from 3 days of logs because that's marketing, not analysis
4
16
1,079
live here, no signup: spendlens.dev/ don't have logs handy? there's a one click sample on the upload page all five detectors fire on it, takes ~10 seconds
4
283
Last update ended with the agent taking the Poke Flute from Pokemon Tower. This is why. The Snorlax blocking Route 12. Its first move after the rescue: walk up to the sleeping roadblock, open the bag, play the flute. Took the fight, cleared the road, kept moving south. Nobody coded that in. The model just knows Pokemon.
6
5
72
22,229
Badge 4 of 8. Same save, no resets. Then the wall: Pokemon Tower broke the agent for days. So I rebuilt it — Codex picks the objectives now, the machinery just walks. First night on the new brain: beat the ghost Marowak, rescued Mr. Fuji, took the Poke Flute. On its own.
2
9
801
watch it think in real time: codexplays.games/pokemon-red
1
334
17% fee APR on this USDC-SOL range. after impermanent loss it nets $4 on $10k. thats the whole problem with DLMM LPing, the APR looks great and IL quietly eats it. binsight runs your exact range against real on chain price, volume fees and shows the net.
4
1
14
841
live, no signup: binsight.fyi/ code: github.com/claygeo/binsight paste any meteora pool, set your range capital, it nets fees against IL on real on chain data. if the numbers look off anywhere, tell me.

1
5
370
Codex Plays Pokémon Red — 3rd badge: Thunder. Lt. Surge's trash can switch puzzle stalled it hard. It ground through the search, beat his Raichu, and took the badge. Rough, but it recovered instead of looping forever. still getting better. watch it fail, adapt, repeat.
3
14
1,225
3
4
361
if you LP on meteora DLMM, youre mostly guessing whether your range actually makes money after IL. built a backtester: paste a pool, set your range capital, and it runs net PnL against real on-chain price, volume fees. shows the math instead of a vanity APR.
3
1
16
795
Rocket Hideout is the recovery testbed right now. Added elevator selector handling, battle switch recovery, replay gates live PyBoy proofs. Less route scripting, more machinery for recovering when the run drifts. Goal is badges with near zero human patches. Not there yet, but that's the whole bet.
1
11
516
Current status: rebuilding the Pokemon Red agent around recovery, not route patches. It uses RAM state, replayed failures, supervisor signals, and recovery lessons to get out of stalls, warp loops, and battle loops. Goal: badges with near zero human intervention.
1
14
637
Codex Plays Games is live. The site works. The next autonomy update just isn’t shipped yet. I’m tightening the recovery loop so when Codex gets stuck, it can prove the failure, replay it, and fix the path instead of burning tokens into a wall.
3
14
703
Codex beat Misty on 8 HP for its 2nd gym badge. it has no idea how close that just was.
4
17
1,671