Builder, Dad - YC S13 - Cofounder @PlayPokerSkill, @Cratejoy, MarketZero - Question the answers

Joined December 2008
39 cents to answer "what model are you" Yes openrouter fusion really has replicated fable.
It's almost like scaring a bunch of non-technical people to death and then asking them to regulate you was a bad idea
Becoming dependent upon Anthropic in any way is an absolutely untenable and irresponsible level of risk. If you are building a business, they are building a religion. Mutually incompatible.
Amir Elaguizy retweeted
1) Nationalize Anthropic 2) Remove the EAs 3) ??? 4) Profit.
I predict anthropic will just start KYCing US customers
Amir Elaguizy retweeted
Fable 5 says poker will change your life
Amir Elaguizy retweeted
if you think fable is incremental you are not being ambitious enough with how you prompt it
Oh I get it fable is an orchestrator model
Watch all of the paternalism and moral grandstanding collapse in the face of competition. New capability set = new round of self-important "protections." It goes away as the physics of capitalism assert themselves. I give it 3 months. Fable is just a case study, will not last.
Alright fable is actually a good name for a model finally
There is no open frontier model yet. Sorry, it's not close. I know we all want it to be, but it's not.
10x plan coming
I have a new kind of big button that I can press for Codex. Over the next 100 days, we will select one person per day who does impressive or incredibly useful work with Codex and give them 10X usage limits for a month to see what they can do with it. First one tomorrow.
It's codex bankruptcy day! That day every few weeks where I have to wipe all the sqlite, json, etc files from my ~/.codex because it's globally locking all codex instances and slowing to a crawl.
AI sales emails are a reminder that just because superhuman intelligence is available doesn't mean it will be effectively wielded
Hear me out: openclaw for keeping all your other agents logged in to shit
Lmk when DeepSWE benches of Opus 4.8 drop. I sort of just don't believe anything else; DeepSWE is the only bench that matches my experience.
Claude is just part of the onboarding for codex
A humble proposal for the AI labs: before you use automated systems to ban paying users maybe use those same automated systems to say "hey you're doing this thing. We can tell. Stop it." Then ban them if it happens again.
Amir Elaguizy retweeted
Super important paper from Univ of Texas. AI agents can slowly become less reliable after deployment, even when the model itself does not change.
The problem is that agents are often judged when they are fresh, but real agents keep changing because they summarize old chats, store more memories, update facts, and go through maintenance. An agent that remembers you across weeks is really a small operating system wrapped around a language model: it writes notes, compresses them, retrieves them, updates them, and occasionally cleans house. Every one of those steps can quietly rot.
A medication dose can become “a daily medication,” two similar clients can blur into one, a canceled subscription can remain active, and a schedule can vanish after a maintenance pass. The uncomfortable finding is that the agent may still sound competent while becoming less exact.
The paper proposes AgingBench, a benchmark that checks whether an agent stays reliable across many sessions instead of only checking one clean starting point. It studies 4 ways agents age: summaries can drop key details, similar memories can get mixed up, updated facts can stay stale, and maintenance can suddenly break memory.
The deeper lesson is that “give it more memory” is often the wrong repair. If the fact was never written, retrieval cannot save it. If the fact was written but crowded out, better summarization will not fix it. If the fact is present but unused, the problem is not storage but the agent’s decision to trust or ignore what it retrieved.
This paper reframes deployed agents less like static models and more like aging infrastructure.
Link: arxiv.org/abs/2605.26302
Title: "Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems"
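The idea of checking an agent across sessions, rather than at one clean starting point, can be sketched in a few lines. This is a toy illustration only, not AgingBench's actual harness: `ToyMemory`, `exact_recall`, and `recall_over_sessions` are hypothetical names, and plain truncation stands in for lossy summarization.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyMemory:
    """Hypothetical stand-in for an agent's long-term notes."""
    notes: Dict[str, str] = field(default_factory=dict)

    def write(self, key: str, value: str) -> None:
        self.notes[key] = value

    def compress(self, max_chars: int) -> None:
        # Assumption: truncation models a lossy summarization pass.
        self.notes = {k: v[:max_chars] for k, v in self.notes.items()}

    def read(self, key: str) -> Optional[str]:
        return self.notes.get(key)


def exact_recall(memory: ToyMemory, facts: Dict[str, str]) -> float:
    """Fraction of facts the memory still returns verbatim."""
    hits = sum(memory.read(k) == v for k, v in facts.items())
    return hits / len(facts)


def recall_over_sessions(facts: Dict[str, str], budgets: List[int]) -> List[float]:
    """Score once while fresh, then again after each maintenance pass."""
    memory = ToyMemory()
    for key, value in facts.items():
        memory.write(key, value)
    scores = [exact_recall(memory, facts)]  # the "fresh" measurement
    for budget in budgets:
        memory.compress(budget)  # one maintenance pass per session
        scores.append(exact_recall(memory, facts))
    return scores


if __name__ == "__main__":
    facts = {"dose": "take 10 mg daily", "subscription": "canceled"}
    print(recall_over_sessions(facts, budgets=[20, 8]))  # [1.0, 1.0, 0.5]
```

With these example facts the score stays at 1.0 while the budget is generous and falls to 0.5 once the shorter budget truncates the dose note, which is the pattern a fresh-only evaluation would miss.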
Amir Elaguizy retweeted
Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.