Amir Elaguizy

Tweets

Amir Elaguizy

@amirpc

Jun 14

39 cents to answer "what model are you" Yes openrouter fusion really has replicated fable.

117

Amir Elaguizy

@amirpc

Jun 14

It's almost like scaring a bunch of non-technical people to death and then asking them to regulate you was a bad idea

Amir Elaguizy

@amirpc

Jun 13

Becoming dependent upon Anthropic in any way is an absolutely untenable and irresponsible level of risk. If you are building a business, they are building a religion. Mutually incompatible.

Amir Elaguizy retweeted

Beff (e/acc)

@beffjezos

Jun 13

1) Nationalize Anthropic 2) Remove the EAs 3) ??? 4) Profit.

237

9,593

Amir Elaguizy

@amirpc

Jun 13

I predict anthropic will just start KYCing US customers

Amir Elaguizy retweeted

Prompter

@PromptLLM

Jun 10

Fable 5 says poker will change your life

117

278

4,943

415,012

Amir Elaguizy retweeted

Dan Shipper 📧

@danshipper

Jun 10

if you think fable is incremental you are not being ambitious enough with how you prompt it

491

46,073

Amir Elaguizy

@amirpc

Jun 10

Oh I get it fable is an orchestrator model

Amir Elaguizy

@amirpc

Jun 10

Watch all of the paternalism and moral grandstanding collapse in the face of competition. New capability set = new round of self-important "protections." It goes away as the physics of capitalism assert themselves. I give it 3 months. Fable is just a case study; it will not last.

Amir Elaguizy

@amirpc

Jun 9

Alright fable is actually a good name for a model finally

Amir Elaguizy

@amirpc

Jun 8

There is no open frontier model yet. Sorry, it's not close. I know we all want it to be, but it's not.

Amir Elaguizy

@amirpc

Jun 8

10x plan coming

Tibo

@thsottiaux

Jun 7

I have a new kind of big button that I can press for Codex. Over the next 100 days, we will select one person per day who does impressive or incredibly useful work with Codex and give them 10X usage limits for a month to see what they can do with it. First one tomorrow.

Amir Elaguizy

@amirpc

Jun 7

It's codex bankruptcy day! That day every few weeks where I have to wipe all the sqlite, json, etc. files from my ~/.codex because it's globally locking all codex instances and slowing to a crawl.

Amir Elaguizy

@amirpc

Jun 1

AI sales emails are a reminder that just because superhuman intelligence is available doesn't mean it will be effectively wielded

Amir Elaguizy

@amirpc

Jun 1

Hear me out: openclaw for keeping all your other agents logged in to shit

Amir Elaguizy

@amirpc

May 28

Lmk when deep swe benches of opus 4.8 drop. I sort of just don't believe anything else; deep swe is the only bench that matches my experience.

Amir Elaguizy

@amirpc

May 28

Claude is just part of the onboarding for codex

Amir Elaguizy

@amirpc

May 28

A humble proposal for the AI labs: before you use automated systems to ban paying users maybe use those same automated systems to say "hey you're doing this thing. We can tell. Stop it." Then ban them if it happens again.

Amir Elaguizy retweeted

Rohan Paul

@rohanpaul_ai

May 28

Super important paper from Univ of Texas. AI agents can slowly become less reliable after deployment, even when the model itself does not change. The problem is that agents are often judged when they are fresh, but real agents keep changing because they summarize old chats, store more memories, update facts, and go through maintenance. An agent that remembers you across weeks is really a small operating system wrapped around a language model: it writes notes, compresses them, retrieves them, updates them, and occasionally cleans house. Every one of those steps can quietly rot. A medication dose can become “a daily medication,” two similar clients can blur into one, a canceled subscription can remain active, and a schedule can vanish after a maintenance pass. The uncomfortable finding is that the agent may still sound competent while becoming less exact. The paper proposes AgingBench, a benchmark that checks whether an agent stays reliable across many sessions instead of only checking one clean starting point. It studies 4 ways agents age: summaries can drop key details, similar memories can get mixed up, updated facts can stay stale, and maintenance can suddenly break memory. The deeper lesson is that “give it more memory” is often the wrong repair. If the fact was never written, retrieval cannot save it. If the fact was written but crowded out, better summarization will not fix it. If the fact is present but unused, the problem is not storage but the agent’s decision to trust or ignore what it retrieved. This paper reframes deployed agents less like static models and more like aging infrastructure. ---- Link – arxiv.org/abs/2605.26302 Title: "Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems"

173

10,067

Amir Elaguizy retweeted

Serena Ge (Datacurve)

@serenaa_ge

May 26

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.

512

740

6,049

1,953,037