tom cunningham

Pinned Tweet

tom cunningham

@testingham

28 Aug 2021

tom cunningham

@testingham

Jun 10

Q: What can we say about the fixed point of agent optimization loops? I can't find much on this. Suppose you ask an agent to produce an output, then keep improving it, over and over. What happens? E.g. write a paper, tell a joke, write a computer game, optimize an algorithm. (1/n)
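A minimal sketch of the loop being asked about. It assumes the agent can be treated as a deterministic function `improve` that maps one output to a revised one. That is a hypothetical stand-in: real agents are stochastic and rarely return byte-identical output. The loop stops at a fixed point, `improve(x) == x`, or when the iteration budget runs out, which is also what happens if the loop cycles.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def refine_until_fixed_point(draft: T, improve: Callable[[T], T],
                             max_iters: int = 50) -> tuple[T, int]:
    """Apply `improve` repeatedly until the output stops changing
    (a fixed point, improve(x) == x) or `max_iters` calls have been made.

    Returns the last output and the number of calls to `improve`.
    If the loop cycles (e.g. A -> B -> A) it never reaches a fixed point
    and simply stops at the budget.
    """
    current = draft
    for i in range(1, max_iters + 1):
        proposal = improve(current)
        if proposal == current:  # no further change: fixed point reached
            return current, i
        current = proposal
    return current, max_iters


# Toy "agents" (hypothetical): one converges, one oscillates forever.
print(refine_until_fixed_point(0, lambda x: min(x + 3, 10)))  # (10, 5)
print(refine_until_fixed_point(0, lambda x: 1 - x))           # (0, 50)
```

The first toy agent plateaus at 10 and stops; the second flips between 0 and 1 and only halts because of the budget, one way a loop can fail to have a useful fixed point.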

tom cunningham

@testingham

Jun 10

You could perhaps categorize failures in this way:
1. The loss landscape is high-dimensional, so it's too hard to find the gradient.
2. The loss landscape is low-dimensional but non-convex, so you get stuck in local optima.
3. You don't know the true loss (it's non-verifiable), so you end up in an out-of-distribution (OOD) part of the landscape where self-verification is miscalibrated.
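Failure type 2 in a toy setting: greedy hill climbing on a one-dimensional quality score with two peaks. The landscape `quality` and the step size are made up for illustration; the point is only that a search started on the wrong side of the valley stops at the lower peak.

```python
import math


def quality(x: float) -> float:
    """Hypothetical non-convex quality score: a low peak near x = 1
    and a higher peak near x = 4, separated by a valley."""
    return math.exp(-(x - 1) ** 2) + 2 * math.exp(-(x - 4) ** 2)


def hill_climb(x: float, step: float = 0.1, max_iters: int = 1000) -> float:
    """Greedy local search: move to a neighbour only if it scores higher,
    and stop when neither neighbour improves (a local maximum)."""
    for _ in range(max_iters):
        best = max((x - step, x, x + step), key=quality)
        if best == x:  # no neighbour is better: stuck here
            return x
        x = best
    return x


for start in (0.0, 3.0):
    end = hill_climb(start)
    print(f"start={start}: stops at x={end:.2f}, quality={quality(end):.2f}")
```

Starting from 0 the climber stops near x = 1 (quality about 1), even though x = 4 scores about 2; starting from 3 it finds the higher peak.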

tom cunningham

@testingham

Jun 10

The most relevant work I could find uses pretty old models:
1. Self-Refine (Madaan et al., 2023) uses GPT-4. It finds that quality increases on simple problems over a few iterations, then plateaus.
2. Huang et al. (ICLR 2024), "Large Language Models Cannot Self-Correct Reasoning Yet", uses GPT-4t. GSM8K accuracy gets *worse* with self-correction.
3. The telephone game (Perez et al., ICLR 2025) uses GPT-4o-mini. But this is just *repeating* stuff, not optimizing something.

tom cunningham

@testingham

Jun 8

A list of definitions of RSI (recursive self-improvement) and associated concepts (there are a lot!). It's mostly agent-written. Tell me if I'm misquoting or missing any: tecunningham.github.io/posts…

Definitions of Recursive Self-Improvement | Tom Cunningham (tecunningham.github.io)

tom cunningham retweeted

Andrey Fradkin

@AndreyFradkin

Jun 6

Enjoyed giving a talk about the economics of AI at the Econometric Society meeting this morning. Some motivating facts/ideas about AI that all economists should know are below. 1/n

tom cunningham

@testingham

Jun 3

My very speculative reconstruction of the economics of using LLMs to find bugs:

Peter Wildeford🇺🇸🚀

@peterwildeford

Jun 1

Mythos at Palo Alto Networks "found more than two dozen critical vulnerabilities in around three weeks, roughly five times what the company would typically find using existing tools" But the company "burned through more than $1 million worth of tokens using Mythos"
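A back-of-envelope version of the arithmetic in the quoted tweet. It assumes exactly 24 bugs, exactly $1M of tokens, and a baseline of 24/5 bugs from existing tools over the same three weeks. The tweet says "more than" for both the bug count and the spend, so the true ratios could move either way. `cost_per_bug` is a hypothetical helper, and it ignores what the existing tools cost.

```python
def cost_per_bug(spend_usd: float, bugs_found: float,
                 baseline_bugs: float) -> tuple[float, float]:
    """Return (average cost per bug, cost per *additional* bug over the baseline).

    Hypothetical helper: treats the whole token spend as incremental and
    ignores the cost of the existing tools.
    """
    if bugs_found <= baseline_bugs:
        raise ValueError("no bugs found beyond the baseline")
    average = spend_usd / bugs_found
    marginal = spend_usd / (bugs_found - baseline_bugs)
    return average, marginal


bugs = 24              # "more than two dozen" (treated as exactly 24)
spend = 1_000_000      # "more than $1 million worth of tokens" (treated as exactly $1M)
baseline = bugs / 5    # "roughly five times" what existing tools find -> 4.8
avg, extra = cost_per_bug(spend, bugs, baseline)
print(f"average about ${avg:,.0f} per bug; about ${extra:,.0f} per additional bug")
```

Under these assumptions that is roughly $42k per bug on average and roughly $52k per bug beyond what existing tools would have found.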

tom cunningham

@testingham

Jun 3

@joshgans probably there are other models of automated R&D that are applicable here?

tom cunningham

@testingham

Jun 3

Based on this: tecunningham.github.io/posts…

An Apple-Picking Model of AI R&D | Tom Cunningham (tecunningham.github.io)

tom cunningham

@testingham

Jun 1

A very big picture of what's happening with AI (slides from a talk at UCSB):

tom cunningham

@testingham

Jun 1

[1]: epoch.ai/trends
[2]: epoch.ai/gradient-updates/th…
[3]: metr.org/blog/2026-05-11-ai-…
[4]: tecunningham.github.io/posts…
[5]: metr.org/blog/2026-05-19-fro…

Trends in Artificial Intelligence (epoch.ai)
Frontier AI systems are advancing rapidly from increases in compute, hardware performance, software efficiency, and investment. This dashboard explores those dynamics.

tom cunningham

@testingham

Jun 1

The last line would probably be better: "Impressive autonomy, but not human level; a lot of reward hacking, some deception, but no egregious scheming yet."

tom cunningham

@testingham

Jun 1

End of an era at OpenAI -- Pamela did a streak of great work founding/running the econ research team, I’m grateful she hired me into it, and we continued many of the projects she started.

pamela mishkin

@manlikemishap

Jun 1

was rejected from posting the below to a billboard in kansas city, so hit slack instead :openai-heart:

tom cunningham

@testingham

May 29

I think most domains look like this at the moment: the returns to expenditure on agents diminish much more quickly than the returns to expenditure on human labor. (1/n)

tom cunningham

@testingham

May 29

A test for this: if you doubled your token use, how much would the value you get from AI increase? That gives you the elasticity of value with respect to tokens. My guess is that it's much less than double. (And if you don't usually hit your token limits, then the implied marginal value is zero.)
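The doubling test can be written as an arc elasticity: the log change in value divided by the log change in tokens. The inputs below are hypothetical answers, not data, and `token_elasticity` is an illustrative name. An elasticity of 1 means value scales one-for-one with tokens, anything below 1 means diminishing returns, and 0 is the "never hit your token limits" case where extra tokens add nothing.

```python
import math


def token_elasticity(value_before: float, value_after: float,
                     tokens_before: float, tokens_after: float) -> float:
    """Arc elasticity of value with respect to token use:
    ln(value_after / value_before) / ln(tokens_after / tokens_before)."""
    if min(value_before, value_after, tokens_before, tokens_after) <= 0:
        raise ValueError("all quantities must be positive")
    if tokens_after == tokens_before:
        raise ValueError("token use must change")
    return math.log(value_after / value_before) / math.log(tokens_after / tokens_before)


# Hypothetical answers to "if you doubled your token use, how much more value would you get?"
print(token_elasticity(100, 200, 1, 2))  # value doubles   -> 1.0
print(token_elasticity(100, 120, 1, 2))  # value rises 20% -> about 0.263
print(token_elasticity(100, 100, 1, 2))  # no extra value  -> 0.0
```

Using logs makes the measure independent of units, so answers from people on very different token budgets can be compared directly.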

tom cunningham

@testingham

May 29

Inference-time scaling rotates the red curve upwards, increasing the elasticity. But there's also the countervailing force of distillation: once an agent solves a problem once, it then becomes cheap to do it again.

tom cunningham

@testingham

May 28

My sense is that giving a computer programmer an AI agent is like giving a lumberjack a chainsaw. They can immediately see that it's powerful, & they'll pay a lot for it, but the first time they use it they cut off their leg. Programmers have a ton of tacit knowledge about how to build software, but now all their instincts need to be recalibrated, & it takes some time.

tom cunningham retweeted

Håvard Ihle

@htihle

May 28

How far behind are open models? Across 17 selected benchmarks, private ones show a gap of 8-10 months today, almost 2x the gap on public ones (4-6 mo). More discussion (including limitations), code and blog in the thread.
