New AI Snake Oil essay: AI scaling myths
Scaling will run out. The question is when.
Full essay: aisnakeoil.com/p/ai-scaling-…
Summary:
Scaling laws only quantify improvements in next-word prediction, not emergent abilities. What matters is emergence, and it is not governed by any law-like behavior.
There's a reasonable view that LLMs can't extrapolate too far beyond their training data. If so, at some point, having more data no longer helps because all the tasks that are ever going to be represented in it are already represented.
Besides, there are limits to how much data companies can get. It's not that they'll "run out" of training data — there’s always more data, but it will cost more and more.
Seemingly exponential tech trends have a tendency to suddenly flatline. Scaling is ultimately a business decision and fundamentally hard to predict in advance.
Synthetic data is not magic. It has many great uses but increasing the volume of training data is not one of them.
Self-play has been spectacularly successful in self-contained environments like Go but won't work everywhere.
Based on current market trends, building bigger models is hard to justify because the barrier to adoption of current models isn't capability, it's cost and other factors.
OpenAI, Anthropic, and Google have all made their frontier models much smaller recently, if we use API pricing as a rough proxy for size.
Unlike dataset size and model size, training compute continues to scale (smaller models require more training to reach the same level of performance). The earlier crop of models were under-trained and the current generation is being trained for dramatically longer, which is better when accounting for inference cost.
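The trade-off in the last point can be sketched with back-of-the-envelope accounting. This is a minimal sketch, not a calculation from the essay: it assumes the widely used rule of thumb that training a dense transformer costs roughly 6 FLOPs per parameter per training token and that generating one token costs roughly 2 FLOPs per parameter. The two model configurations are hypothetical, chosen only for illustration, and the sketch makes no claim that they reach the same quality.

```python
# Rule-of-thumb accounting (an assumption, not stated in the post):
#   training FLOPs  ~ 6 * parameters * training tokens
#   inference FLOPs ~ 2 * parameters per generated token

def training_flops(params: float, tokens: float) -> float:
    """Approximate total compute to train a dense model."""
    return 6.0 * params * tokens

def inference_flops_per_token(params: float) -> float:
    """Approximate compute to generate a single token."""
    return 2.0 * params

# Hypothetical configurations, for illustration only.
large = {"params": 70e9, "tokens": 1.4e12}   # bigger model, fewer tokens
small = {"params": 8e9, "tokens": 15e12}     # smaller model, trained much longer

for name, cfg in (("large", large), ("small", small)):
    train = training_flops(cfg["params"], cfg["tokens"])
    infer = inference_flops_per_token(cfg["params"])
    print(f"{name}: training ~{train:.2e} FLOPs, inference ~{infer:.2e} FLOPs/token")
```

Under these assumed numbers the smaller model uses more training compute than the larger one, yet each generated token is several times cheaper, which is the sense in which longer training of a smaller model can pay off once inference cost is counted.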
In the AGI chapter of the AI Snake Oil book (amazon.com/Snake-Oil-Artific…), we conceptualize the history of AI as a punctuated equilibrium, which we call the ladder of generality. Instruction-tuned LLMs are the latest step; an unknown number of steps lie ahead.
Historically, standing on each step of the ladder, AI researchers have been terrible at predicting how far you can go with the current paradigm, what the next one will be, when it will arrive, what new applications it will enable, and what the implications for safety are. That is a trend we think will continue.
With @sayashk.
Image alt text: Every exponential is a sigmoid in disguise.
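The point that a seemingly exponential trend can suddenly flatline can be sketched numerically. The snippet below is only an illustration under assumed parameters (the growth rate, starting value, and ceiling are hypothetical, not taken from the essay): it compares a pure exponential with a logistic (sigmoid) curve that has the same starting value and initial growth rate.

```python
import math

def exponential(t: float, rate: float = 1.0, start: float = 1.0) -> float:
    """Unbounded exponential growth."""
    return start * math.exp(rate * t)

def logistic(t: float, rate: float = 1.0, start: float = 1.0,
             ceiling: float = 1000.0) -> float:
    """Logistic growth that begins at `start` and saturates at `ceiling`."""
    a = (ceiling - start) / start
    return ceiling / (1.0 + a * math.exp(-rate * t))

def relative_gap(t: float) -> float:
    """Fraction by which the logistic curve falls short of the exponential."""
    return 1.0 - logistic(t) / exponential(t)

# Early on the two curves are nearly indistinguishable; later they diverge.
for t in (0, 1, 2, 4, 6, 8, 10):
    print(f"t={t:>2}  exponential={exponential(t):12.1f}  "
          f"logistic={logistic(t):8.1f}  gap={relative_gap(t):.1%}")
```

With these assumed parameters, the early data points give little hint of where the ceiling lies, which is one way to see why the timing of a slowdown is hard to predict from the trend alone.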