If you're curious what the "big bet" is right now at frontier AI labs, here's a quick non-technical summary:
The models have effectively been trained on all available human knowledge (internet/books/videos/pictures) for a while now. On top of that, we kept making the models bigger to make them better at "predicting the next token". From GPT-1 to GPT-4 that worked incredibly well, but it appears to have plateaued going from GPT-4 to GPT-4.5.
Along the way, around GPT-3, researchers found that incorporating human feedback could show the token prediction engine how to "chat" with us, and that's how ChatGPT was born. There has since been a realization that a similar feedback-driven technique might be able to teach the models how to think (o1, o3, GPT-5 thinking).
What's "thinking"? The model generates a bunch of text hidden from you to show its work. Then it says <DONE_THINKING>, after which ChatGPT starts showing you the text it generates. We've even hooked up a computer program that runs based on these "thoughts". For example, if the model thinks "Use Tool, Google: What's the weather today?", a computer program will go and Google the weather and return the results. The model can then incorporate the result of that search into its thoughts. That's what the "reasoning models" are doing, and that's why they take so long.
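If it helps to see that loop written out, here's a minimal Python sketch. Everything in it is a stand-in I made up for illustration: `scripted_model` is a hard-coded fake rather than a real model, `fake_search` returns a canned answer instead of hitting Google, and the "Use Tool, ..." format is just the example from above. Real systems are far more involved, so treat this as the shape of the idea, not how any lab actually does it.

```python
from typing import Callable, Dict, List, Tuple

DONE_THINKING = "<DONE_THINKING>"
TOOL_PREFIX = "Use Tool, "


def fake_search(query: str) -> str:
    """Hypothetical stand-in for a real web search; returns a canned result."""
    return "Sunny, 22 C"


# Tool name -> program that runs it (assumed names, for illustration only).
TOOLS: Dict[str, Callable[[str], str]] = {"Google": fake_search}


def scripted_model(transcript: List[str]) -> str:
    """Hard-coded fake model: picks its next chunk of text from the transcript."""
    if not any(line.startswith("Tool result:") for line in transcript):
        return "Use Tool, Google: What's the weather today?"
    if DONE_THINKING not in transcript:
        return DONE_THINKING
    return "It's sunny and 22 C today."


def run(question: str,
        model: Callable[[List[str]], str] = scripted_model,
        max_steps: int = 10) -> Tuple[str, List[str]]:
    """Return (visible answer, hidden thoughts) for one question."""
    transcript = [f"User: {question}"]
    hidden: List[str] = []
    for _ in range(max_steps):
        chunk = model(transcript)
        transcript.append(chunk)
        if chunk == DONE_THINKING:
            # Thinking is over: the next chunk is what the user actually sees.
            return model(transcript), hidden
        hidden.append(chunk)  # the user never sees this part
        if chunk.startswith(TOOL_PREFIX):
            # Parse "Use Tool, <name>: <argument>" and run the matching program.
            name, _, arg = chunk[len(TOOL_PREFIX):].partition(": ")
            tool = TOOLS.get(name)
            result = tool(arg) if tool else f"unknown tool {name!r}"
            line = f"Tool result: {result}"
            transcript.append(line)  # the model can read this on its next step
            hidden.append(line)
    raise RuntimeError("model never finished thinking")


if __name__ == "__main__":
    answer, thoughts = run("What's the weather today?")
    print("hidden:", thoughts)
    print("shown:", answer)
```

Each pass through the loop is another call to the model, and each tool call adds a round trip on top, which is where the waiting comes from.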
Some research suggests the models aren't doing "real reasoning" in the human sense, though. So the idea right now is to collect lots of examples of good reasoning and behavior and hopefully teach them how. It's once again a "scaling hypothesis", but this time through behavioral data (coding/computer work) and thinking traces. That data is much harder to get your hands on, so there's a bit of a gold rush to figure out how to get lots of it at scale.
It's also an open question: if you train a model on a massive and diverse enough dataset of "problem solving" from countless different environments (math, code, science, logic puzzles), will it eventually stop memorizing specific solutions and start learning the underlying, abstract reasoning operators (e.g., decomposition, deduction, pattern matching)? If so, can that finally create Artificial General Intelligence (AGI), intelligence that can do anything a human can do?
That's why all the labs are creating "coding agents", buying or bidding for coding companies (Windsurf), and buying digital "environments" and behavioral/thinking traces. Those are treasure troves of complex digitized reasoning and verifiable outcomes. For example, you can observe all the steps an engineer took to solve a problem and how they went about it.
So it's still an open question. Some top AI researchers think this won't work and that we need a new approach. But for now, Occam's razor: this is the simplest bet on the table, and we'll see whether it creates AGI or whether it's on to the next thing.