Let us begin where you would begin if you wanted to create a large language model: with text.
What does the word "text" mean to you? You really should think about this before you keep reading. Note how it can mean many different things--not all text is created alike.
To make a large language model, we need to collect a lot of text, since the text contains the language we want to model. You want your LLM to capture the regular patterns in language use, and to do that you first need to find those patterns so you can model them.
This was originally done with collections such as "The Pile", and the name is telling: this was not a collection of texts organized as text, but a pile of language use, mined for regularities in language.
And it worked.
If you go through the pile, you will find a lot of examples of people saying "I go home" but not "I go house", and so you find a rule in English as it is used. The rule is a grammatical one, and a simple one: "home" here functions as an adverbial locative rather than a direct object. Why is that? I won't get into it here, but it's a long story.
But you'll also see in your model that "I go to the house" or "I go to our house" is used less frequently than "I go home". Why is that? Because there is another, more abstract, and less commonly recognized grammatical rule: in English, we tend to call "our house" a "home" instead of "our house", and there are complicated and fascinating reasons for this I won't get into. But this isn't a grammatical rule about, say, when you use an indefinite article and when you don't.
It's still a grammatical rule, but it's at a much higher level of abstraction, and is a rule about how we express feelings of belonging, relationship, emotion.
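To make the counting concrete, here is a minimal sketch of what "going through the pile" looks like at toy scale. The five-sentence corpus and the `count_phrase` helper are made up for illustration. A real corpus is billions of documents, and a real LLM does not literally keep phrase counts like this.

```python
# Hypothetical toy corpus standing in for a huge pile of text.
corpus = [
    "i go home after work",
    "i go home every night",
    "i go home now",
    "i go to the house on the hill",
    "i go to our house sometimes",
]

def count_phrase(sentences, phrase):
    """Count how often `phrase` appears as a contiguous run of words."""
    target = phrase.split()
    n = len(target)
    total = 0
    for sentence in sentences:
        words = sentence.split()
        # Slide a window of n words across the sentence.
        total += sum(
            1 for i in range(len(words) - n + 1) if words[i:i + n] == target
        )
    return total

go_home = count_phrase(corpus, "i go home")
go_house = count_phrase(corpus, "i go house")
go_to = count_phrase(corpus, "i go to")

print("i go home: ", go_home)
print("i go house:", go_house)
print("i go to:   ", go_to)
```

In this toy pile "i go home" shows up, "i go house" never does, and "i go to ..." is rarer than "i go home". The rule falls out of the counts. Nobody had to write it down.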
LLMs demonstrate that a lot of the things we think are personal expressions of emotion are really just publicly certified rules.
That is destabilizing and emotionally debilitating, so I am not surprised a lot of the AI world is trying to ignore this fact. David Foster Wallace was so upset about this fact that he wrote a book about it (no, not Infinite Jest).
I won't handhold your feelings through this except to say that there's more to the mind than language.
But as for the language itself, note how the LLM works. Once it finds a statistical regularity of natural language use (i.e., a rule of natural language use) like "people in English say 'I go home' more than 'I go to my house' or another variant", it can use that regularity to model and predict the likeliest next word in a phrase, according to parameters that can be adjusted in the model and that in turn adjust how the transformer architecture is used. It does this through tokenization and mathematical modeling that people focus on way, way too much. Those details are valuable for understanding how LLMs function and how to make them more efficient, but they aren't valuable for understanding what LLMs are really doing: predicting language from a base of language that the programmers of the LLM modeled into it.
Remember that the "M" in "LLM" stands for model. The large language model is a model of how language is typically used inside of a large text corpus and it is created by modelers. There is no creation being done by the model itself--the model is just an architecture of different possible ways to transform symbols. This is what is meant by "parameters", functionally speaking.
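Here is a rough sketch of "predict the likeliest next word from a regularity", using a bigram count table in place of a transformer. Everything in it is an assumption for illustration: the toy corpus, the function names, and the `temperature` knob, which is a sampling setting and not the same thing as the trained weights of a real model. A real LLM learns its parameters by gradient descent over tokens, not by counting word pairs.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real model is trained on vastly more text.
corpus = [
    "i go home after work",
    "i go home every night",
    "i go home now",
    "i go to the house",
    "i go to our house",
]

def bigram_table(sentences):
    """Map each word to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            table[prev][nxt] += 1
    return table

def next_word_distribution(table, prev, temperature=1.0):
    """Turn follow-up counts into probabilities, with p proportional to count**(1/T)."""
    counts = table.get(prev)
    if not counts:
        return {}
    weights = {w: math.exp(math.log(c) / temperature) for w, c in counts.items()}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}

table = bigram_table(corpus)
dist = next_word_distribution(table, "go")                     # plain frequencies
sharp = next_word_distribution(table, "go", temperature=0.5)   # more predictable
flat = next_word_distribution(table, "go", temperature=2.0)    # more random
likeliest = max(dist, key=dist.get)

print(likeliest, dist, sharp, flat)
```

Turn the knob down and the model sticks closer to the most common usage. Turn it up and the rarer variants get more of a chance. The regularity itself never changes, only how strictly the output follows it.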
What about emergent thinking and reasoning abilities? Yes, LLMs have these! Because we think by using language, when we model language use we can create a model of thinking. This was the whole idea of LLMs to begin with!
But the problem is that the emergent thinking and reasoning abilities in a large language model are really bad. That is why pretraining is insufficient--you get regularities embedded into the model, but the regularities are extremely noisy and chaotic, because natural language use is noisy and chaotic.
So then the next step is constraining the LLM to harness its reasoning. You do this first via posttraining: give it more constrained rules as a kind of overlay on top of the less constrained rules that you introduced in pretraining. This will make the outputs of the LLM less chaotic, less random, and less creative.
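One way to picture the overlay, as a toy sketch only: take a noisy next-word distribution, add a stack of curated examples on top, and measure how much less random the result is. The counts, the `overlay` helper, and the idea of merging counts are all assumptions made up for illustration. Real posttraining adjusts model weights (for example through fine-tuning on curated data), it does not add up word counts.

```python
import math
from collections import Counter

def entropy_bits(counts):
    """Shannon entropy (in bits) of the distribution implied by the counts."""
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )

# Hypothetical counts for the word that follows "the answer" in noisy text.
pretrained = Counter({"is": 2, "maybe": 2, "lol": 2, "unknown": 2})

# Hypothetical curated examples that always continue the same way.
curated = Counter({"is": 10})

def overlay(base, extra, weight=1):
    """Lay the curated counts on top of the base counts."""
    merged = Counter(base)
    for word, c in extra.items():
        merged[word] += weight * c
    return merged

post_trained = overlay(pretrained, curated)
before = entropy_bits(pretrained)
after = entropy_bits(post_trained)

print(f"entropy before: {before:.3f} bits")
print(f"entropy after:  {after:.3f} bits")
```

Lower entropy here means the next word is more predictable. That is the tradeoff: the noisy options ("maybe", "lol") are still in there, but they get crowded out by what you told the model you expect.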
You get the LLM to conform to expectations by telling it what you expect.
The LLMs get smarter when you do this, because they get better at rearranging language in outputs to conform to different language games. Train it on a billion math problems and a billion solutions, and you have created many new rules that will help it see the language of the math problems and the language of the math solutions and find syntactic correspondences. In other words, by showing it a ton of problems and solutions you show it ways in which problems and solutions are connected. Then when you give it new problems, yes it can indeed help you find new connections to possible solutions.
In deductive systems like math, I can see how an LLM could close the loop and essentially provide answers to mathematical questions. A lot of people seem to be really impressed with the solutions to Erdős problems that LLMs have been spitting out lately, and I do indeed see it as a great tool if you think of math problems like a journeyman: a kind of task that better tools will give better solutions for.
But when it comes to determining which solutions have value in the real world--which can be used, and in being used can gain meaning beyond the deductive loop of solving the math problem--LLMs are much more limited. And, no, they are not USELESS here (Bender is wrong), but they are not MACHINE GOD either (roon is wrong too--but I think roon knows he's wrong and is just an unreliable trickster).
They're a powerful new tool that will help us achieve 200 bps of economic growth above the counterfactual. I'm not sure how this will appear, whether as GDP or productivity growth or something that isn't measured by our current tools. But you'll feel it. I certainly already do.
I have been saying this since 2023.
You will see the market continue to converge toward this view over the next few years, though I honestly believe we won't see humanity settle on my position, which is both reasonable and right, any time soon, because humanity needs a period of emotional adjustment to new technology.
In the end, this is what people will say: "yeah LLMs were really cool and they made the world richer and more efficient, and took out a lot of the most boring braindead work we used to do just like robots on an assembly line did in the 20th century". You probably will not be able to prove or disprove this prediction of mine until ~2035, so bookmark this and come back to it.