The narrative around LLMs is that they got better purely by scaling up pretraining *compute*. In reality, they got better by scaling up pretraining *data*; compute is only a means to the end of cramming more data into the model. Data is the fundamental bottleneck: you can't usefully scale up pretraining compute without more data.
And so far this data has been chiefly human-generated -- over 20,000 people have been employed full-time for the past few years to produce annotations for training LLMs. Even when the data comes from RL environments, those environments still had to be handcrafted by humans.
And that's the root of the bottleneck: these models are completely dependent on human output. They are an interpolative database of what we put into them.