LLMs believe every datapoint they see with 100% conviction.
An LLM never says, "this doesn't make sense... let me exclude it from my training data."
Everything is taken as truth.
It is actually worse than this.
Because of how perplexity, SGD, and backprop work, the datapoints that disagree most with a model's established beliefs produce the largest loss, and therefore a *stronger* weight update.
Contradicting datapoints are taken as a higher truth than agreement.
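To make that concrete, here is a minimal sketch in plain Python. It assumes a toy softmax classifier trained with cross-entropy loss, which is the loss behind perplexity; the logits and the helper names (`softmax`, `cross_entropy_grad`, `grad_norm`) are made up for illustration and are not taken from any real model or library. It compares the gradient the model receives from a datapoint it already agrees with against one that contradicts it.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_grad(logits, target):
    # Gradient of -log(softmax(logits)[target]) with respect to the logits.
    # For softmax + cross-entropy this is simply (probabilities - one_hot(target)).
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]

def grad_norm(logits, target):
    # Euclidean norm of the gradient: a rough proxy for how hard
    # a single datapoint pushes on the model.
    return math.sqrt(sum(g * g for g in cross_entropy_grad(logits, target)))

# Hypothetical "established belief": the model strongly prefers token 0.
logits = [4.0, 0.0, 0.0]

agreeing = grad_norm(logits, target=0)       # label matches the belief
contradicting = grad_norm(logits, target=1)  # label contradicts the belief

print(f"gradient norm, agreeing datapoint:      {agreeing:.4f}")
print(f"gradient norm, contradicting datapoint: {contradicting:.4f}")
```

The contradicting label produces a far larger gradient than the agreeing one. This only shows the gradient at the logits of a toy model; in a real training run, gradient clipping, adaptive optimizers, and batch averaging all change the size of the final update, so treat it as an illustration of the mechanism and not a measurement.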
Indeed, RLHF is the greatest example of this. You can cause a model to wildly change what it believes by forcing small amounts of contradictory data down its throat.
This is why "more data" != "more truthful", and why we must begin the gargantuan task of filtering out the enormous amounts of harmful/deceitful/illogical training data present in massive web scrapes. (related: distillation and differential privacy are reasonable starts)
I think this notion of "less data" -> "more intelligence" subtly conflicts with our modern liberal sensibilities about free speech. Human society has benefited greatly from increasing the amount of information everyone can consume (detour for another day: propaganda, public relations, targeted advertising, etc.).
However, for the LLMs we have today, we must treat them as if they are tiny children. They have no filter. They believe everything they see with 100% conviction. And this is the root of the problem. This is what value misalignment looks like.
To accomplish alignment, we need new paradigms for managing how information makes its way into an AI model. The ones we currently use are insufficient, and our models will never be truly safe if they believe most strongly in whatever most strongly contradicts what they already know. This formula will always create unstable, fickle, and even dangerous models, with many internal contradictions among their parameters.
Our AI models must change from being children, who believe everything they see, into scientists, who cast off information that does not withstand intense scrutiny.
I have some ideas on how to accomplish this, but that's for another day.