New @nvidia paper shows that teaching reasoning early during pretraining builds abilities that later fine-tuning cannot recover.
Doing this early gives a 19% average boost on tough tasks after all post-training.
Pretraining is the long first stage where the model learns to predict the next word from lots of text.
Supervised fine-tuning is a later stage where the model studies step-by-step answers from labeled examples.
Reinforcement learning then rewards better answers so the model improves further.
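To make the pretraining objective above concrete, here is a minimal sketch of next-word prediction loss. It is only an illustration, not code from the paper: the function name `next_token_loss` and the toy probabilities are assumptions made up for this example.

```python
import math

def next_token_loss(token_probs):
    """Average negative log-likelihood of the correct next tokens.

    token_probs: probabilities that a (hypothetical) model assigned to
    the token that actually came next at each position in the text.
    """
    if not token_probs:
        raise ValueError("need at least one prediction")
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

# Toy numbers (illustrative only): a model that is fairly sure
# about each of three next words.
confident = next_token_loss([0.9, 0.8, 0.95])

# A model that spreads its guesses over many words gets a higher loss.
unsure = next_token_loss([0.1, 0.2, 0.05])

print(confident < unsure)
```

Pretraining lowers this loss over a very large corpus. Supervised fine-tuning typically uses the same kind of loss, applied to labeled step-by-step answers.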
Data diversity matters most in pretraining, while data quality matters most in supervised fine-tuning, with gains of roughly 11% and 15%, respectively.
Even doubling supervised fine-tuning on a base that skipped early reasoning could not catch up.
Adding lots of mixed-quality supervised fine-tuning data even cut math performance by about 5%.
High-quality reasoning data added in pretraining showed only a small effect at first, then paid off strongly after supervised fine-tuning.
Teams should load diverse reasoning data into pretraining, use a small high-quality set for supervised fine-tuning, then stabilize with rewards.
----
Paper – arxiv.org/abs/2510.03264
Paper Title: "Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data"