Nvidia's paper introduces Nemotron-Math, a massive math tutoring dataset that lets smaller Large Language Models learn long, tool-checked reasoning.
It contains 7.5M step-by-step solutions, some as long as 128K tokens (the small pieces of text a model reads), written in 3 reasoning styles.
The dataset also demonstrates self-checking, where the model runs Python code to avoid simple arithmetic mistakes.
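To make the self-checking idea concrete, here is a minimal sketch of what a tool check on a single arithmetic step could look like. It is not the paper's pipeline: the helper names `evaluate` and `check_step` are hypothetical, and the sketch assumes each step can be reduced to a plain arithmetic expression plus a claimed value.

```python
import ast
import operator
from fractions import Fraction

# Hypothetical helper table: only basic arithmetic is allowed,
# so arbitrary code in a solution step is never executed.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr: str) -> Fraction:
    """Evaluate an integer arithmetic expression exactly, using fractions."""

    def walk(node):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, int)
            and not isinstance(node.value, bool)
        ):
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expr, mode="eval").body)


def check_step(expr: str, claimed) -> bool:
    """Return True when the claimed value matches the recomputed one."""
    return evaluate(expr) == Fraction(claimed)


if __name__ == "__main__":
    print(check_step("17 * 23", 391))               # correct step
    print(check_step("17 * 23", 381))               # slipped digit
    print(check_step("1/3 + 1/6", Fraction(1, 2)))  # exact fraction check
```

Working in exact fractions rather than floats means a correct step is never rejected because of rounding, which matters when the check is used to filter solutions.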
The authors mix competition problems from Art of Problem Solving with real questions from Mathematics Stack Exchange and MathOverflow.
They use the open model gpt-oss-120b as a teacher, generating multiple solutions per problem at high, medium, and low reasoning depth.
For long-context training, they sort examples by length and fine-tune in stages, so most training steps run on shorter text before the final stage reaches 128K.
That schedule gives about 2-3x faster training at a cost of roughly 1-3% accuracy, and the extra Stack Exchange problems help the trained models handle messier questions.
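The length-sorted schedule can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' code: the function name `build_stages`, the stage limits (16K, 32K, 64K, 128K), and the choice to drop examples longer than the last limit are all hypothetical.

```python
from bisect import bisect_left


def build_stages(token_lengths, stage_limits):
    """Group example indices into stages by token length.

    Each example goes into the first stage whose limit is at least its
    length, and examples inside a stage are ordered from short to long.
    Examples longer than the largest limit are skipped (an assumption).
    """
    limits = sorted(stage_limits)
    stages = [[] for _ in limits]
    order = sorted(range(len(token_lengths)), key=token_lengths.__getitem__)
    for idx in order:
        length = token_lengths[idx]
        if length > limits[-1]:
            continue  # too long for any stage in this sketch
        stages[bisect_left(limits, length)].append(idx)
    return list(zip(limits, stages))


if __name__ == "__main__":
    # Hypothetical token counts for six solutions.
    lengths = [900, 40_000, 12_000, 200_000, 100_000, 20_000]
    # Hypothetical stage limits: 16K, 32K, 64K, 128K tokens.
    limits = [16_384, 32_768, 65_536, 131_072]
    for limit, indices in build_stages(lengths, limits):
        print(f"stage up to {limit} tokens: examples {indices}")
```

Training would then walk through the stages in order, raising the maximum sequence length at each one, so only the last stage pays the full cost of 128K-token sequences.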
----
Paper Link – arxiv.org/abs/2512.15489
Paper Title: "Nemotron-Math: Efficient Long-Context Distillation of Mathematical Reasoning from Multi-Mode Supervision"