A phase transition for intelligence at 1.5B parameters (after model distillation and through reinforcement learning)
Quick ablations on CountDown:
Base model quality is key:
We run Qwen-2.5-Base at 0.5B, 1.5B, 3B, and 7B. The 0.5B model guesses a solution and stops. From 1.5B upward, the models start learning to search, to self-verify, and to revise their solutions, which lets them achieve much higher scores.
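In CountDown the model is given a set of numbers and a target, and must combine the numbers with arithmetic to hit the target. "Self-verifying" amounts to checking a candidate equation against those rules. The sketch below shows one way such a check could be written, for example as a rule-based RL reward. It is an illustration only: it assumes each given number must be used exactly once and that only `+ - * /` and parentheses are allowed, and `countdown_reward` is a hypothetical name, not the reward function used in these runs.

```python
import ast
from collections import Counter
from fractions import Fraction


def _evaluate(node):
    """Evaluate an AST restricted to integer literals and + - * / (exact arithmetic)."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    # Reject bools (True/False are ints in Python) and any non-integer constant.
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return Fraction(node.value)
    if isinstance(node, ast.BinOp):
        left, right = _evaluate(node.left), _evaluate(node.right)
        if isinstance(node.op, ast.Add):
            return left + right
        if isinstance(node.op, ast.Sub):
            return left - right
        if isinstance(node.op, ast.Mult):
            return left * right
        if isinstance(node.op, ast.Div):
            return left / right  # Fraction raises ZeroDivisionError on /0
    raise ValueError(f"disallowed syntax: {ast.dump(node)}")


def countdown_reward(equation: str, numbers: list[int], target: int) -> float:
    """Hypothetical reward: 1.0 if `equation` uses every number exactly once
    (assumed rule) and evaluates exactly to `target`, else 0.0."""
    try:
        tree = ast.parse(equation, mode="eval")
        value = _evaluate(tree)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    used = [n.value for n in ast.walk(tree) if isinstance(n, ast.Constant)]
    if Counter(used) != Counter(numbers):
        return 0.0
    return 1.0 if value == target else 0.0


if __name__ == "__main__":
    print(countdown_reward("3*5+7", [3, 5, 7], 22))  # valid solution
    print(countdown_reward("3*5", [3, 5, 7], 15))    # 7 left unused
```

Using exact `Fraction` arithmetic avoids floating-point false negatives on division, and walking a restricted AST instead of calling `eval` keeps arbitrary code out of the checker.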