🚀 Stop over-optimizing same-tokenizer distillation.
On-Policy Distillation (OPD) is powerful, but it quietly assumes teacher and student share the same tokenizer.
What if they don’t?
We introduce SimCT, a simple way to recover lost supervision for cross-tokenizer OPD.