📈 Generalizable gains across tasks? ✅
Even on coding benchmarks (HumanEvalPack), the model shows gains across all tasks, and explicitly using markers at inference led to relative gains of up to 14.1% on underrepresented coding tasks like CodeTranslation and CodeRepair! 📈