📢Thrilled to introduce ATLAS 🗺️: scaling laws beyond English, for pretraining, finetuning, and the curse of multilinguality.
The largest public multilingual scaling study to date. We ran 774 exps (10M-8B params, 400 languages) to answer:
🌍Are scaling laws different by language?
🧙‍♂️Can we model the curse of multilinguality?
⚖️Pretrain from scratch or finetune from a multilingual checkpoint?
🔀Cross-lingual transfer scores for 1444 lang pairs?
1/🧵