Scaling Laws for Mixture Pretraining Under Data Constraints
As language models scale, the amount of data they require grows, yet many target data sources, such as low-resource languages or specialized domains, are inherently limited in size. A common...
arxiv.org