Building language models is difficult and requires high-quality preprocessing, modeling, evaluation, and large-scale training.
As significant collaborators on this project, we at TRI see the resulting 7B model, DCLM-7B, as a major achievement. It competes with Mistral 7B and LLaMA-7B, even though it was trained on less data. And it's fully open. And that's just the start of the competition.
Excited to see how others use these results to build even more capable language models and improve dataset quality.
I am really excited to introduce DataComp for Language Models (DCLM), our new testbed for controlled dataset experiments aimed at improving language models. 1/x