Excited to share our latest work on improving LLM pre-training! 🚀 The amazing
@yuzhaouoe et al. found that focusing on how pre-training sequences are composed and attended over can significantly improve the generalisation of LLMs across a wide array of downstream tasks, such as RAG, Knowledge-Intensive Tasks, In-Context Learning, Language Modeling, and much more! Check out our pre-print, "Analysing The Impact of Sequence Composition on Language Model Pre-Training":
arxiv.org/abs/2402.13991
Our approaches use intra-document causal masking and concatenation of related documents, which reduce interference from unrelated texts while improving pre-training dynamics and generalisation. We also propose a novel retrieval-based method for constructing pre-training sequences, which improves the model's ability to learn from context and retain knowledge.
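
For readers new to the idea, here is a minimal sketch of what an intra-document causal mask could look like for one packed sequence. It is an illustration, not the paper's implementation: the function name `intra_document_causal_mask` and the `doc_ids` input are hypothetical, and it assumes each token may attend only to itself and to earlier tokens from the same document.

```python
from typing import List


def intra_document_causal_mask(doc_ids: List[int]) -> List[List[bool]]:
    """Build an attention mask for one packed sequence (illustrative only).

    doc_ids[i] is the document that token i came from (hypothetical input).
    mask[i][j] is True when position i may attend to position j, i.e. when
    j is not in the future (j <= i) and both tokens share a document.
    """
    n = len(doc_ids)
    return [
        [j <= i and doc_ids[j] == doc_ids[i] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical packed sequence: three tokens from document 0, two from document 1.
doc_ids = [0, 0, 0, 1, 1]
mask = intra_document_causal_mask(doc_ids)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# prints:
# x....
# xx...
# xxx..
# ...x.
# ...xx
```

With plain causal masking the last two rows would also attend to the first three tokens. Blocking those positions is what keeps an unrelated earlier document out of the context.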
All details are available in our paper, "Analysing The Impact of Sequence Composition on Language Model Pre-Training":
arxiv.org/abs/2402.13991