After 6 months in the making and burning over a year of GPU compute time, we're super excited to finally release the "Ultra-Scale Playbook".
Check it out here:
hf.co/spaces/nanotron/ultras…
A free, open-source book to learn everything about 5D parallelism, ZeRO, fast CUDA kernels, and how and why to overlap compute & communication. All scaling bottlenecks and tools are introduced with motivation, theory, interactive plots from our 4000 scaling experiments, and even NotebookLM podcasters to tag along with you.
- How was DeepSeek trained for only $5M?
- Why did Mistral train an MoE?
- Why is PyTorch's native Data Parallelism implementation so complex under the hood?
- What are all the parallelism techniques and why were they invented?
- Should I use ZeRO-3 or Pipeline Parallelism when scaling and what's the story behind both techniques?
- What is this Context Parallelism that Meta used to train Llama 3? Is it different from Sequence Parallelism?
- What is FP8? How does it compare to BF16?
In this book, our goal was to gather, in a single place, a coherent, easy-to-read yet detailed story of all the techniques that make today's LLM scaling possible.
The largest factor in democratizing AI will always be teaching everyone how to build AI, and in particular how to create, train and fine-tune high-performance models. In other words, it means making the techniques that power all recent large language models accessible to everybody, and efficient training is possibly one of the most essential of them.
What started as a simple blog post ended up becoming an interactive writing piece containing 30k words. So we've decided to actually print it as a real 100-page physical book as well: the physical ultrafast playbook, containing all the science of distributed and fast AI training.
We plan to send free copies as gifts to the first readers of the online version, so feel free to add your email to the form linked in the blog post.