I'm super excited to release DiPaCo, a new kind of mixture of experts that can scale, engineering-wise, to data centers across the entire world!
A few words about it in this thread 🧵
Google presents DiPaCo: Distributed Path Composition
Progress in machine learning (ML) has been fueled by scaling neural network models. This scaling has been enabled by ever more heroic feats of engineering, necessary for accommodating ML approaches that require high