I’ve never been convinced by Sparse MoE models. They sound appealing, but they are plagued by fundamental technical issues.
When Joan and Carlos shared their Soft MoE idea and results, I thought, for the first time, “this may be it!”
It’s not a panacea, but it still feels like a breakthrough.
Introducing Soft MoE! Sparse MoEs are a popular method for increasing model size without increasing its cost, but they come with several issues. Soft MoEs avoid them and significantly outperform ViT and various Sparse MoEs on image classification.
arxiv.org/abs/2308.00951