Generalized Linear Mode Connectivity for Transformers
Understanding the geometry of neural network loss landscapes is a central question in deep learning, with implications for generalization and optimization. A striking phenomenon is linear mode...
arxiv.org