PhD student in Machine Learning @ETH_en and @MPI_IS. Working on foundation models for biology 🧬, model merging 🤝, and structured pruning ✂️.

Joined May 2024
Alexander Theus retweeted
#ICLR2026 Into mode connectivity, model merging, or permutation invariance? We show how optimization dynamics shape the loss landscape of merged weights. Come check it out! 📅 23/04 10:30AM – 13:00PM 📍 Pavilion 3 P3-1809 w/ @TheusResearch @DamienTeney @orvieto_antonio
1
1
6
170
Alexander Theus retweeted
📢 Submissions are OPEN for the Weight Space Symmetry Workshop @icmlconf! ⏰ Deadline extended → April 30 (23:59 AOE) Consider submitting any work related to weight symmetries: optimization, model merging, weight space learning, and so on! #ICML2026 #weightsymmetry2026
1
6
19
6,067
Alexander Theus retweeted
📢 Excited to announce the Workshop on Weight-Space Symmetries @icmlconf! We welcome 4-page submissions analysing symmetries, their effects on training and model structure, and practical methods to utilize them. Submission Deadline: April 24 (23:59 AoE) #ICML2026
3
37
56
21,774
Excited to announce that our paper has been accepted as an Oral at NeurIPS 2025! 🥳
1/ 🚨 New paper alert! 🚨 We explore a key question in deep learning: Can independently trained Transformers be linearly connected in weight space, without a loss barrier? Yes, if you uncover their rich symmetries. 📄 arXiv: arxiv.org/abs/2506.22712
1
1
12
1,131
1/ 🚨 New paper alert! 🚨 We explore a key question in deep learning: Can independently trained Transformers be linearly connected in weight space, without a loss barrier? Yes, if you uncover their rich symmetries. 📄 arXiv: arxiv.org/abs/2506.22712
2
8
59
5,700
9/ 🔑 Takeaway: Transformers can be linearly connected, but only if you exploit richer network symmetries. We show that general symmetry alignment (not just permutations) unlocks low-loss paths across ViTs and GPT-2.
1
522