Anuj Apte retweeted
🚨 New paper: Anytime Training with Schedule-Free Spectral Optimization 🚨 We introduce SF-NorMuon, a schedule-free spectral method that matches or outperforms heavily tuned AdamW on 125M- and 772M-parameter language models.