Nice benchmarking paper: "Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking".
arxiv.org/pdf/2505
A long-needed benchmark of memory-efficient pretraining algorithms (LoRA, GaLore, Fira, SLTrain, etc.) amid the current wave of low-rank training. Glad someone finally did this!
💡 They performed extensive wandb hyperparameter sweeps and reported best settings per method — a huge plus for fair benchmarking.
However, they didn’t fully tune learning rates per method, which in my experience is critical, especially for low-rank training.
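To make the learning-rate point concrete, here is a minimal toy sketch of why I care about it. Everything in it is my own assumption and not from the paper: the method names, the LR grid, and `toy_final_loss`, which is a made-up stand-in for a real training run that models final loss as a parabola in log10(lr) around a method-specific optimum.

```python
import math

# Hypothetical toy numbers, NOT from the paper: each method gets its own
# optimal learning rate and its own best achievable loss ("floor").
TOY_METHODS = {
    "full_rank": {"best_lr": 1e-3, "floor": 3.00},
    "low_rank": {"best_lr": 1e-2, "floor": 3.05},
}
LR_GRID = [1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2]


def toy_final_loss(method: str, lr: float) -> float:
    """Stand-in for a real training run; swap in your own train/eval loop."""
    cfg = TOY_METHODS[method]
    dist = math.log10(lr) - math.log10(cfg["best_lr"])
    return cfg["floor"] + 0.5 * dist ** 2


def sweep_per_method(methods, lr_grid, run=toy_final_loss):
    """Return {method: (best_lr, best_loss)}, tuning the LR separately per method."""
    results = {}
    for method in methods:
        losses = {lr: run(method, lr) for lr in lr_grid}
        best_lr = min(losses, key=losses.get)
        results[method] = (best_lr, losses[best_lr])
    return results


# Compare one shared LR for every method against a per-method sweep.
shared_lr = 1e-3
shared = {m: toy_final_loss(m, shared_lr) for m in TOY_METHODS}
tuned = sweep_per_method(TOY_METHODS, LR_GRID)

print("shared LR:", shared)
print("per-method LR:", tuned)
```

In this toy setup, the gap between the two methods is much larger under the shared LR than after per-method tuning, so part of the apparent gap comes from the LR choice and not from the method itself.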
🧵 Here are their key takeaways and my thoughts: