🧵2. Restoring full-rank boosts performance. E.g., SLTrain (sparse + low-rank) > pure low-rank. FIRA (Hadamard-injected high-rank updates) > GaLore. 🎯 High-rank update paths are crucial for strong performance in low-rank pretraining.
Nice benchmarking paper: "Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking". arxiv.org/pdf/2505 A long-needed benchmark of memory-efficient pretraining algorithms (LoRA, GaLore, FIRA, SLTrain, etc.) under the low-rank training wave. Glad someone finally did this! 💡 They performed extensive wandb hyperparameter sweeps and reported the best settings per method, a huge plus for fair benchmarking. However, they didn’t fully tune learning rates per method, which in my experience is critical, especially for low-rank training. 🧵 Here are their key takeaways and my thoughts:

20 May 2025
Investigation underway after elephant killed by train despite recent safety measures Read more: adaderana.lk/news.php?nid=10… #elephant #sltrain #lka #slnews #news #adaderana #srilanka
10 Mar 2025
23 Jan 2025
Online train tickets racket: Two more arrested including railway technical officer Read more: adaderana.lk/news.php?nid=10… #slrailway #sltrain #CGR #lka #slnews #news #adaderana #srilankanews #srilanka
