Love papers like RootFree (RF). It's well-motivated, well-written, has decent ablations, and, most importantly, is a simple yet elegant method. The key idea of RFAdamW seems to be that it moves away from the notion that "2nd-order statistics scale the per-parameter learning rate" and instead embraces their curvature information. This way, they close the generalization gap between SGD and AdamW without compromising early convergence. As the implementation is a one-line change, I've added RF and grafted RF to github.com/ClashLuke/schedul…. Grafted RF (following openreview.net/forum?id=FpKg…) allows the direct transfer of tuned SFAdamW hyperparameters, giving immediate gains without retuning. See below for an incomplete ranking on a toy problem: x.com/LinYorker/status/18130…
16 Jul 2024
#ICML2024 Can We Remove the Square-Root in Adaptive Methods? arxiv.org/abs/2402.03496
Root-free (RF) methods are better on CNNs and competitive on Transformers compared to root-based methods (AdamW).
Removing the root makes matrix methods faster: Root-free Shampoo in BFloat16. /1
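To make the "one-line change" concrete, here is a minimal sketch, assuming it means replacing the sqrt(v_hat) + eps denominator of a plain Adam step with v_hat + eps. This is not the paper's algorithm (its root-free methods include further modifications not reproduced here), it uses plain Adam rather than the schedule-free AdamW mentioned above, and it omits weight decay. "Grafted RF" is assumed here to mean per-tensor L2-norm grafting: the step direction comes from RF and its norm from the Adam step. The names `update`, `preconditioned_step` and the `mode` values are hypothetical and not taken from the linked repo.

```python
import math


def _moments(grad, m, v, beta1, beta2):
    """Exponential moving averages of the gradient and the squared gradient."""
    new_m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    new_v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    return new_m, new_v


def preconditioned_step(m, v, t, beta1, beta2, eps, root_free):
    """Bias-corrected, elementwise Adam-style direction m_hat / denom."""
    bc1, bc2 = 1 - beta1 ** t, 1 - beta2 ** t
    out = []
    for mi, vi in zip(m, v):
        m_hat, v_hat = mi / bc1, vi / bc2
        # Assumed "one-line change": drop the square root from the denominator.
        denom = (v_hat + eps) if root_free else (math.sqrt(v_hat) + eps)
        out.append(m_hat / denom)
    return out


def l2(xs):
    """Euclidean norm of a flat list."""
    return math.sqrt(sum(x * x for x in xs))


def update(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
           eps=1e-8, mode="adam"):
    """One step on a single tensor stored as a flat list (hypothetical API).

    mode: 'adam' (sqrt denominator), 'rf' (root-free), or
    'grafted_rf' (RF direction rescaled to the Adam step's L2 norm).
    Weight decay is omitted. Returns (new_param, new_m, new_v).
    """
    m, v = _moments(grad, m, v, beta1, beta2)
    adam_dir = preconditioned_step(m, v, t, beta1, beta2, eps, root_free=False)
    if mode == "adam":
        direction = adam_dir
    elif mode in ("rf", "grafted_rf"):
        rf_dir = preconditioned_step(m, v, t, beta1, beta2, eps, root_free=True)
        if mode == "rf":
            direction = rf_dir
        else:
            # Grafting: keep RF's direction but borrow the Adam step's norm,
            # so a learning rate tuned for Adam yields a similar step size.
            scale = l2(adam_dir) / (l2(rf_dir) + 1e-30)
            direction = [scale * d for d in rf_dir]
    else:
        raise ValueError(f"unknown mode: {mode}")
    new_param = [p - lr * d for p, d in zip(param, direction)]
    return new_param, m, v
```

On the first step m_hat equals g and v_hat equals g², so the Adam direction is roughly sign(g) while the ungrafted RF direction is roughly 1/g. That scale mismatch is why, in this sketch, a learning rate tuned for Adam would not carry over to plain RF, and why the grafted variant reuses the Adam step's norm.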
@HHSLiverpool there's no better feeling than having your colour done #RootFree thank you! ❤️ my colour xxx
19 Dec 2015
Thanks to Robbie @bluetitlondon my hair is now a perfect shade of pale #blonde #rootfree #peckham
5 Sep 2015
@TeejayOuellette finally😈 #rootfree
ILU @TheLoftToronto, thanks for blonding me OUT! 🙋🏼💁🏼🙆🏼💇🏼 #rootfree #blondetourage
8 Jan 2015
Ooh I do love my fresh hair. Thanks Pat at @TerencePaul #manchester #blonde #Hair #rootfree
Getting your hair done is probably one of the best feelings #rootfree 💁👏