Schedule-Free methods, which forgo cosine/linear schedulers by averaging iterates and computing gradients at interpolated points, yield smoother training curves. It's still unclear why they work well, and this paper explains the phenomenon through the river-valley loss landscape.