Does SGD Seek Flatness or Sharpness? An Exactly Solvable Model
A large body of theory and empirical work hypothesizes a connection between the flatness of a neural network's loss landscape during training and its performance. However, there have been...
arxiv.org