4. These are ScionC experiments, designed to keep the weight norm stable.
(I don't expect points 2-4 to change the direction of the result.)
5. Without biases, the avg. spectral norm is somehow smaller and the L2 grad norm is higher. The optimal WD may change...
5/5
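For anyone who wants to check point 5 on their own runs, here is a minimal sketch of how the two quantities could be measured. It assumes the weights and gradients are available as lists of NumPy arrays; the function names `avg_spectral_norm` and `global_l2_grad_norm` are hypothetical and not taken from the experiments above, and the exact averaging used there may differ.

```python
import numpy as np

def avg_spectral_norm(weights):
    """Mean largest singular value over the 2-D weight matrices.

    Assumption: biases and other non-matrix parameters are skipped.
    """
    norms = [np.linalg.norm(w, ord=2) for w in weights if w.ndim == 2]
    return float(np.mean(norms))

def global_l2_grad_norm(grads):
    """L2 norm of all gradients treated as one concatenated vector."""
    total = sum(np.sum(np.asarray(g, dtype=np.float64) ** 2) for g in grads)
    return float(np.sqrt(total))

# Toy example: two diagonal "layers" and two gradient tensors.
weights = [np.diag([3.0, 1.0]), np.diag([1.0, 5.0])]
grads = [np.array([3.0]), np.array([[4.0]])]
print(avg_spectral_norm(weights))    # mean of 3.0 and 5.0
print(global_l2_grad_norm(grads))    # sqrt(3^2 + 4^2)
```

Logging both values with and without biases at the same training step would show whether the gap is large enough to justify re-tuning WD.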