there’s no incentive under the typical training objectives for the embedding geometry to arrange itself in an LSH-friendly way. So it won’t. And then MUVERA and SMVE will break.
But with LSH-aware training (or fine-tuning), as Antoine proposes, the model is explicitly nudged to guard against this issue.
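To make "geometry" concrete, here is a minimal sketch of the SimHash bucketing that MUVERA-style fixed-dimensional encodings (FDEs) build on. It assumes a simplified variant: no empty-bucket filling, no inner projections, no repetitions. All function names are hypothetical. The FDE inner product only tracks the late-interaction (Chamfer/MaxSim) score when a query token and its best-matching document token land in the same bucket. Nothing in a standard training objective encourages that.

```python
import random

def make_planes(num_planes, dim, seed=0):
    """Draw random Gaussian hyperplane normals for SimHash (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def simhash_bucket(vec, planes):
    """Bucket id = bit pattern of which side of each hyperplane the vector lies on."""
    bucket = 0
    for i, plane in enumerate(planes):
        if dot(vec, plane) > 0:
            bucket |= 1 << i
    return bucket

def fde(token_vecs, planes, is_query):
    """Simplified MUVERA-style FDE: per-bucket sum (query) or mean (document), concatenated."""
    dim = len(token_vecs[0])
    n_buckets = 1 << len(planes)
    blocks = [[0.0] * dim for _ in range(n_buckets)]
    counts = [0] * n_buckets
    for v in token_vecs:
        b = simhash_bucket(v, planes)
        counts[b] += 1
        blocks[b] = [acc + x for acc, x in zip(blocks[b], v)]
    if not is_query:
        # Documents average the tokens in each non-empty bucket; empty buckets stay zero here.
        blocks = [[x / c for x in blk] if c else blk for blk, c in zip(blocks, counts)]
    return [x for blk in blocks for x in blk]

def chamfer(query, doc):
    """Late-interaction (MaxSim) score that the FDE inner product is meant to approximate."""
    return sum(max(dot(q, d) for d in doc) for q in query)
```

Comparing `dot(fde(q, planes, True), fde(d, planes, False))` with `chamfer(q, d)` shows the issue. The two agree when each query token shares a bucket with its best match. They drift apart when the embeddings straddle hyperplane boundaries, which is the failure mode LSH-aware training is meant to discourage.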