6/ For adaptation, we used LoRA (rank 32, alpha 32, dropout 0.1) on the Qwen 3.5 backbone plus a retrieval projection layer on top.
This gives us a lightweight way to specialize the model for retrieval without full fine-tuning. Trained on 8 A100s with an effective batch size of 512.