There's a "Goldilocks zone" in quantization where you balance size, speed, and accuracy.
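To put rough numbers on the size side of that trade-off, here's a back-of-the-envelope sketch. It assumes the quantized model is the same 769M-parameter model mentioned in this post, and it only counts raw weight storage: no quantization scales, zero-points, activations, or runtime overhead. The bit widths are illustrative, not the exact settings I tested.

```python
# Weights-only size estimate at a few bit widths.
# Assumption: 769M parameters; overhead from scales/zero-points is ignored.
N_PARAMS = 769_000_000


def weight_size_gb(n_params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Illustrative bit widths, from full precision down to 4-bit.
sizes = {bits: weight_size_gb(N_PARAMS, bits) for bits in (32, 16, 8, 4)}

for bits, gb in sizes.items():
    print(f"{bits:>2}-bit: ~{gb:.2f} GB")
```

Size shrinks linearly with bit width, but speed and accuracy don't follow such a simple rule, which is why the sweet spot has to be found by measuring.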
Next, I tried fine-tuning the model on intronhealth/afrispeech-dialog. Turns out it only had 49 samples: roughly 3 training batches for a 769M-parameter model.
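For a sense of how little data that is, here's the batch arithmetic as a quick sketch. The batch size of 16 is an assumption for illustration; it's simply a value consistent with "about 3 batches" from 49 samples.

```python
import math

N_SAMPLES = 49   # dataset size reported above
BATCH_SIZE = 16  # assumed for illustration; not stated in this post

# Batches that contain a full BATCH_SIZE samples.
full_batches = N_SAMPLES // BATCH_SIZE
# Samples left over for a final, partial batch.
leftover = N_SAMPLES % BATCH_SIZE
# Total batches per epoch if the partial batch is kept.
total_batches = math.ceil(N_SAMPLES / BATCH_SIZE)

print(f"{full_batches} full batches, {leftover} leftover sample(s), "
      f"{total_batches} batches per epoch if the remainder is kept")
```

Either way, one epoch is only a handful of optimizer steps.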