An MLX port wins or dies on whether the vocoder runs on-GPU, not just the acoustic model; that's where M-series latency hides. What's the peak unified memory with a voice loaded, and the real-time factor (audio-sec/wall-sec, so anything above 1 is faster than real time) on an M-series mini, fp16 vs 4-bit?
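
To make the fp16 vs 4-bit numbers comparable, here is a minimal timing sketch using the audio-sec/wall-sec definition above. Everything in it is hypothetical scaffolding rather than the port's actual API: `measure_rtf`, the `synthesize` callable, and the optional `peak_memory_bytes` probe are names invented for illustration. It assumes `synthesize` returns fully computed samples. With MLX's lazy evaluation that likely means forcing the result (for example with `mx.eval`) inside the callable, otherwise the timer stops before the vocoder has actually run. For the memory probe, recent MLX releases appear to expose a peak-memory counter (`mx.get_peak_memory`, earlier under `mx.metal`); check that against the installed version.

```python
import time
from typing import Callable, Dict, Optional, Sequence


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]],  # hypothetical: text -> audio samples
    text: str,
    sample_rate: int,
    warmup: int = 1,
    runs: int = 3,
    peak_memory_bytes: Optional[Callable[[], int]] = None,  # hypothetical memory probe
) -> Dict[str, Optional[float]]:
    """Return average audio/wall seconds and RTF defined as audio-sec / wall-sec."""
    if runs < 1:
        raise ValueError("runs must be >= 1")

    # Warm-up runs absorb one-time costs (weight loading, kernel compilation).
    for _ in range(warmup):
        synthesize(text)

    total_audio_sec = 0.0
    total_wall_sec = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        samples = synthesize(text)  # assumed to return fully evaluated audio
        total_wall_sec += time.perf_counter() - start
        total_audio_sec += len(samples) / sample_rate

    # Read the peak counter once, after all runs, if a probe was supplied.
    peak = peak_memory_bytes() if peak_memory_bytes is not None else None

    rtf = total_audio_sec / total_wall_sec if total_wall_sec > 0 else float("inf")
    return {
        "audio_sec": total_audio_sec / runs,
        "wall_sec": total_wall_sec / runs,
        "rtf": rtf,  # > 1 means faster than real time
        "peak_memory_bytes": peak,
    }


if __name__ == "__main__":
    # Stand-in synthesizer: one second of silence at 24 kHz, for demonstration only.
    def fake_synthesize(text: str) -> Sequence[float]:
        return [0.0] * 24000

    print(measure_rtf(fake_synthesize, "hello", sample_rate=24000))
```

Running the same harness once per precision (fp16, then 4-bit), on the same text and after a warm-up, keeps the comparison about the model rather than about first-call overhead.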