🚀 mlx-audio v0.4.4 is out — our biggest model drop yet.
15 new TTS, ASR & VAD models, faster long-form transcription, and an expanded OpenAI-compatible audio server. All running locally on Apple Silicon.
🎤 New TTS
• VoxCPM2 — 2B, 48kHz, 30 languages
• MOSS-TTS / TTSD / 1.5
• Higgs Audio v3
• Miso, Dramabox, Irodori-TTS v3 VoiceDesign
📝 New STT/ASR
• Mega-ASR (Qwen3-ASR-1.7B LoRA routing)
• Nemotron 3.5 ASR (streaming)
• granite-speech-4.1-2b-nar, Fun-ASR-Nano
• Cohere ASR — 1.7× faster long-form
🔊 VAD & codecs: Silero VAD, FSMN-VAD, Step-Audio 2
⚙️ Server: OpenAI-compatible response_format, /v1/audio/voices, word timestamps, realtime server-side VAD turns (h/t @lllucas)
Huge thanks to all the contributors 🙏
> uv pip install -U mlx-audio
github.com/Blaizzy/mlx-audio