This is a big deal.
Most of the voice AI community says that speech-to-speech models aren’t ready for production use cases.
Two reasons:
1. Reliability (more hallucinations)
2. Cost (s2s models are expensive)
On (1), Grok’s Voice Agent API is already running at large scale across Grok’s apps, in Tesla vehicles, and in call centers. There’s more work to do, of course, but progress is being made quickly.
On (2), you get SOTA performance for $0.05/minute, which meets or beats the aggregate pricing of a cascaded pipeline (STT → LLM → TTS).
Excited to partner with @xai on this launch: you can build a custom Grok Voice Agent with workflows, tool calling, the whole shebang, in a few lines of @livekit code.
The future of speech-to-speech is bright!
Today, we're excited to launch the Grok Voice Agent API, empowering developers to build voice agents that speak dozens of languages, call tools, and search realtime data.
x.ai/news/grok-voice-agent-a…