Low-latency real-time speech-to-text, text-to-speech and translation APIs.

Joined March 2022
Pinned Tweet
Soniox v5 Async is live. Our new async speech-to-text model turns real-world audio into more accurate, structured speech data.
What’s improved:
• Higher accuracy across 60 languages
• Completely reengineered speaker separation for identifying who said what
• Improved language identification for multilingual and accented speech
• Better recognition and formatting of numbers, dates, emails, IDs, codes, names, and addresses
• More robust context usage for names, domain vocabulary, product terms, and custom phrases
stt-async-v5 is fully compatible with the existing async API. Just update the model name.
Read more: soniox.com/blog/soniox-v5-as…
The easiest way to try Soniox Async v5 in your code: use our Python or Node SDK. Call transcribe_and_wait_with_tokens, wait, read the audio transcription from the result. Done.
Google now has Gemini Live Translate.
Soniox has Real-World Live Translate.
Soniox's accuracy advantage already shows on simple audio input. Once you throw in IDs, numbers, emails, addresses, and genuinely hard speech, the accuracy gap only grows. A broken speech recognition layer makes the rest of the pipeline fall apart, and a laggy service amplifies the damage. Your voice agents deserve a speech system that does not fall apart.
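I ran a comparison test of the much-discussed Gemini 3.5 Live Translate against GPT-Realtime-Translate, which was in the spotlight a little while ago, and Soniox, which I use in my own app for its outstanding value for money. Conclusion: GPT is unstable, Gemini is as good as you'd expect, Soniox is impressive. Its ASR speed and accuracy look like the best of the three. That said, I spoke into a microphone and played the audio output through my computer's speakers, so the services may not have been able to show their real capabilities at all 😅 I'd like to run a proper comparison test another time.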
Stop overpaying for speech AI. Compare your bill across providers with our new pricing calculator. soniox.com/compare#calculato…
A rap that goes through the whole alphabet and speeds up as it goes. Absurd in the best way. Soniox Speech-to-Text keeps up. You can be the judge of how accurately. Source: youtube.com/watch?v=RvLmcRZt…
You usually have a rough idea of which languages will show up in your audio. Pass them to Soniox as the language_hints parameter and the model biases toward them for better accuracy. It stays fully multilingual underneath, so if someone slips into a language you didn't list, it still gets transcribed.
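A minimal sketch of preparing language hints before sending a request. Only the language_hints parameter name and the stt-async-v5 model name come from the posts above; the helper build_transcription_config, the dict-shaped payload, and the use of two-letter language codes are assumptions for illustration, so check the docs for the exact request format.

```python
import json


def build_transcription_config(model, language_hints):
    """Build a request config that biases recognition toward expected languages.

    Hypothetical helper: the payload shape is an assumption, not the official schema.
    """
    cleaned = []
    for code in language_hints:
        code = code.strip().lower()
        # Drop empty entries and duplicates while preserving the caller's order.
        if code and code not in cleaned:
            cleaned.append(code)

    config = {"model": model}
    if cleaned:
        # Hints only bias the model; unlisted languages can still be transcribed.
        config["language_hints"] = cleaned
    return config


config = build_transcription_config("stt-async-v5", ["en", "ES", "en"])
print(json.dumps(config))
```

Leaving the hints list empty simply omits the key, which keeps the request fully multilingual.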
Sometimes a hint isn't enough. A strong accent can push the model into the wrong language and you get text in the wrong alphabet. When you know the audio is only ever one language, set language_hints_strict to true to keep it pinned there. Works best when you restrict to a single language.
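A small sketch of pinning a request to one language. The language_hints and language_hints_strict names come from the posts above; the helper pin_to_language and the dict-shaped config are assumptions for illustration.

```python
def pin_to_language(config, language):
    """Return a copy of config restricted to a single language.

    Hypothetical helper: the config layout is an assumption.
    """
    pinned = dict(config)
    # A single hint, since strict mode is described as working best with one language.
    pinned["language_hints"] = [language]
    pinned["language_hints_strict"] = True
    return pinned


base = {"model": "stt-async-v5", "language_hints": ["en", "es"]}
strict = pin_to_language(base, "es")
print(strict)
```

The helper copies the config instead of mutating it, so the same base settings can still be reused for multilingual audio.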
And when you need to know which languages were actually spoken, turn on enable_language_identification. Every token comes back tagged with its language, so you can see exactly where a conversation crossed from one to another. Useful for routing or running analytics on multilingual calls. Read more about language settings in our docs: soniox.com/docs/stt/concepts… soniox.com/docs/stt/concepts… soniox.com/docs/stt/concepts…
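One way to find where a conversation switched languages, sketched under assumptions: each token is taken to be a dict with a "text" field and a "language" field. Those field names and the sample tokens are illustrative, not a confirmed response schema.

```python
from itertools import groupby


def language_segments(tokens):
    """Collapse consecutive tokens with the same language tag into (language, text) pairs."""
    segments = []
    for language, group in groupby(tokens, key=lambda tok: tok.get("language")):
        segments.append((language, "".join(tok["text"] for tok in group)))
    return segments


# Made-up tokens in the assumed shape.
tokens = [
    {"text": "Hello", "language": "en"},
    {"text": " there", "language": "en"},
    {"text": " hola", "language": "es"},
    {"text": " amigo", "language": "es"},
]

segments = language_segments(tokens)
for language, text in segments:
    print(language, repr(text))
```

Each boundary between segments marks a point where the speaker changed language, which is the signal you would use for routing or analytics.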
If you've sent us feedback and never heard back, it still landed. We read all of it. What you tell us shapes a lot of what goes into the next models. v4 wouldn't be anywhere near as good without the direct feedback we got from our users. soniox.com/speech-to-text
Happy to see @telnyx added Soniox to their stack powering global communications. 🌎 We handle the hard parts of realtime STT: code-switching, account numbers, names in noisy calls. Try us out with Telnyx voice AI agents.
Jun 3
Voice AI teams now have another STT option on Telnyx. We just added @soniox_ai STT for real-time transcription workflows. This matters because STT is one of those pieces you only notice when it gets things wrong. If the caller switches languages, says a product name, gives an account number, or talks over background noise, the rest of the agent is only as good as the transcript it receives. With Soniox now available on Telnyx, teams building voice agents get another model to test alongside the rest of their voice stack. This is useful for multilingual agents, mixed-language calls, and workflows where names, codes, and domain terms matter. You can read more about it here: telnyx.com/release-notes/son…
Building a voice AI app? One of the hardest parts is knowing exactly when a person has finished speaking. Wait too long and your assistant feels sluggish. React too early and you cut people off mid-thought. This is what endpoint detection solves. How it works: ⬇
A clean pattern for voice apps: Show non-final tokens instantly for live captions. Rerender with final tokens once <end> arrives. Trigger your actions after. Your UI will feel instant while your logic stays accurate.
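The pattern above can be sketched as a small buffer. This is an illustration under assumptions, not the SDK's API: tokens are taken to be dicts with "text" and "is_final" fields, the <end> marker is assumed to arrive as a token whose text is "<end>", and non-final tokens are assumed to be resent in full with each response. CaptionBuffer is a hypothetical class.

```python
class CaptionBuffer:
    """Keep live captions responsive while exposing only finalized text to app logic."""

    def __init__(self):
        self.final_parts = []      # confirmed text for the current utterance
        self.non_final_parts = []  # provisional text, replaced on every update

    def update(self, tokens):
        """Consume one response. Return the finished utterance if <end> arrived, else None."""
        self.non_final_parts = []
        finished = None
        for tok in tokens:
            if tok["text"] == "<end>":
                # Endpoint reached: hand the finalized utterance to the caller.
                finished = "".join(self.final_parts).strip()
                self.final_parts = []
            elif tok["is_final"]:
                self.final_parts.append(tok["text"])
            else:
                self.non_final_parts.append(tok["text"])
        return finished

    def display(self):
        """Text to show right now: confirmed tokens followed by provisional ones."""
        return "".join(self.final_parts + self.non_final_parts).strip()


buf = CaptionBuffer()
first = buf.update([{"text": "Book", "is_final": False}, {"text": " a", "is_final": False}])
live = buf.display()
done = buf.update([
    {"text": "Book", "is_final": True},
    {"text": " a", "is_final": True},
    {"text": " table", "is_final": True},
    {"text": "<end>", "is_final": True},
])
print(live, "|", done)
```

Render display() after every update for captions, and trigger actions only when update() returns a finished utterance.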
Endpoint detection is a small flag that makes a big difference in how natural your voice AI feels. Full docs and examples here: soniox.com/docs/stt/rt/endpo…