You cannot improve what you cannot measure.
LibriSpeech is roughly 1,000 hours of relatively clean, read English audiobook narration, but real voice products deal with noisy rooms, dialects, code-switching, and non-English speech.
That is why SONAR introduces the PSDN Score: a composite metric that combines word error rate (WER), character error rate (CER), and semantic similarity to evaluate whether a transcript preserves both the words and the meaning.
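
To make the idea of a composite concrete, here is a rough sketch of how WER, CER, and a similarity term could be folded into one number. This is not the actual PSDN Score formula, which the post does not spell out: the equal weights, the clamping of error rates at 1.0, and the `composite_score` and `jaccard_similarity` names are all assumptions, and the word-overlap similarity is only a stand-in for a real embedding-based semantic measure.

```python
from typing import Callable, Sequence, Tuple


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def jaccard_similarity(ref: str, hyp: str) -> float:
    """Placeholder for semantic similarity (assumption): word-set overlap."""
    a, b = set(ref.lower().split()), set(hyp.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def composite_score(
    ref: str,
    hyp: str,
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),  # hypothetical weights
    similarity: Callable[[str, str], float] = jaccard_similarity,
) -> float:
    """Weighted blend of (1 - WER), (1 - CER), and similarity; 1.0 is a perfect match."""
    w_wer, w_cer, w_sem = weights
    wer_acc = 1.0 - min(wer(ref, hyp), 1.0)  # error rates can exceed 1, so clamp
    cer_acc = 1.0 - min(cer(ref, hyp), 1.0)
    return w_wer * wer_acc + w_cer * cer_acc + w_sem * similarity(ref, hyp)


if __name__ == "__main__":
    reference = "turn on the lights"
    hypothesis = "turn off the light"
    print(f"WER={wer(reference, hypothesis):.2f} CER={cer(reference, hypothesis):.2f}")
    print(f"composite={composite_score(reference, hypothesis):.2f}")
```

Passing a different `similarity` callable (for example, cosine similarity over sentence embeddings) would bring the third term closer to what "semantic similarity" usually means in practice.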
Voice AI has an evaluation problem. Models look strong on public benchmarks, then collapse on real-world audio.
Introducing sonar.psdn.ai: a recipe-driven evaluation framework for low-resource languages, real-world audio, and production failure modes.
Details ↓