Miso One has 8 billion parameters, is open-source, and apparently emotes better than most humans on a Monday morning.
The 110 ms latency alone puts it ahead of a lot of paid alternatives.
This one is going to be everywhere soon.
Today, we’re excited to introduce Miso One, the most emotive voice model in the world.
Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency.
We’ve open-sourced the model weights, with API access coming soon.
Hear how Miso One sounds in the thread below.