The most emotive foundation models for voice.

Joined July 2025
Photos and videos
Miso Labs retweeted
Miso Labs open source voice model is exploding right now because it can generate speech with emotional prosody. Voice has been waiting for its GPT-3 moment, and this seems like it might be it.
Miso Labs just open sourced an 8 billion parameter voice model. 110 millisecond latency. Real emotional range. Voice cloning from a 3 to 10 second sample. Apache 2.0 licensed. ElevenLabs has been the default for emotive AI voice for 2 years. Miso One just made the same capability free. The closed source voice AI moat just got smaller in a single launch.
7
6
84
25,517
Miso Labs retweeted
The biggest unlock in voice AI isn't sounding human. It's making people feel like they're talking to one. Open-sourcing a highly expressive model is a huge win for builders, creators, and developers.
Today, we’re excited to introduce Miso One, the most emotive voice model in the world. Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency. We’ve open-sourced the model weights, with API access coming soon. Hear how Miso One sounds in the thread below.
6
4
32
8,704
Miso Labs retweeted
Voice AI is entering a new era. For years, the challenge wasn't speaking—it was feeling. An 8B model delivering human-like emotion with 110ms latency is a glimpse of where AI conversations are heading next.
Today, we’re excited to introduce Miso One, the most emotive voice model in the world. Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency. We’ve open-sourced the model weights, with API access coming soon. Hear how Miso One sounds in the thread below.
4
4
20
3,878
Miso Labs retweeted
i fw this heavy
Today, we’re excited to introduce Miso One, the most emotive voice model in the world. Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency. We’ve open-sourced the model weights, with API access coming soon. Hear how Miso One sounds in the thread below.
1
2
39
11,125
Miso Labs retweeted
This is a wildly impressive open TTS model. Can't wait to build on this.
Today, we’re excited to introduce Miso One, the most emotive voice model in the world. Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency. We’ve open-sourced the model weights, with API access coming soon. Hear how Miso One sounds in the thread below.
9
18
517
162,074
Miso Labs retweeted
Amazing team working on amazing things!
Today, we’re excited to introduce Miso One, the most emotive voice model in the world. Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency. We’ve open-sourced the model weights, with API access coming soon. Hear how Miso One sounds in the thread below.
2
2
11
1,792
Miso Labs retweeted
Model weights for Miso One are available on @huggingface
Today, we’re excited to introduce Miso One, the most emotive voice model in the world. Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency. We’ve open-sourced the model weights, with API access coming soon. Hear how Miso One sounds in the thread below.
11
7
57
10,831
Miso Labs retweeted
Mimi codec still going strong!
Today, we’re excited to introduce Miso One, the most emotive voice model in the world. Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency. We’ve open-sourced the model weights, with API access coming soon. Hear how Miso One sounds in the thread below.
3
5
66
6,207
Miso Labs retweeted
Replying to @AodenTeoMT
We're getting closer to truly natural AI interactions.
2
3
697
Miso Labs retweeted
Replying to @AodenTeoMT
110ms response time on an emotive voice model is getting close to real-time conversation. the uncanny valley is closing fast bud
1
4
674
Miso Labs retweeted
Most voice AI systems can replicate words. Very few can replicate feelings. And that tiny gap is what makes all the difference. That's why Miso One stands out.
Today, we’re excited to introduce Miso One, the most emotive voice model in the world. Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency. We’ve open-sourced the model weights, with API access coming soon. Hear how Miso One sounds in the thread below.
8
3
11
1,616
Miso Labs retweeted
my YouTube Short was due in 12 hours and i still had no voiceover tried Miso One because someone in our editor's discord wouldn't stop posting about it open source, runs locally. typed the script. got the read back instantly not sure when the bar for TTS got this high but i'm not complaining
Today, we’re excited to introduce Miso One, the most emotive voice model in the world. Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency. We’ve open-sourced the model weights, with API access coming soon. Hear how Miso One sounds in the thread below.
11
3
21
7,284
Miso Labs retweeted
2 Most voice AI can sound correct. Very few can sound alive. The difference is hidden in the final 5% — emotion, nuance, and delivery. That’s where humans decide whether a voice feels real. And that’s why Miso One stands out.
Today, we’re excited to introduce Miso One, the most emotive voice model in the world. Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency. We’ve open-sourced the model weights, with API access coming soon. Hear how Miso One sounds in the thread below.
3
3
5
13,890
Miso Labs retweeted
What's going on today??? Miso One released! text-to-speech with emotions!
Today, we’re excited to introduce Miso One, the most emotive voice model in the world. Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency. We’ve open-sourced the model weights, with API access coming soon. Hear how Miso One sounds in the thread below.
1
2
4
516
Miso Labs retweeted
Miso One is 8 billion parameters, open-source, and apparently emotes better than most humans on a Monday morning. The latency alone at 110ms puts it ahead of a lot of paid alternatives. This one is going to be everywhere soon.
Today, we’re excited to introduce Miso One, the most emotive voice model in the world. Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency. We’ve open-sourced the model weights, with API access coming soon. Hear how Miso One sounds in the thread below.
18
7
30
11,258
Miso Labs retweeted
Jeeeeeeeez I needed this
Today, we’re excited to introduce Miso One, the most emotive voice model in the world. Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency. We’ve open-sourced the model weights, with API access coming soon. Hear how Miso One sounds in the thread below.
1
3
10
3,899
Miso Labs retweeted
Opensource will always win.
Today, we’re excited to introduce Miso One, the most emotive voice model in the world. Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency. We’ve open-sourced the model weights, with API access coming soon. Hear how Miso One sounds in the thread below.
3
23
3,952
Miso Labs retweeted
Replying to @AodenTeoMT
Great work on open-sourcing this. Really interested to fine-tune this for some of my applications around research and education. Checking out the repo now! github.com/MisoLabsAI/MisoTT…
1
4
49
27,193
Miso Labs retweeted
What's going on today??? Miso One released! text-to-speech with emotions!
Today, we’re excited to introduce Miso One, the most emotive voice model in the world. Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency. We’ve open-sourced the model weights, with API access coming soon. Hear how Miso One sounds in the thread below.
4
5
68
10,838
Miso Labs retweeted
Miso TTS 8B. Emotive speech and dialogue generation - 7.7B Llama-3.2 backbone 300M autoregressive audio decoder - natural expression and voice continuation - conditions on text and user audio tone - high-quality compression/reconstruction huggingface.co/MisoLabs/Miso…
1
15
96
5,256