Joined August 2014
2 Photos and videos
Brandon Yang retweeted
Two new models just dropped 👀 Sonic-3.5 and Ink-2 are the #1 streaming models for text to speech and speech to text
We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.
6
15
45
1,900
We topped the leaderboards for text to speech and speech to text - back to back in a single week! Modeling companies are not about any single model but about building engines for continuous end to end improvement - exciting to see how far we've come!
We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.
9
133
Brandon Yang retweeted
Within the span of a week, we launched streaming TTS (text-to-speech) and STT (speech-to-text) models that topped the leaderboards. I'm incredibly proud of the research team for their relentless pursuit of improvement, which have unlocked new state-of-the-art audio models on the Pareto frontier of speed and quality. As a research problem, speech requires fusing both text and audio and is the gateway to general multimodal models. We built Sonic-3.5 and Ink-2 from the ground up, developing multiple innovations along the way in a direction that will scale to general real-time intelligence. I've personally been deeply involved in building these models and more; it's been a blast working with the incredibly talented research team here @cartesia, and I can't wait to show the world what's coming next :)
We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.
2
8
64
4,797
Brandon Yang retweeted
When @elipughresearch and the team dropped Ink-2, I had to see if this SOTA Speech-to-Text model lived up to the hype. So, I built a dictation app to find out. To my delight, it’s incredibly fast & accurate. You can just "ink it." InkIt is now free and open-source. Try it, fork it, and make it yours! 👇
58
59
104
10,658
Brandon Yang retweeted
Our new model Ink-2 tops AA's leaderboard for streaming speech-to-text! Ink-2 comes with plenty of features optimized for real-time voice agents. With top-class models for both TTS and STT, the team at @cartesia keeps pushing the frontier of models for interactive intelligence.
Cartesia Ink-2 debuts as #1 for accuracy on the brand-new streaming speech-to-text leaderboard from @ArtificialAnlys! We designed Ink-2 from the ground up for voice agents - with low latency, eager transcripts, and semantic endpointing.
6
22
106
14,600
Brandon Yang retweeted
🧵 on some fun insider details on ink-2 😼
Cartesia Ink-2 debuts as #1 for accuracy on the brand-new streaming speech-to-text leaderboard from @ArtificialAnlys! We designed Ink-2 from the ground up for voice agents - with low latency, eager transcripts, and semantic endpointing.
2
9
48
6,177
Ink-2 is our first streaming ASR model, built specifically for realtime voice agents - and it's #1 on @ArtificialAnlys on day 1! It's rare for models to be #1 on the first try on a new benchmark, since model development is iterative and there's so much that goes into understanding quality. We've seen great results internally and can't wait for everyone to try it!
Cartesia Ink-2 debuts as #1 for accuracy on the brand-new streaming speech-to-text leaderboard from @ArtificialAnlys! We designed Ink-2 from the ground up for voice agents - with low latency, eager transcripts, and semantic endpointing.
2
27
885
Brandon Yang retweeted
Sonic 3.5 is now the #1 text to speech model on the @ArtificialAnlys leaderboard! You no longer have to trade off quality and latency - Sonic 3.5 also has the fastest time to first audio at 82ms end to end. See full benchmark results 👇
Cartesia’s Sonic-3.5 takes the #1 spot on the Artificial Analysis Speech Arena Leaderboard, surpassing Inworld Realtime TTS 1.5 Max and Google’s Gemini 3.1 Flash TTS Sonic-3.5 is the latest TTS model from @cartesia . It supports 42 languages, including 9 Indian languages, with 500 voices available out of the box. The model has been highly preferred among voters in the TTS Arena, with its demonstrated naturalness and accurate transcript following. Key takeaways: ➤ Quality: Sonic-3.5 has an Elo score of 1,218 ( 16/-16) based on 1,144 arena appearances, placing it ahead of Inworld Realtime TTS 1.5 Max at 1,194 and Gemini 3.1 Flash TTS at 1,209 ➤ Pricing: Sonic-3.5 is priced at $39/1M characters, a premium compared to Gemini 3.1 Flash TTS at $18.3/1M characters, and Inworld Realtime TTS 1.5 Max at $35/1M characters ➤ Speed: 105.5 characters per second, compared to 205 characters per second for Inworld Realtime TTS 1.5 Max and 26.3 characters per second for Gemini 3.1 Flash TTS See more details and listen to samples below 🧵
5
21
87
10,955
Sonic 3.5 is now the #1 TTS model on @ArtificialAnlys, an independent benchmark of TTS quality! It's also the fastest model with 82ms end to end latency - it's always been our dream to build realtime voice with no trade-offs. Building great models comes from getting the fundamental right - infrastructure, architecture, data, and evals - and I'm proud to see the hard work for the team recognized!
Cartesia’s Sonic-3.5 takes the #1 spot on the Artificial Analysis Speech Arena Leaderboard, surpassing Inworld Realtime TTS 1.5 Max and Google’s Gemini 3.1 Flash TTS Sonic-3.5 is the latest TTS model from @cartesia . It supports 42 languages, including 9 Indian languages, with 500 voices available out of the box. The model has been highly preferred among voters in the TTS Arena, with its demonstrated naturalness and accurate transcript following. Key takeaways: ➤ Quality: Sonic-3.5 has an Elo score of 1,218 ( 16/-16) based on 1,144 arena appearances, placing it ahead of Inworld Realtime TTS 1.5 Max at 1,194 and Gemini 3.1 Flash TTS at 1,209 ➤ Pricing: Sonic-3.5 is priced at $39/1M characters, a premium compared to Gemini 3.1 Flash TTS at $18.3/1M characters, and Inworld Realtime TTS 1.5 Max at $35/1M characters ➤ Speed: 105.5 characters per second, compared to 205 characters per second for Inworld Realtime TTS 1.5 Max and 26.3 characters per second for Gemini 3.1 Flash TTS See more details and listen to samples below 🧵
3
5
86
5,472
Brandon Yang retweeted
NYC happy hour with @awscloud Thurs, June 4 · 5:30pm · NY tech week RSVP in comments 👇
1
1
7
2,238
Brandon Yang retweeted
Mamba-3 is out! 🐍 SSMs marked a major advance for the efficiency of modern LLMs. Mamba-3 takes the next step, shaping SSMs for a world where AI workloads are increasingly dominated by inference. Read about it on the Cartesia blog: blog.cartesia.ai/p/mamba-3
3
29
175
73,185
Brandon Yang retweeted
Mar 17
The frontier has increasingly shifted to hybrid models - from Qwen to Kimi-Linear and now with NVIDIA's Nemotron-3 Super - that rely on a strong linear sequence model. Today we release Mamba-3, the most powerful linear model to date. x.com/_albertgu/status/20339…

The newest model in the Mamba series is finally here 🐍 Hybrid models have become increasingly popular, raising the importance of designing the next generation of linear models. We've introduced several SSM-centric ideas to significantly increase Mamba-2's modeling capabilities without compromising on speed. The resulting Mamba-3 model has noticeable performance gains over the most popular previous linear models (such as Mamba-2 and Gated DeltaNet) at all sizes. This is the first Mamba that was student led: all credit to @aakash_lahoti @kevinyli_ @_berlinchen @caitWW9, and of course @tri_dao!
11
113
842
78,327
Brandon Yang retweeted
The newest model in the Mamba series is finally here 🐍 Hybrid models have become increasingly popular, raising the importance of designing the next generation of linear models. We've introduced several SSM-centric ideas to significantly increase Mamba-2's modeling capabilities without compromising on speed. The resulting Mamba-3 model has noticeable performance gains over the most popular previous linear models (such as Mamba-2 and Gated DeltaNet) at all sizes. This is the first Mamba that was student led: all credit to @aakash_lahoti @kevinyli_ @_berlinchen @caitWW9, and of course @tri_dao!
41
310
1,597
447,207
Brandon Yang retweeted
Evo 2, our genome language model that generalizes: - across biological prediction and design tasks, - across all modalities of the central dogma, - across molecular to genome scale, and - across all domains of life, is published today in @Nature.
10
73
385
60,185
Brandon Yang retweeted
Our hackathon kicks off this weekend → $20k in prizes from @AnthropicAI & @cartesia We built the ultimate guide to shipping SOTA voice agents with @ClaudeAI & Cartesia - plus 25 real examples of voice agents taking off 🚀 Reply "Cartesia" or tag a founder/builder for early access👇
5
1
14
2,198
Brandon Yang retweeted
🚀Engineers, startup teams, and product-minded operators, this one’s for you! Join @cartesia × @AnthropicAI at @NotionHQ SF for a 2-day hackathon building voice-first AI agents w/ audio, reasoning, memory & action. 🏆 $20k in credits (Anthropic & Cartesia) 💡 Potential Cartesia enterprise subscription ✅ Early access editorial/social boost via partners Link to apply below ⬇️
5
6
32
6,303
Brandon Yang retweeted
24 Nov 2025
The @cartesia team will be at NeurIPS next week, including @_albertgu and me If you'd like to meet with me (or others on the team), fill the form (below) out and we'll figure out how to get in touch
8
4
107
22,829
28 Oct 2025
Excited to share what we've been cooking 🧑‍🍳
28 Oct 2025
We've raised $100M from Kleiner Perkins, Index Ventures, Lightspeed, and NVIDIA. Today we're introducing Sonic-3 - the state-of-the-art model for realtime conversation. What makes Sonic-3 great: - Breakthrough naturalness - laughter and full emotional range - Lightning fast -
1
2
33
2,735
27 Oct 2025
I'll be on stage at Techcrunch Disrupt today talking about Cartesia and the future of voice AI. Come chat!
27 Oct 2025
We’re incredibly honored to be named to the 2025 AI #Disruptors60 by Greenfield Partners. We'll be on stage at TechCrunch Disrupt alongside fellow innovators we respect across both AI infrastructure and applications. Chosen from nearly a thousand applicants, this recognition underscores our work in building the next generation of AI: ubiquitous, interactive intelligence that runs anywhere. We’re excited to continue powering developers with real-time voice models that feel human, respond instantly, and unlock entirely new product experiences. 🚀 Special thanks to our friends, customers, and partners for the continued support! #Disruptors60 #GreenfieldPartners #TCDisrupt #AI
2
16
928
Brandon Yang retweeted
16 Oct 2025
At Cartesia, we’re committed to advancing voice AI for the enterprise. That’s why we’re proud to integrate Cartesia’s state-of-the-art voice AI technology with new @ServiceNow AI Voice Agents, part of the company’s recently announced AI Experience. Read more: bit.ly/servicenowcartesia
10
40
10,669