Joined June 2021
Pinned Tweet
Jan 22
We learn to speak before we learn to read. Voice is the most natural interface we have. We just raised $100M to make building voice AI as easy as building a web app.
Jun 10
Operating a robot over the internet means camera frames and joint state arrive at different times, so your observations drift and training data gets misaligned. LiveKit Portal fuses them back together with the same code, whether the robot is in the next room or on another continent.
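To make the alignment problem concrete, here is a minimal sketch of nearest-timestamp matching between two sensor streams. It is not LiveKit Portal's API or implementation; the function names, the `max_skew` threshold, and the sample timestamps are all assumptions chosen for illustration.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def nearest_index(timestamps: Sequence[float], t: float) -> int:
    """Return the index of the timestamp closest to t (input sorted ascending)."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align(
    frame_ts: Sequence[float],
    joint_ts: Sequence[float],
    max_skew: float = 0.02,  # hypothetical tolerance, in seconds
) -> List[Tuple[int, int]]:
    """Pair each camera frame with the nearest joint-state sample.

    Frames with no joint sample within max_skew are dropped rather than
    paired with stale state.
    """
    pairs: List[Tuple[int, int]] = []
    if not joint_ts:
        return pairs
    for fi, t in enumerate(frame_ts):
        ji = nearest_index(joint_ts, t)
        if abs(joint_ts[ji] - t) <= max_skew:
            pairs.append((fi, ji))
    return pairs


# Illustrative timestamps (seconds): ~30 fps camera, irregular joint updates.
frames = [0.000, 0.033, 0.066, 0.200]
joints = [0.001, 0.011, 0.031, 0.041, 0.068]
aligned = align(frames, joints)
print(aligned)
```

The last frame is dropped because no joint sample falls within the tolerance, which is the kind of gap that would otherwise produce a misaligned training example.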
We built a live multilingual, multi-person video call with Gemini 3.5 Live Translate on LiveKit. Everyone picks their language, speaks naturally, and hears each other in real time in their language of choice. Watch the demo and check out the open source repo: github.com/livekit-examples/…
Our latest audio model, Gemini 3.5 Live Translate, takes real-time speech translation to the next level for developers by delivering low-latency translation across 70 languages. By processing speech as it streams in near real time, the model enables devs to build low-latency audio experiences with:
- Multilingual input: Understands multiple languages in a single session without needing to adjust settings.
- Auto-detection: Identifies the spoken language and begins translation instantly.
- Native audio processing: Generates more natural-sounding speech that preserves speakers' intonation, pacing, and pitch.
- Noise robustness: Filters out ambient noise for clearer conversation in loud environments.
NVIDIA's Nemotron 3.5 ASR transcribes 40 language-locales from a single 600M-parameter streaming model with ~100ms latency. It’s small enough to run on your laptop and drops straight into LiveKit Agents. We built a multilingual teleprompter to show it off. See the full breakdown: livekit.com/blog/nemotron-3.… @NVIDIAAIDev #VoiceAI #NemotronSpeech
Most voice AI agents forget you the second you hang up. No name, no history, no idea what you asked last time. We gave a LiveKit voice agent persistent memory using @MongoDB Atlas Vector Search. RAG, hybrid rankFusion recall, and a profile that loads before the agent says hello. Full walkthrough and starter kit below.
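As a rough illustration of what hybrid rank fusion means, the sketch below merges a vector-search ranking and a text-search ranking with reciprocal rank fusion. It is plain Python, not MongoDB's API or the walkthrough's code; the document IDs, the `k = 60` constant, and the function name are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) in every list it appears in
    (rank starts at 1), and the scores are summed across lists.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical memory IDs returned by two retrievers for the same query.
vector_hits = ["m3", "m1", "m2"]   # semantic (embedding) search
text_hits = ["m1", "m4", "m3"]     # keyword (full-text) search
fused = reciprocal_rank_fusion([vector_hits, text_hits])
print(fused)
```

Memories that rank well in both lists rise to the top, so the agent recalls items that match both the meaning and the exact wording of what the caller said.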
Introducing the LiveKit C++ SDK. Realtime audio, video, and data tracks for C++ apps, with the same low-latency transport our other clients use. Built for the C++ stacks behind robotics, autonomous vehicles, and high-performance media pipelines. livekit.com/blog/livekit-cpp…
May 27
Honored to be included in @Redpoint’s 2026 InfraRed 100 list alongside the most promising private companies in AI infrastructure. What we’ve built is a reflection of the customers we get to work with, from SAP and OpenAI to thousands of teams shipping voice agents every day. redpoint.com/infrared/100/
May 22
Congrats to the @cartesia team! Sonic-3.5 just took the #1 spot on the Artificial Analysis Speech Arena and raised the bar for realtime voice generation. It’s live on LiveKit Inference today. Try it with a single line of config and ship the most natural-sounding agents.
Cartesia’s Sonic-3.5 takes the #1 spot on the Artificial Analysis Speech Arena Leaderboard, surpassing Inworld Realtime TTS 1.5 Max and Google’s Gemini 3.1 Flash TTS.
Sonic-3.5 is the latest TTS model from @cartesia. It supports 42 languages, including 9 Indian languages, with 500 voices available out of the box. The model has been highly preferred among voters in the TTS Arena, with its demonstrated naturalness and accurate transcript following.
Key takeaways:
➤ Quality: Sonic-3.5 has an Elo score of 1,218 (+16/-16) based on 1,144 arena appearances, placing it ahead of Inworld Realtime TTS 1.5 Max at 1,194 and Gemini 3.1 Flash TTS at 1,209
➤ Pricing: Sonic-3.5 is priced at $39/1M characters, a premium compared to Gemini 3.1 Flash TTS at $18.3/1M characters, and Inworld Realtime TTS 1.5 Max at $35/1M characters
➤ Speed: 105.5 characters per second, compared to 205 characters per second for Inworld Realtime TTS 1.5 Max and 26.3 characters per second for Gemini 3.1 Flash TTS
See more details and listen to samples below 🧵
May 21
For AI avatars that feel engaged while your users are speaking, with eye contact, movements, and expressions generated live from a single reference image, check out @runwayml Characters. Add one to a LiveKit voice agent with three lines of code.
May 20
"Building for enterprise isn't just about having the right AI model. It's about having a stack you can stand behind when a customer calls at 9am on a Monday with a problem." Finn zur Mühlen, Co-founder of telli, on running 30k daily calls on LiveKit + @ai_coustics: livekit.com/blog/telli-autom…
May 19
Ship a voice agent on any website with a single script tag. The widget supports voice, video, screen share, and text chat. Configure branding, capabilities, and per-visitor context from the LiveKit Cloud dashboard. Works on Shopify, Webflow, WordPress, or any custom site.
May 14
Already built a @LangChain agent? You don't have to rebuild it for voice. With the LangChain plugin for LiveKit Agents, you can connect it to a realtime voice pipeline, complete with speech-to-text, text-to-speech, and the infrastructure to deploy it at scale.
May 13
Your outbound phone agent has 1-2 seconds to figure out if it's talking to a person, a voicemail, or an IVR. We shipped Answering Machine Detection (AMD) in LiveKit Agents to do that for you so your agent knows when to keep talking, leave a message, use the keypad, or hang up.
May 12
“Voice makes AI feel less like a tool and more like a natural part of the experience…LiveKit gives us the scalable foundation to bring those voice experiences to life at enterprise scale, without sacrificing flexibility.” Jonathan von Rüden, @SAP's Chief AI Officer.
Add a face to your voice agent. LiveAvatar by @HeyGen is now supported in LiveKit Agents. Add a realtime human avatar to your agent without rebuilding the conversation loop. Your LiveKit agent still owns the room, turn-taking, model orchestration, and voice pipeline. LiveAvatar renders the synchronized face and video stream. Useful for product demos, onboarding, tutoring, and support agents that need a visual layer.