building realtime avatars @Anam__ai

Joined November 2024
Pinned Tweet
Introducing cara-3, the fastest real-time avatar model on the market. The model delivers unmatched realism with sub-180ms response times, setting a new industry standard. 70% of users prefer video over voice. Every pixel is generated in real time, unlocking natural eye movement, micro-expressions, and emotional subtlety so each conversation feels real. Comment "CARA" for 500 free credits.
it appears our latest model has a hidden Jack Nicholson dimension 🔪
Ben Carr retweeted
Open-source Jarvis that runs on a single GPU. Today we're releasing the vui stack: a local voice agent that you can chat with in real time, with tools, and that can run Claude to do more complex tasks. Inside this stack is the new vui nano model, a 300M-parameter TTS model that can render audio in reply to what you've said and supports a variety of non-speech sounds. vui nano speaks with you, not at you. The stack can run on as little as 6GB of VRAM. Voice cloning is supported with prompts of up to 5 minutes; the longer, the better. A voice for your openclaw with our v1/realtime endpoint. I developed this on my own, so I would love to get the community's feedback and help improving it. Please retweet this so that everyone knows they can have their own private Jarvis.
.@getstream_io x @anam__AI is live: add interactive avatars using the @visionagents_ai framework. Read more: anam.ai/blog/vision-agents-a…
Anam is now integrated with Stream’s Vision Agents 🤙
Stream gives you the realtime multimodal agent framework: calls, state, orchestration, and the audio/video pipeline. Anam gives the agent a live avatar in the call.
This setup opens the door to "scene switching". The avatar starts on a neutral background, then changes based on the conversation:
* ask for a recipe → kitchen
* ask about weather → studio
* next user turn → back to neutral
It’s a relatively small thing technically: intercept the Anam video frames, chroma-key the green screen, and swap in a background based on tool calls or transcript callbacks. But it changes the feel a lot. The agent isn’t just talking over video; its environment can react too.
Thanks to the Stream team for leading on the integration!
docs: anam.ai/cookbook/vision-agen…
cc @visionagents_ai @neevash @Anam__ai
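The chroma-key swap described above fits in a few lines. This is a minimal sketch, not Anam's or Stream's code. It assumes frames arrive as rows of (r, g, b) tuples, uses a simple green-dominance key with an illustrative threshold, and maps hypothetical tool names (`get_recipe`, `get_weather`) to scenes.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0-255
Frame = List[List[Pixel]]     # rows of pixels

def is_green_screen(pixel: Pixel, threshold: int = 40) -> bool:
    """Key out a pixel when its green channel exceeds both red and blue by `threshold`."""
    r, g, b = pixel
    return g - max(r, b) > threshold

def composite(frame: Frame, background: Frame, threshold: int = 40) -> Frame:
    """Swap keyed (green) pixels in `frame` for the matching background pixel."""
    return [
        [bg if is_green_screen(px, threshold) else px for px, bg in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]

# Hypothetical tool-call names mapped to scene names (assumptions, not a real API).
SCENE_FOR_TOOL: Dict[str, str] = {"get_recipe": "kitchen", "get_weather": "studio"}

class SceneSwitcher:
    """Tracks the active scene: tool calls switch it, the next user turn resets it."""

    def __init__(self, backgrounds: Dict[str, Frame]) -> None:
        self.backgrounds = backgrounds
        self.scene = "neutral"

    def on_tool_call(self, tool_name: str) -> None:
        # Unknown tools leave the current scene unchanged.
        self.scene = SCENE_FOR_TOOL.get(tool_name, self.scene)

    def on_user_turn(self) -> None:
        self.scene = "neutral"

    def process(self, frame: Frame) -> Frame:
        return composite(frame, self.backgrounds[self.scene])
```

A production version would more likely use a soft alpha matte on GPU tensors rather than a hard per-pixel rule on Python lists. The sketch only shows the control flow: tool calls pick the background, and every outgoing frame is composited against it.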
Ben Carr retweeted
Is autoresearch really better than classic hyperparameter tuning? We did experiments comparing Optuna & autoresearch. Autoresearch converges faster, is more cost-efficient, and even generalizes better: 🧵(1/6)
Ben Carr retweeted
Huge fans of @lennysan and @bcherny at Anam. We enjoyed Boris's recent episode of Lenny's Podcast on the future of coding with AI so much that we put together a demo of adding an Anam face to Claude Code.
~15% of users hit unstable connections during interactive avatar sessions. Most never told us; their sessions just quietly got worse. We shipped adaptive bitrate: every Anam session now adjusts its streaming bitrate to network conditions in real time.
This is table-stakes infrastructure for real-time platforms like Agora and LiveKit. Now it runs on every session. The difference: conversations that used to stutter or freeze now stay smooth. Sessions run longer. Users don't bounce.
Small change in the stack, big change in the 15% of sessions that needed it most.
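The thread doesn't say how Anam's adaptive bitrate decides what to do. What follows is only a generic sketch of a loss-based controller: it backs off when reported packet loss is high, probes upward when the link is clean, and holds steady in between. `BitrateController` is hypothetical, and its thresholds, growth factor, and kbps limits are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class BitrateController:
    """Hypothetical loss-based bitrate controller for one media stream."""
    bitrate_kbps: float = 1500.0
    min_kbps: float = 150.0
    max_kbps: float = 4000.0
    high_loss: float = 0.10  # loss fraction above which we back off (assumed)
    low_loss: float = 0.02   # loss fraction below which we probe upward (assumed)
    growth: float = 1.05     # multiplicative increase per clean report (assumed)

    def on_report(self, loss_fraction: float) -> float:
        """Update the target bitrate from one receiver report; returns the new target."""
        if loss_fraction > self.high_loss:
            # Heavy loss: cut in proportion to how much was lost.
            self.bitrate_kbps *= 1.0 - 0.5 * loss_fraction
        elif loss_fraction < self.low_loss:
            # Clean link: probe for more bandwidth.
            self.bitrate_kbps *= self.growth
        # Moderate loss falls through and holds the current rate.
        self.bitrate_kbps = max(self.min_kbps, min(self.max_kbps, self.bitrate_kbps))
        return self.bitrate_kbps
```

Reacting to loss alone is a simplification. Real-time media stacks typically also watch delay or RTT trends, so they can slow down before packets start dropping.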
Anam is part of the relaunched AI Startup Pack by Fin. We're in good company, alongside @ElevenLabs, @Cloudflare, @incident_io, @Attio, and more. Build with interactive avatars that respond in real time, look realistic, and deploy via API. No upfront cost for 7 months.
Our Pipecat contribution just got merged. Anam is now listed as an official community video integration in the Pipecat ecosystem. In case you don't know, Pipecat is Daily's open-source framework for building real-time voice agents, with 10.5k GitHub stars and users including NVIDIA, Mercor, and Descript. We built a video service that takes TTS audio from the pipeline, streams it to Anam over WebRTC, and returns a synchronized interactive avatar face in real time. The avatar speaks, reacts, and handles interruptions natively.
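The actual Pipecat service class isn't shown here, so what follows is a framework-agnostic sketch of the interruption behaviour only. The hypothetical `AvatarAudioForwarder` queues TTS audio and forwards it through an assumed async `send` callable, which stands in for the WebRTC audio track. When the user interrupts, it discards any queued audio so the avatar doesn't keep lip-syncing a stale reply.

```python
import asyncio
from typing import Awaitable, Callable

class AvatarAudioForwarder:
    """Hypothetical video-service core: forward TTS audio to an avatar backend,
    dropping queued audio on user interruption."""

    def __init__(self, send: Callable[[bytes], Awaitable[None]]) -> None:
        self._send = send              # assumed async writer, e.g. a WebRTC audio track
        self._queue = asyncio.Queue()  # pending chunks; None is a stop sentinel

    async def push_tts_audio(self, chunk: bytes) -> None:
        await self._queue.put(chunk)

    def interrupt(self) -> int:
        """Discard audio not yet sent; returns the number of chunks dropped."""
        dropped = 0
        while not self._queue.empty():
            self._queue.get_nowait()
            dropped += 1
        return dropped

    async def close(self) -> None:
        await self._queue.put(None)

    async def run(self) -> None:
        """Forward chunks in order until the stop sentinel arrives."""
        while (chunk := await self._queue.get()) is not None:
            await self._send(chunk)
```

Flushing on interrupt is the key design choice. Audio that has already been sent can't be recalled, but anything still queued is dropped before it reaches the avatar.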
Today we're releasing cara-3, our latest face-generation model. In an independent blind study, participants preferred Anam's interactive avatars over those of other providers across every metric measured. But why do we care about avatars to begin with? x.com/BenCarr630567/status/2…

We're open-sourcing the backbone of our data pipeline. It's called Metaxy, and it solves some of the hardest parts of building a modern, scalable pipeline.
At Anam, we’re building a platform for real-time avatars. One of the core components powering our product is our own video generation model. We train it on custom datasets that require extensive preprocessing of video and audio data. We extract embeddings using ML models, rely on external APIs for annotation and data synthesis, and orchestrate complex multimodal pipelines.
Along the way, we ran into significant challenges implementing efficient and flexible sample-level versioning (caching) across these workflows. That experience led us to build and open-source Metaxy, a framework for metadata management and sample-level versioning in multimodal data pipelines.
One of our engineers, Daniel, has been working tirelessly on Metaxy for the past few months, investing a staggering amount of time in it both during and outside work. It now powers our data preparation pipelines and has made life significantly easier for our research team.
docs: docs.metaxy.io
blog post: anam.ai/blog/metaxy

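Metaxy's own API is documented at docs.metaxy.io and isn't reproduced here. As a hypothetical illustration of what sample-level versioning means, the sketch below gives each sample a version hash derived from the feature's code version plus its upstream inputs' versions. It then recomputes only the samples whose hash changed. `FeatureStore`, `sample_version`, and the version strings are invented for the example.

```python
import hashlib
from typing import Callable, Dict, List

def sample_version(feature_version: str, upstream: Dict[str, str]) -> str:
    """Hash the feature's code version with each upstream input's version."""
    h = hashlib.sha256(feature_version.encode())
    for name in sorted(upstream):  # sort keys so the hash is order-independent
        h.update(f"\n{name}={upstream[name]}".encode())
    return h.hexdigest()[:16]

class FeatureStore:
    """Hypothetical per-sample cache for one derived feature (e.g. embeddings)."""

    def __init__(self, feature_version: str, compute: Callable[[str], object]) -> None:
        self.feature_version = feature_version
        self.compute = compute
        self.versions: Dict[str, str] = {}   # sample_id -> version it was computed at
        self.values: Dict[str, object] = {}  # sample_id -> computed value

    def resolve(self, upstream: Dict[str, Dict[str, str]]) -> List[str]:
        """Recompute only stale samples; returns the ids that were (re)computed."""
        recomputed = []
        for sample_id, inputs in upstream.items():
            version = sample_version(self.feature_version, inputs)
            if self.versions.get(sample_id) != version:
                self.values[sample_id] = self.compute(sample_id)
                self.versions[sample_id] = version
                recomputed.append(sample_id)
        return recomputed
```

Bumping `feature_version` invalidates every sample. An upstream change invalidates only the samples it touches, which is the benefit of versioning per sample rather than per dataset.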
Anam now has a Python SDK: github.com/anam-org/python-s…
What's in the box?
- WebRTC media handling: connect and get synced audio/video frames back
- full pipeline (STT → LLM → TTS → Face), or bring your own components
- live transcriptions from user and avatar, useful for captions or logging
- async-first: process frames with async iterators, hook into events with decorators
This brings it close to feature parity with our JS SDK. Best of all, you don't need a browser anymore...
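The "async-first" bullet describes a common Python pattern. The sketch below shows that shape using a hypothetical `AvatarSession` class. It is not the Anam SDK's API (the repository has the real names), and the frame stream is faked.

```python
import asyncio
from collections import defaultdict
from typing import AsyncIterator, Awaitable, Callable, Dict, List, Tuple

Handler = Callable[[str], Awaitable[None]]

class AvatarSession:
    """Hypothetical session: decorator-registered event handlers plus an
    async iterator over media frames."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def on(self, event: str) -> Callable[[Handler], Handler]:
        """Decorator that registers an async handler for `event`."""
        def register(fn: Handler) -> Handler:
            self._handlers[event].append(fn)
            return fn
        return register

    async def emit(self, event: str, payload: str) -> None:
        for fn in self._handlers[event]:
            await fn(payload)

    async def frames(self, count: int) -> AsyncIterator[Tuple[int, bytes]]:
        """Stand-in for synced audio/video frames; yields (index, fake frame bytes)."""
        for i in range(count):
            await asyncio.sleep(0)  # yield control, as a real network read would
            yield i, b"\x00" * 4
```

In use, a caption handler is registered with `@session.on("transcript")`, and frames are consumed with `async for index, frame in session.frames(n)`. This is the browser-free shape the tweet describes.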