Today we’re releasing Chroma 1.0
→ the world's first open-source, end-to-end, real-time speech-to-speech model
→ with personalized voice cloning
Trained by FlashLabs.
Deployed on FlashAI👉
flashlabs.ai/flashai-voice-a…
An open, research-grade alternative to the @OpenAI Realtime model.
Voice test, dubbing @elonmusk and @lexfridman:
youtube.com/watch?v=AOMmxTws…
🔥 What’s real (evals and benchmarks attached):
⚡ <150ms TTFT (end-to-end)
🎙️ Native speech-to-speech (no ASR → LLM → TTS pipeline)
🧬 Few-second reference → high-fidelity voice cloning
📈 SIM = 0.817
→ +10.96% relative to the human baseline (0.73)
→ Best among open & closed baselines
🧠 Strong reasoning & dialogue with just 4B params (built on @Alibaba_Qwen 2.5-Omni-3B, Llama 3, and Mimi)
🔓 Fully open-source (code + weights)
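
For anyone new to SIM: speaker similarity is commonly scored as the cosine similarity between speaker embeddings of the reference clip and the generated speech. Here is a minimal sketch of that idea and of a relative-gain calculation. The embedding vectors and both function names are hypothetical placeholders, not taken from the Chroma code or paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def relative_gain(score, baseline):
    """Relative improvement of score over baseline, as a fraction."""
    return (score - baseline) / baseline


# Toy vectors standing in for real speaker embeddings (assumption).
reference_embedding = [0.20, 0.70, 0.10]
generated_embedding = [0.25, 0.65, 0.05]
print(round(cosine_similarity(reference_embedding, generated_embedding), 3))

# Relative gain using the rounded numbers quoted in this post.
print(f"{relative_gain(0.817, 0.73):.1%}")  # prints 11.9%
```

With the rounded baseline of 0.73 the formula gives about 11.9%, so the 10.96% quoted above was presumably computed from an unrounded baseline. See the paper for the exact figures.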
With SGLang (@lmsysorg) enabled:
• 🧠 Thinker TTFT ↓ ~15%
• ⏱️ End-to-end TTFT ~135ms
• 🔊 RTF ≈ 0.47–0.51 (~2× faster than real-time)
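
How to read the RTF numbers: real-time factor is usually defined as generation time divided by the duration of the audio produced, so anything below 1.0 is faster than real-time. Here is a small sketch under that standard definition. The function names are illustrative and not part of SGLang or the Chroma inference code.

```python
def real_time_factor(generation_seconds, audio_seconds):
    """RTF = time spent generating / duration of audio generated."""
    return generation_seconds / audio_seconds


def speedup_over_real_time(rtf):
    """How many seconds of audio are produced per second of compute."""
    return 1.0 / rtf


# The RTF range quoted in this post.
for rtf in (0.47, 0.51):
    print(f"RTF {rtf:.2f} -> {speedup_over_real_time(rtf):.2f}x real-time")
# prints:
# RTF 0.47 -> 2.13x real-time
# RTF 0.51 -> 1.96x real-time
```

So the quoted range works out to roughly 2× real-time, or about half a second of compute per second of speech.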
📘 SGLang Cookbook:
cookbook.sglang.io/docs/auto…
📄 Paper benchmarks:
arxiv.org/abs/2601.11141
🤗 Models:
huggingface.co/FlashLabs/Chr…
💻 Inference code:
github.com/FlashLabs-AI-Corp…
🔁 RT if you believe real-time AI should be open
💬 Reply if you’re building conversational voice AI products