Today, we are releasing Hibiki, our 🇫🇷 to 🇬🇧 simultaneous speech-to-speech translation system. It runs on-device, and the model, inference code and tech report are all available. Hibiki is built on the same audio LLM as Moshi, showing its versatility. 🟢
Meet Hibiki, our simultaneous speech-to-speech translation model, currently supporting 🇫🇷➡️🇬🇧.
Hibiki produces spoken and text translations of the input speech in real-time, while preserving the speaker’s voice and optimally adapting its pace based on the semantic content of the source speech.
Based on objective and human evaluations, Hibiki outperforms previous systems in quality, naturalness and speaker similarity, and approaches the performance of human interpreters. 🧵