Ido Amos

Tweets

Ido Amos @AmosaurusRex

16h

I’ve only been at @cartesia for just over a month, but seeing all the talented people here up close, it’s really not surprising they’re shipping state-of-the-art text-to-speech and speech-to-text models like Sonic 3.5 and Ink 2. Huge congrats to the team, excited to be here :)

Karan Goel

@krandiash

16h

We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.

Ido Amos @AmosaurusRex

Jun 7

Really cool to see follow-up work on SPT! I've always been curious why optimizing seemingly local objectives (like MLM) leads to gains on long-context tasks. @orvieto_antonio & @CoserOmar provide valuable insights on that and more. Enjoyed reading the paper, awesome work!

Antonio Orvieto

@orvieto_antonio

Jun 5

I have long been fascinated by the "Never Train from Scratch" (arxiv.org/abs/2310.02980, ICLR24 outstanding paper) results by @AmosaurusRex and collaborators. Finally, with @CoserOmar we got time to look into the mechanisms of self-pretraining (SPT). Here's what we learned 🧵

Ido Amos @AmosaurusRex

May 22

Just recently joined @cartesia and looks like my timing was pretty good 😬

Artificial Analysis

@ArtificialAnlys

May 22

Cartesia’s Sonic-3.5 takes the #1 spot on the Artificial Analysis Speech Arena Leaderboard, surpassing Inworld Realtime TTS 1.5 Max and Google’s Gemini 3.1 Flash TTS.
Sonic-3.5 is the latest TTS model from @cartesia. It supports 42 languages, including 9 Indian languages, with 500 voices available out of the box. The model has been highly preferred among voters in the TTS Arena, with its demonstrated naturalness and accurate transcript following.
Key takeaways:
➤ Quality: Sonic-3.5 has an Elo score of 1,218 (+16/-16) based on 1,144 arena appearances, placing it ahead of Inworld Realtime TTS 1.5 Max at 1,194 and Gemini 3.1 Flash TTS at 1,209
➤ Pricing: Sonic-3.5 is priced at $39/1M characters, a premium compared to Gemini 3.1 Flash TTS at $18.3/1M characters, and Inworld Realtime TTS 1.5 Max at $35/1M characters
➤ Speed: 105.5 characters per second, compared to 205 characters per second for Inworld Realtime TTS 1.5 Max and 26.3 characters per second for Gemini 3.1 Flash TTS
See more details and listen to samples below 🧵
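To put the quoted Elo gaps and prices in perspective, here is a rough sketch. It assumes the conventional logistic Elo formula with a 400-point scale, which the post does not confirm is what the Speech Arena uses. The function name `elo_expected_score` is made up for illustration. Only the ratings and prices come from the post.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected head-to-head score of A against B under the standard logistic
    Elo model (an assumption; the arena's exact rating model is not stated)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Elo scores quoted in the post.
ratings = {
    "Sonic-3.5": 1218,
    "Gemini 3.1 Flash TTS": 1209,
    "Inworld Realtime TTS 1.5 Max": 1194,
}

# Prices in USD per 1M characters quoted in the post.
prices = {
    "Sonic-3.5": 39.0,
    "Gemini 3.1 Flash TTS": 18.3,
    "Inworld Realtime TTS 1.5 Max": 35.0,
}

for rival in ("Gemini 3.1 Flash TTS", "Inworld Realtime TTS 1.5 Max"):
    win_prob = elo_expected_score(ratings["Sonic-3.5"], ratings[rival])
    price_ratio = prices["Sonic-3.5"] / prices[rival]
    print(f"Sonic-3.5 vs {rival}: expected preference {win_prob:.1%}, "
          f"price ratio {price_ratio:.2f}x")
```

Under that assumption, a 9-point lead over Gemini 3.1 Flash TTS corresponds to an expected preference only slightly above 50%, and the gap is smaller than the quoted ±16 interval, so the top of the leaderboard is close.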

Ido Amos @AmosaurusRex

Feb 17

Can LLMs reason internally while processing their inputs, similar to how humans can think ahead as we process information? Our latest work introduces Thinking States, a novel architectural adaptation that transforms reasoning into an internal recurrent process. By training models to maintain a dynamic thinking state, we achieve significant inference speedups over Chain-of-Thought while substantially outperforming existing latent reasoning methods. Paper: arxiv.org/abs/2602.08332

Ido Amos @AmosaurusRex

Feb 17

A major challenge in latent reasoning is finding effective supervision for the reasoning process. Since thinking states are represented in natural language, we can leverage existing Chain-of-Thought data for supervision. Furthermore, as this supervision is available in advance, we use it to teacher-force the thinking states themselves. This circumvents the need for costly recurrent optimization via backpropagation through time (BPTT), enabling fully parallel training and maintaining nearly constant training costs regardless of reasoning depth.
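The point about avoiding BPTT can be illustrated with a toy scalar recurrence. This is only a sketch, not the paper's architecture: `step`, `free_running`, `teacher_forced`, and the weights `W_S` and `W_X` are hypothetical, and the real thinking states are natural-language sequences rather than numbers. It shows that once the previous state at every position is given by supervision, each step depends only on known inputs and can be computed independently.

```python
import math

# Hypothetical fixed weights for a toy scalar recurrence.
W_S, W_X = 0.7, -0.3


def step(prev_state: float, x: float) -> float:
    """One recurrent update: new state from previous state and current input."""
    return math.tanh(W_S * prev_state + W_X * x)


def free_running(xs: list[float], s0: float = 0.0) -> list[float]:
    """Sequential rollout: each step must wait for the model's own previous state.
    Training through this loop is what requires backpropagation through time."""
    states, s = [], s0
    for x in xs:
        s = step(s, x)
        states.append(s)
    return states


def teacher_forced(xs: list[float], gold_states: list[float], s0: float = 0.0) -> list[float]:
    """Teacher forcing: the previous state at each position is taken from the
    supervision signal, so no step depends on another step's output and the
    whole list could be computed in parallel."""
    prev_states = [s0] + gold_states[:-1]
    return [step(prev, x) for prev, x in zip(prev_states, xs)]


xs = [0.5, -1.0, 2.0, 0.1]
rollout = free_running(xs)
# If the supervision equals the true rollout, both procedures agree.
forced = teacher_forced(xs, rollout)
print(all(abs(a - b) < 1e-12 for a, b in zip(rollout, forced)))
```

In the sequential version the cost of training grows with the number of recurrent steps. In the teacher-forced version every position is an independent computation, which is consistent with the tweet's claim of nearly constant training cost regardless of reasoning depth.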

Ido Amos @AmosaurusRex

Feb 17

Thinking States outperforms existing latent reasoning methods on multiple benchmarks and matches Chain-of-Thought performance on multi-hop QA, while leading to faster inference times. Furthermore, Thinking States exhibits superior length generalization in state-tracking tasks, successfully extrapolating to sequences significantly longer than those seen during training. This work was done during an internship at Google Research with an incredible team of collaborators: @clu_avi @megamor2 @amirgloberson @jonherzig @LiorShani286867 @ISzpektor Read the full paper and explore our findings here: arxiv.org/abs/2602.08332

Latent Reasoning with Supervised Thinking States

Reasoning with a chain-of-thought (CoT) enables Large Language Models (LLMs) to solve complex tasks but incurs significant inference costs due to the generation of long rationales. We propose...

arxiv.org

Ido Amos @AmosaurusRex

8 May 2024

Honestly cannot believe that our work got the BEST PAPER award @iclr_conf!!! This was an amazing experience with my collaborators @JonathanBerant @ankgup2, looking forward to sharing it with everyone at the conference. Reach out if you want to chat!

Ido Amos @AmosaurusRex

5 Dec 2023

Excited to share my work with @JonathanBerant @ankgup2! We show pretraining on task data alone suffices to bridge the gap between state space models and transformers on Long Range Arena, leading to a significantly better estimate of model capabilities. arxiv.org/abs/2310.02980 🧵

Ido Amos @AmosaurusRex

5 Dec 2023

Never Train from Scratch: Fair Comparison of Long-Sequence Models...

Modeling long-range dependencies across sequences is a longstanding goal in machine learning and has led to architectures, such as state space models, that dramatically outperform Transformers on...

arxiv.org

Ido Amos @AmosaurusRex

5 Dec 2023

[3/4] The marked effect of self-pretraining on long-sequence tasks leads us to rethink the necessity of complex designs, with Diagonal Linear RNNs (DLR) as a specific example. Our findings indicate that, when pretrained, simple architectures can be as effective as complex designs

Ido Amos @AmosaurusRex

5 Dec 2023

[4/4] Investigating the effects of data scale, we find self-pretraining is most effective in low-data regimes, underscoring its importance for evaluation across all dataset sizes. We further show that self-pretraining is effective across model sizes and when compute is limited.
