MSc student at Tel-Aviv University working on ML/DL

Joined January 2022
5 Photos and videos
I’ve only been at @cartesia for just over a month, but seeing all the talented people here up close, it’s really not surprising they’re shipping state-of-the-art text-to-speech and speech-to-text models like Sonic 3.5 and Ink 2. Huge congrats to the team, excited to be here :)
We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.
3
30
3,061
Really cool to seeing follow up work on SPT! I've always been curious why optimizing seemingly local objectives (like MLM) lead to gains on long context tasks. @orvieto_antonio & @CoserOmar provide valuable insights on that and more. Enojyed reading the paper, awesome work!
I have long been fascinated by the "Never Train from Scratch" (arxiv.org/abs/2310.02980, ICLR24 outstanding paper) results by @AmosaurusRex and collaborators. Finally, with @CoserOmar we got time to look into the mechanisms of self-pretraining (SPT). Here's what we learned 🧵
1
2
9
1,911
Just recently joined @cartesia and looks like my timing was pretty good 😬
Cartesia’s Sonic-3.5 takes the #1 spot on the Artificial Analysis Speech Arena Leaderboard, surpassing Inworld Realtime TTS 1.5 Max and Google’s Gemini 3.1 Flash TTS Sonic-3.5 is the latest TTS model from @cartesia . It supports 42 languages, including 9 Indian languages, with 500 voices available out of the box. The model has been highly preferred among voters in the TTS Arena, with its demonstrated naturalness and accurate transcript following. Key takeaways: ➤ Quality: Sonic-3.5 has an Elo score of 1,218 ( 16/-16) based on 1,144 arena appearances, placing it ahead of Inworld Realtime TTS 1.5 Max at 1,194 and Gemini 3.1 Flash TTS at 1,209 ➤ Pricing: Sonic-3.5 is priced at $39/1M characters, a premium compared to Gemini 3.1 Flash TTS at $18.3/1M characters, and Inworld Realtime TTS 1.5 Max at $35/1M characters ➤ Speed: 105.5 characters per second, compared to 205 characters per second for Inworld Realtime TTS 1.5 Max and 26.3 characters per second for Gemini 3.1 Flash TTS See more details and listen to samples below 🧵
2
20
1,404
Can LLMs reason internally while processing their inputs, similar to how humans can think ahead as we process information? Our latest work introduces Thinking States, a novel architectural adaptation that transforms reasoning into a internal recurrent process. By training models to maintain a dynamic thinking state, we achieve significant inference speedups over Chain-of-Thought while substantially outperforming existing latent reasoning methods. Paper: arxiv.org/abs/2602.08332
5
27
132
12,818
A major challenge in latent reasoning is finding effective supervision for the reasoning process. Since thinking states are represented in natural language, we can leverage existing Chain-of-Thought data for supervision. Furthermore, as this supervision is available in advance, we use it to teacher-force the thinking states themselves. This circumvents the need for costly recurrent optimization via backpropagation through time (BPTT), enabling fully parallel training and maintaining nearly constant training costs regardless of reasoning depth.
1
3
525
Thinking States outperforms existing latent reasoning methods on multiple benchmarks and matches Chain-of-Thought performance on multi-hop QA, while leading to faster inference times. Furthermore, Thinking states exhibit superior length generalization in state-tracking tasks, successfully extrapolating to sequences significantly longer than those seen during training. This work was done during an internship at Google Research with an incredible team of collaborators: @clu_avi @megamor2 @amirgloberson @jonherzig @LiorShani286867 @ISzpektor Read the full paper and explore our findings here: arxiv.org/abs/2602.08332
8
456
Honestly cannot believe that our work got the BEST PAPER award @iclr_conf !!! This was an amazing experience with my collaborators @JonathanBerant @ankgup2 , looking forward to share with everyone at the conference. Reach out if you want to chat!
Excited to share my work with @JonathanBerant @ankgup2! We show pretraining on task data alone suffices to bridge the gap between state space models and transformers on Long Range Arena, leading to a significantly better estimate of model capabilities. arxiv.org/abs/2310.02980 🧵
2
1
36
4,586
Excited to share my work with @JonathanBerant @ankgup2! We show pretraining on task data alone suffices to bridge the gap between state space models and transformers on Long Range Arena, leading to a significantly better estimate of model capabilities. arxiv.org/abs/2310.02980 🧵
1
8
39
9,411
[3/4] The marked effect of self-pretraining on long-sequence tasks leads us to rethink the necessity of complex designs, with Diagonal Linear RNNs (DLR) as a specific example. Our findings indicate that, when pretrained, simple architectures can be as effective as complex designs
1
2
6
664
[4/4] Investigating the effects of data scale, we find self-pretraining is most effective in low-data regimes, underscoring its importance for evaluation across all dataset sizes. We further show that self pretraining is effective across model sizes and when compute is limited.
2
4
535