Particle physics and astrophysics → predicting ad clicks at Yandex → spoken language understanding → LLM-powered NPCs at Inworld.ai

Joined May 2014
Pavel Shtykovskiy retweeted
Many roughly know how a transformer works. To REALLY understand modern neural LMs (MoEs, GPU tiling, kernels, RLHF, data), you need CS336 by @tatsu_hashimoto and @percyliang. The 2026 edition appears on YouTube with a ~2-week delay: youtube.com/playlist?list=PL… Materials: cs336.stanford.edu/
Pavel Shtykovskiy retweeted
Introducing Realtime TTS-2, a new generation of voice model built for realtime conversation. It is the first voice model that hears the conversation, takes natural-language voice direction, holds one voice identity across over 100 languages, and speaks like a person who is paying attention. The result is voice AI that feels as good as it sounds. Try it out: tinyurl.com/RealtimeAI Learn More: tinyurl.com/TTS-2Blog
Pavel Shtykovskiy retweeted
Inworld TTS-1.5 releases today. The #1 TTS on Artificial Analysis now offers realtime latency under 250 ms and optimized expression and stability for user engagement, and it costs half a cent per minute. Some voice models are fast, some are expressive, some are affordable. We outperform them all across the board.
Production-grade realtime latency: <250 ms for the Max model and <130 ms for Mini (P90 first audio), 4x faster than before. Voice agents now respond before users notice any delay.
Engagement-optimized quality: 30% more expressive to serve a wider range of personalities, and 40% lower word error rates for fewer hallucinations, word cutoffs, and audio artifacts.
Built for consumer scale: radically affordable, with enhanced multilingual support (15 languages including Hindi) and enhanced voice cloning, now via API. On-prem options are now available for enterprises.
Pavel Shtykovskiy retweeted
6 Nov 2025
Our TTS Max model just debuted at #1 on the @ArtificialAnlys leaderboard. And at $10/million characters, it’s also the most cost-efficient commercial TTS model available. Excited to keep making state-of-the-art voice more accessible. Check it out at inworld.ai/tts or through our partners @pipecat_ai and @livekit.
Inworld TTS 1 Max is the new leader on the Artificial Analysis Speech Arena Leaderboard, surpassing MiniMax’s Speech-02 series and OpenAI’s TTS-1 series.
The Artificial Analysis Speech Arena ranks leading text-to-speech models based on human preferences. In the arena, users compare two pieces of generated speech side by side and select their preferred output without knowing which models created them. The arena's prompts span four real-world categories: Customer Service, Knowledge Sharing, Digital Assistants, and Entertainment.
Inworld TTS 1 Max and Inworld TTS 1 both support 12 languages, including English, Spanish, French, Korean, and Chinese, as well as voice cloning from 2-15 seconds of audio. Inworld TTS 1 processes ~153 characters per second of generation time on average, while the larger Inworld TTS 1 Max processes ~69 characters per second on average. Both models also support voice tags, which let users add emotion, delivery style, and non-verbal sounds such as “whispering”, “cough”, and “surprised”.
Both TTS-1 and TTS-1-Max are transformer-based, autoregressive models that use LLaMA-3.2-1B and LLaMA-3.1-8B, respectively, as their SpeechLM backbones.
See the leading models in the Speech Arena, and listen to sample clips below 🎧
Pavel Shtykovskiy retweeted
Tired of going back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! 📘 We’re excited to release 《The Principles of Diffusion Models》 with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core ideas that shaped diffusion modeling and explains how today’s models work, why they work, and where they’re heading. 🧵 You’ll find the link and a few highlights in the thread. We’d love to hear your thoughts and to join the discussion! ⚡ Stay tuned for our markdown version, where you can drop your comments!
Pavel Shtykovskiy retweeted
Just signed a book deal for The RLHF Book, excited to make improvements to it this fall and get physical copies in your hands soon :) (rlhfbook dot com)
Pavel Shtykovskiy retweeted
1 43.6 Grok-4-Wiz-AI-Cha died in The Dungeons of Doom on level 1. Killed by a housecat.
LLMs acing math olympiads? Cute. But BALROG is where agents fight dragons (and actual Balrogs)🐉😈 And today, Grok-4 (@grok) takes the gold 🥇 Welcome to the podium, champion!
Pavel Shtykovskiy retweeted
I am happy to announce that the first draft of my RL tutorial is now available. arxiv.org/abs/2412.05265
Pavel Shtykovskiy retweeted
Earlier, @framrus and I developed a humor generation method that achieves human-level results in blind tests. Now, @SaveTheRbtz and I are launching HUMOR-ARENA (humor.ph34r.me/), a site for labeling generated humor, with a model ranking and a list of the top generated jokes. Blog post: altsoph.medium.com/humor-are…
Pavel Shtykovskiy retweeted
Want to learn about meta-learning & few-shot learning? All of the latest lecture videos for Stanford CS330 are now online! youtube.com/playlist?list=PL…
New topics in Fall '22 include:
- self-supervised pre-training
- large-scale meta-optimization
- domain adaptation & generalization
Pavel Shtykovskiy retweeted
You can now watch the recorded material from #NeurIPS2022 online without registration at: slideslive.com/neurips-2022
Pavel Shtykovskiy retweeted
Our 2021 CS330 (cs330.stanford.edu/fall2021) lectures are online: youtube.com/playlist?list=PL… It was a pleasure to co-teach this class with @chelseabfinn. Topics incl. meta-learning, MTL, few-shot learning, deep RL (incl. multi-task, meta, goal-conditioned, hierarchical and offline RL)

Pavel Shtykovskiy retweeted
The video of my talk @EPFL_en today on Transformers and how to make sense of them is online! youtube.com/watch?v=brmidghO…

Pavel Shtykovskiy retweeted
Fun read on why MLOps is still somewhat broken: the engineers who build the tools are not their users. ML frameworks, by contrast, were written by ML scientists ((Py)Torch, Theano, Caffe, MXNet, Keras, Chainer, TF, etc.), which helped them keep the design requirements accurately in their heads.
Bananas and ML infrastructure: I've asked around about cloud workflows, and most of the feedback expressed unhappiness with cloud tooling. This prompted a discussion in @chipro's MLOps community: why are MLOps frameworks so bad? (1/9)
Nice blog post on distributed multi-GPU training of large models lilianweng.github.io/lil-log…

Pavel Shtykovskiy retweeted
Today the videos that I made to accompany my book Linear Algebra Done Right surpassed two million minutes of total viewing on YouTube. Those videos are freely available from the links at linear.axler.net/LADRvideos.…. #LinearAlgebra
Pavel Shtykovskiy retweeted
Just watched an incredible talk by @AlexGDimakis at the Simons Institute, highly recommended. Their Iterative Layer Optimization technique for solving inverse problems with GANs makes a LOT of sense! The empirical results on the famous blurred Obama face speak for themselves! 1/4
Inductive Biases for Deep Learning of Higher-Level Cognition (arxiv.org/abs/2011.15091) Fantastic paper!
