Joined September 2009
Myle Ott retweeted
Cluster magicians and GPU whisperers, come join us! We’re looking for supercomputing engineers to build the infrastructure behind real-time interactive models, Tinker, and large-scale training: scheduling, storage, networking, reliability, and distributed systems at scale. Hiring in NYC and SF job-boards.greenhouse.io/thi…
29
34
605
61,052
Myle Ott retweeted
People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/int…
464
1,961
15,789
7,752,635
Myle Ott retweeted
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other approaches for a fraction of the cost. thinkingmachines.ai/blog/on-…
60
406
2,786
1,921,155
1 Oct 2025
So excited about this! Tinker provides a simple, powerful interface for post-training/RL research. It also manages all the infrastructure so that users can focus on data and environments. Hidden behind that simple interface is a ton of interesting and complex ML systems challenges! In addition to the work building an efficient RL stack (orchestration, numerics, parallelism, weight transfer, etc.), we also tackled a bunch of new challenges (transparent failure recovery, multi-tenant scheduling, autoscaling, etc.). I had a lot of fun working on early parts of this system and am excited to see what others are able to build with it!
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models! thinkingmachines.ai/tinker
5
12
163
60,086
Myle Ott retweeted
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lor…
82
556
3,483
1,449,484
Myle Ott retweeted
Efficient training of neural networks is difficult. Our second Connectionism post introduces Modular Manifolds, a theoretical step toward more stable and performant training by co-designing neural net optimizers with manifold constraints on weight matrices. thinkingmachines.ai/blog/mod… We explore a fundamental understanding of the geometry of neural network optimization.
110
433
2,907
1,529,246
Myle Ott retweeted
11 Sep 2025
At Thinking Machines, our work includes collaborating with the broader research community. Today we are excited to share that we are building a vLLM team at @thinkymachines to advance open-source vLLM and serve frontier models. If you are interested, please DM me or @barret_zoph! Here are some example roles / projects:
* Distributed inference engineer to support large-scale models on Blackwell GPUs
* PyTorch & model optimization engineer to support & optimize latest OSS models
* MLSys generalist for various aspects of vLLM
41
80
1,160
194,819
Myle Ott retweeted
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference.” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to prompt engineering. Here we share what we are working on and connect with the research community frequently and openly. The name Connectionism is a throwback to an earlier era of AI; it was the name of the subfield in the 1980s that studied neural networks and their similarity to biological brains. thinkingmachines.ai/blog/def…
230
1,244
7,605
3,490,143
Myle Ott retweeted
15 Jul 2025
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're excited that in the next couple months we’ll be able to share our first product, which will include a significant open source component and be useful for researchers and startups developing custom models. Soon, we’ll also share our best science to help the research community better understand frontier AI systems. To accelerate our progress, we’re happy to confirm that we’ve raised $2B led by a16z with participation from NVIDIA, Accel, ServiceNow, CISCO, AMD, Jane Street and more who share our mission. We’re always looking for extraordinary talent that learns by doing, turning research into useful things. We believe AI should serve as an extension of individual agency and, in the spirit of freedom, be distributed as widely and equitably as possible.  We hope this vision resonates with those who share our commitment to advancing the field. If so, join us. thinkingmachines.paperform.c…
634
670
7,654
2,351,049
Myle Ott retweeted
18 Feb 2025
I started Thinking Machines Lab alongside a remarkable team of scientists, engineers, and builders. We're building three things:
- Helping people adapt AI systems to work for their specific needs
- Developing strong foundations to build more capable AI systems
- Fostering a culture of open science that helps the whole field understand and improve these systems
Our goal is simple: advance AI by making it broadly useful and understandable through solid foundations, open science, and practical applications. thinkingmachines.ai/
680
883
9,380
1,131,569
1 Aug 2024
Great work by @groeneyy on Prompt Poet! This tool has revolutionized prompt management at @character_ai, simplifying complex prompts and making prompt design more intuitive, scalable and accessible. Check it out!
Thrilled to share that we're open sourcing our innovative approach to prompt design! Discover how Prompt Poet is revolutionizing the way we build AI interactions in our latest blog post: research.character.ai/prompt…
3
22
4,745
20 Jun 2024
Excited to share some details of our work. Kudos to @LiangBowen, @sam_shleifer and others at Character for the awesome work optimizing our inference stack!
Character AI is serving 20,000 QPS. Here are the technologies we use to serve hyper-efficiently. research.character.ai/optimi…
2
25
3,469
Myle Ott retweeted
For example, we wrote our own high-performance distributed transformer implementation and were able to hit 250 TFLOPs/s on A100s, or ~80% model flops utilization. For comparison, MFU is reported at 54% for Megatron-LM and similar for MosaicML
9
13
238
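The utilization figure quoted in the post above can be checked with simple arithmetic. This is a minimal sketch, assuming the A100's published dense BF16 tensor-core peak of 312 TFLOP/s as the denominator (the post does not say which peak it uses); the function names are illustrative and not taken from any library.

```python
# Assumption: A100 dense BF16/FP16 tensor-core peak, in TFLOP/s.
A100_PEAK_TFLOPS = 312.0


def mfu(achieved_tflops: float, peak_tflops: float = A100_PEAK_TFLOPS) -> float:
    """Model FLOPs utilization: achieved throughput divided by hardware peak."""
    return achieved_tflops / peak_tflops


def achieved_from_mfu(mfu_fraction: float, peak_tflops: float = A100_PEAK_TFLOPS) -> float:
    """Invert the ratio: the throughput implied by a reported MFU."""
    return mfu_fraction * peak_tflops


# 250 TFLOP/s from the post, against the assumed peak.
print(f"{mfu(250.0):.1%}")                       # 80.1%
# Throughput implied by the 54% MFU quoted for comparison.
print(f"{achieved_from_mfu(0.54):.0f} TFLOP/s")  # 168 TFLOP/s
```

Under that assumed peak, 250 TFLOP/s works out to roughly 80%, which is consistent with the "~80%" in the post.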
Myle Ott retweeted
🚨 Paper Alert 🚨 We explore and formalize the Anytime Learning at MAcroscale (ALMA) setting, where learners sequentially receive large data dumps over time. What new challenges emerge in ALMA? How can we learn efficiently? We answer this in our new @CoLLAs_Conf paper!
1
9
29
Myle Ott retweeted
So excited to finally open up access to these models! Couldn't have asked for a better team to do this with: @stephenroller, @NamanGoyal21 (@myleott @sam_shleifer at the start too)!
3 May 2022
Today Meta AI is sharing OPT-175B, the first 175-billion-parameter language model to be made available to the broader AI research community. OPT-175B can generate creative text on a vast range of topics. Learn more & request access: ai.facebook.com/blog/democra…
16
85
538
Myle Ott retweeted
15 Mar 2022
PyTorch 1.11 offers native support for FullyShardedDataParallel training of models with up to 1 trillion parameters. It does this by sharding the model across parallel processors, rather than being limited to a single GPU. pytorch.org/blog/introducing…
4
42
231
Myle Ott retweeted
Few-shot learning beyond English 🌎 XGLM from @MetaAI is now available in Transformers. XGLM is a family of large-scale multilingual autoregressive language models which gives SoTA results on multilingual few-shot learning. Try it now on Spaces 👇 huggingface.co/spaces/valhal…
4
40
183
Myle Ott retweeted
21 Dec 2021
We are releasing a family of dense and MoE language models with up to 13B and 1.1T parameters. We find that MoEs are more efficient, but the gap narrows at scale and varies greatly across domains and tasks. Paper: arxiv.org/abs/2112.10684 Models & code: github.com/pytorch/fairseq/t…
4
24
93
Myle Ott retweeted
21 Dec 2021
🌍Few-shot learning beyond English🌏 📢 Announcing XGLMs, a series of multilingual autoregressive language models setting new SoTA on few-shot learning and outperforming English-centric models (e.g. GPT-3). Paper: arxiv.org/abs/2112.10668 Models and code: github.com/pytorch/fairseq/t…
2
53
216
Myle Ott retweeted
23 Nov 2021
Mixture of experts training in fairseq is now 40% faster thanks to Microsoft's Tutel library! Blog: microsoft.com/en-us/research… Fairseq code: github.com/pytorch/fairseq/t… Tutel code: github.com/microsoft/tutel
3
17