Tweets

Myle Ott retweeted

Soumith Chintala

@soumithchintala

May 12

Cluster magicians and GPU whisperers, come join us! We’re looking for supercomputing engineers to build the infrastructure behind real-time interactive models, Tinker, and large-scale training: scheduling, storage, networking, reliability, and distributed systems at scale. Hiring in NYC and SF job-boards.greenhouse.io/thi…

Software Engineer, Supercomputing

San Francisco

job-boards.greenhouse.io

Myle Ott retweeted

Thinking Machines

@thinkymachines

May 11

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/int…

Myle Ott retweeted

Thinking Machines

@thinkymachines

27 Oct 2025

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other approaches for a fraction of the cost. thinkingmachines.ai/blog/on-…

Myle Ott @myleott

1 Oct 2025

So excited about this! Tinker provides a simple, powerful interface for post-training/RL research. It also manages all the infrastructure so that users can focus on data and environments. Hidden behind that simple interface is a ton of interesting and complex ML systems challenges! In addition to the work building an efficient RL stack (orchestration, numerics, parallelism, weight transfer, etc.), we also tackled a bunch of new challenges (transparent failure recovery, multi-tenant scheduling, autoscaling, etc.). I had a lot of fun working on early parts of this system and am excited to see what others are able to build with it!

Thinking Machines

@thinkymachines

1 Oct 2025

Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models! thinkingmachines.ai/tinker

Myle Ott retweeted

Thinking Machines

@thinkymachines

29 Sep 2025

LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lor…

Myle Ott retweeted

Thinking Machines

@thinkymachines

26 Sep 2025

Efficient training of neural networks is difficult. Our second Connectionism post introduces Modular Manifolds, a theoretical step toward more stable and performant training by co-designing neural net optimizers with manifold constraints on weight matrices. thinkingmachines.ai/blog/mod… We explore a fundamental understanding of the geometry of neural network optimization.

Myle Ott retweeted

Woosuk Kwon

@woosuk_k

11 Sep 2025

At Thinking Machines, our work includes collaborating with the broader research community. Today we are excited to share that we are building a vLLM team at @thinkymachines to advance open-source vLLM and serve frontier models. If you are interested, please DM me or @barret_zoph! Here are some example roles / projects:
* Distributed inference engineer to support large-scale models on Blackwell GPUs
* PyTorch & model optimization engineer to support & optimize latest OSS models
* MLSys generalist for various aspects of vLLM

Myle Ott retweeted

Thinking Machines

@thinkymachines

10 Sep 2025

Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference.” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to prompt engineering. Here we share what we are working on and connect with the research community frequently and openly. The name Connectionism is a throwback to an earlier era of AI; it was the name of the subfield in the 1980s that studied neural networks and their similarity to biological brains. thinkingmachines.ai/blog/def…

Myle Ott retweeted

Mira Murati

@miramurati

15 Jul 2025

Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're excited that in the next couple months we’ll be able to share our first product, which will include a significant open source component and be useful for researchers and startups developing custom models. Soon, we’ll also share our best science to help the research community better understand frontier AI systems. To accelerate our progress, we’re happy to confirm that we’ve raised $2B led by a16z with participation from NVIDIA, Accel, ServiceNow, CISCO, AMD, Jane Street and more who share our mission. We’re always looking for extraordinary talent that learns by doing, turning research into useful things. We believe AI should serve as an extension of individual agency and, in the spirit of freedom, be distributed as widely and equitably as possible. We hope this vision resonates with those who share our commitment to advancing the field. If so, join us. thinkingmachines.paperform.c…

Myle Ott retweeted

Mira Murati

@miramurati

18 Feb 2025

I started Thinking Machines Lab alongside a remarkable team of scientists, engineers, and builders. We're building three things:
- Helping people adapt AI systems to work for their specific needs
- Developing strong foundations to build more capable AI systems
- Fostering a culture of open science that helps the whole field understand and improve these systems
Our goal is simple: advance AI by making it broadly useful and understandable through solid foundations, open science, and practical applications. thinkingmachines.ai/

Thinking Machines Lab

Connectionism: Research Blog by Thinking Machines Lab

thinkingmachines.ai

Myle Ott @myleott

1 Aug 2024

Great work by @groeneyy on Prompt Poet! This tool has revolutionized prompt management at @character_ai, simplifying complex prompts and making prompt design more intuitive, scalable and accessible. Check it out!

Character.AI

@character_ai

1 Aug 2024

Thrilled to share that we're open sourcing our innovative approach to prompt design! Discover how Prompt Poet is revolutionizing the way we build AI interactions in our latest blog post: research.character.ai/prompt…

Myle Ott @myleott

20 Jun 2024

Excited to share some details of our work. Kudos to @LiangBowen, @sam_shleifer and others at Character for the awesome work optimizing our inference stack!

Noam Shazeer

@NoamShazeer

20 Jun 2024

Character AI is serving 20,000 QPS. Here are the technologies we use to serve hyper-efficiently. [research.character.ai/optimi… ]

Myle Ott retweeted

Irwan Bello

@IrwanBello

5 Dec 2022

For example, we wrote our own high-performance distributed transformer implementation and were able to hit 250 TFLOPs/s on A100s, or ~80% model flops utilization. For comparison, MFU is reported at 54% for Megatron-LM and similar for MosaicML
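
The utilization figure above can be checked with simple arithmetic. The tweet does not say which peak number it divides by; this sketch assumes NVIDIA's published dense BF16 tensor-core peak of 312 TFLOPs/s for the A100, and the names `A100_PEAK_BF16_TFLOPS` and `mfu` are illustrative, not from the thread.

```python
# Assumption: dense BF16 tensor-core peak for an A100, per NVIDIA's datasheet.
A100_PEAK_BF16_TFLOPS = 312.0


def mfu(achieved_tflops: float, peak_tflops: float = A100_PEAK_BF16_TFLOPS) -> float:
    """Model FLOPs utilization: achieved model throughput divided by hardware peak."""
    if peak_tflops <= 0:
        raise ValueError("peak_tflops must be positive")
    return achieved_tflops / peak_tflops


if __name__ == "__main__":
    # 250 TFLOPs/s is the throughput quoted in the tweet.
    print(f"MFU at 250 TFLOPs/s: {mfu(250.0):.1%}")
```

Under that assumed peak, 250 / 312 comes out at roughly 0.80, which is consistent with the "~80%" quoted.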

Myle Ott retweeted

Lucas Caccia @LucasPCaccia

10 Jun 2022

🚨 Paper Alert 🚨 We explore and formalize the Anytime Learning at MAcroscale (ALMA) setting, where learners sequentially receive large data dumps over time. What new challenges emerge in ALMA? How can we learn efficiently? We answer this in our new @CoLLAs_Conf paper!

Myle Ott retweeted

Susan Zhang

@suchenzang

3 May 2022

So excited to finally open up access to these models! Couldn't have asked for a better team to do this with: @stephenroller, @NamanGoyal21 (@myleott @sam_shleifer at the start too)!

AI at Meta

@AIatMeta

3 May 2022

Today Meta AI is sharing OPT-175B, the first 175-billion-parameter language model to be made available to the broader AI research community. OPT-175B can generate creative text on a vast range of topics. Learn more & request access: ai.facebook.com/blog/democra…

Myle Ott retweeted

PyTorch

@PyTorch

15 Mar 2022

PyTorch 1.11 offers native support for FullyShardedDataParallel training of models with up to 1 trillion parameters. It does this by sharding the model across parallel processors, rather than being limited to a single GPU. pytorch.org/blog/introducing…
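
The idea of sharding a model across workers can be illustrated with a toy example. This is a conceptual sketch in plain Python, not the PyTorch FSDP API: the functions `shard_params` and `all_gather` are hypothetical helpers, and a real implementation shards tensors across GPUs and gathers them over a communication backend.

```python
from typing import List


def shard_params(params: List[float], world_size: int) -> List[List[float]]:
    """Split a flat parameter list into contiguous, nearly equal shards, one per rank."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    base, extra = divmod(len(params), world_size)
    shards, start = [], 0
    for rank in range(world_size):
        # The first `extra` ranks each hold one additional parameter.
        size = base + (1 if rank < extra else 0)
        shards.append(params[start:start + size])
        start += size
    return shards


def all_gather(shards: List[List[float]]) -> List[float]:
    """Rebuild the full parameter list from every rank's shard (stand-in for a collective)."""
    return [p for shard in shards for p in shard]


if __name__ == "__main__":
    full = [float(i) for i in range(10)]
    shards = shard_params(full, world_size=4)
    for rank, shard in enumerate(shards):
        print(f"rank {rank} holds {len(shard)} of {len(full)} parameters")
    assert all_gather(shards) == full
```

Each rank stores only its own slice between steps, so per-worker memory shrinks as workers are added; the full parameters are rebuilt only when a computation needs them.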

Myle Ott retweeted

Hugging Face

@huggingface

15 Feb 2022

Few-shot learning beyond English 🌎 XGLM from @MetaAI is now available in Transformers. XGLM is a family of large-scale multilingual autoregressive language models which gives SoTA results on multilingual few-shot learning. Try it now on Spaces 👇 huggingface.co/spaces/valhal…

Myle Ott retweeted

Mikel Artetxe

@artetxem

21 Dec 2021

We are releasing a family of dense and MoE language models with up to 13B and 1.1T parameters. We find that MoEs are more efficient, but the gap narrows at scale and varies greatly across domains and tasks. Paper: arxiv.org/abs/2112.10684 Models & code: github.com/pytorch/fairseq/t…

Myle Ott retweeted

Xian Li

@xl_nlp

21 Dec 2021

🌍Few-shot learning beyond English🌏 📢 Announcing XGLMs, a series of multilingual autoregressive language models setting new SoTA on few-shot learning and outperforming English-centric models (e.g. GPT-3). Paper: arxiv.org/abs/2112.10668 Models and code: github.com/pytorch/fairseq/t…

Myle Ott retweeted

fairseq @fairseq

23 Nov 2021

Mixture of experts training in fairseq is now 40% faster thanks to Microsoft's Tutel library! Blog: microsoft.com/en-us/research… Fairseq code: github.com/pytorch/fairseq/t… Tutel code: github.com/microsoft/tutel