Mitchell Wortsman

Pinned Tweet

Mitchell Wortsman @Mitchnw

28 Sep 2023

Sharing some highlights from our work on small-scale proxies for large-scale Transformer training instabilities: arxiv.org/abs/2309.14322 With fantastic collaborators @peterjliu, @Locchiu, @_katieeverett, many others (see final tweet!), @hoonkp, @jmgilmer, @skornblith! (1/15)

Mitchell Wortsman retweeted

Mike A. Merrill

@Mike_A_Merrill

10 Dec 2025

New job! I’m hiring folks interested in building and researching the next generation of evals and eval infra. DMs are open :)

Mitchell Wortsman retweeted

Ludwig Schmidt @lschmidt3

5 Jun 2025

Very excited to finally release our paper for OpenThoughts! After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.

Mitchell Wortsman retweeted

Anthropic

@AnthropicAI

22 May 2025

Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.

A benchmarking table titled Claude 4 benchmarks comparing performance metrics across various capabilities including coding, reasoning, tool use, multilingual Q&A, visual reasoning, and mathematics.

Mitchell Wortsman retweeted

Cade Gordon

@CadeGordonML

21 May 2025

Excited to share that I'll be joining @Anthropic to work on pretraining science! I've chosen to defer my Stanford PhD, where I'm honored to be supported by the Hertz Fellowship. There's something special about the science, this place, and these people. Looking forward to joining some of my most brilliant and compassionate colleagues!

Mitchell Wortsman retweeted

Mike A. Merrill

@Mike_A_Merrill

19 May 2025

Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse? We’re excited to introduce Terminal-Bench: An evaluation environment and benchmark for AI agents on real-world terminal tasks. TL;DR: lots of room for improvement! tbench.ai/

Mitchell Wortsman retweeted

Alex Li @alexlioralexli

26 Apr 2025

Excited to be presenting at #ICLR2025 at 10am today on how generative classifiers are much more robust to distribution shift. Come by to chat and say hello!

Mitchell Wortsman retweeted

Alex Li @alexlioralexli

12 Dec 2024

I'm presenting our #NeurIPS2024 work on Attention Transfer today! Key finding: Pretrained representations aren't essential; just using attention patterns from pretrained models to guide token interactions is enough for models to learn high-quality features from scratch and match ImageNet performance! 🤯 Chat with me and @endernewton Dec 12 (today), 4:30-7:30 pm PST, East Exhibit Hall #1900

Mitchell Wortsman retweeted

Akari Asai

@AkariAsai

4 Dec 2024

🚨 I’m on the job market this year! 🚨 I’m completing my @uwcse Ph.D. (2025), where I identify and tackle key LLM limitations like hallucinations by developing new models—Retrieval-Augmented LMs—to build more reliable real-world AI systems. Learn more in the thread! 🧵

Overview of Akari's research. More information is at https://akariasai.github.io/

Mitchell Wortsman retweeted

Ofir Press

@OfirPress

4 Dec 2024

I'm on the academic job market! I develop autonomous systems for: programming, research-level question answering, finding security vulnerabilities & other useful challenging tasks. I do this by building frontier-pushing benchmarks and agents that do well on them. See you at NeurIPS!

Mitchell Wortsman retweeted

Anthropic

@AnthropicAI

22 Oct 2024

Introducing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. We’re also introducing a new capability in beta: computer use. Developers can now direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking, and typing text.

A benchmark comparison table showing performance metrics for multiple AI models including Claude 3.5 Sonnet (new), Claude 3.5 Haiku, GPT-4o, and Gemini models across different tasks.

Mitchell Wortsman retweeted

Ross Wightman

@wightmanr

17 Oct 2024

OpenCLIP passed 10K stars on GitHub this week. A big milestone for any open-source project. 🍻 to the many collaborators that made that possible. Coincidentally, I pushed a new release with a port of the largest multilingual SigLIP -- a SO400M/16 @ 256x256 that appeared on big_vision a little while back. Now on the @huggingface hub and usable via timm or OpenCLIP (update your timm too)! huggingface.co/timm/ViT-SO40…

Mitchell Wortsman retweeted

Katie Everett @_katieeverett

23 Jul 2024

Come chat with me and @Locchiu at our ICML poster session 1:30-3pm CEST (Vienna time) today at Hall C 4-9 #2500 and see how our theory lets all parameterizations perform hyperparameter transfer! arxiv.org/abs/2407.05872

Mitchell Wortsman retweeted

Vaishaal Shankar @Vaishaal

18 Jul 2024

We have released our DCLM models on huggingface! To our knowledge these are by far the best performing truly open-source models (open data, open weight models, open training code) 1/5

Mitchell Wortsman retweeted

Katie Everett @_katieeverett

18 Jul 2024

We've gotten some great questions about the notion of alignment in our width-scaling parameterization paper! arxiv.org/abs/2407.05872 A deep dive into the alignment metric and intuition 🧵 [1/16]

Mitchell Wortsman retweeted

Tomer Porian @tomerporian

2 Jul 2024

🧵1/8 We resolve the discrepancy between the compute-optimal scaling laws of Kaplan et al. (exponent 0.88, Figure 14, left) and Hoffmann et al. (“Chinchilla”, exponent 0.5). Paper: arxiv.org/abs/2406.19146 Data and code: github.com/formll/resolving-…

Mitchell Wortsman retweeted

Anthropic

@AnthropicAI

20 Jun 2024

We're also launching a preview of Artifacts on claude.ai. You can ask Claude to generate docs, code, mermaid diagrams, vector graphics, or even simple games. Artifacts appear next to your chat, letting you see, iterate, and build on your creations in real-time.

Mitchell Wortsman retweeted

Anthropic

@AnthropicAI

20 Jun 2024

Introducing Claude 3.5 Sonnet—our most intelligent model yet. This is the first release in our 3.5 model family. Sonnet now outperforms competitor models on key evaluations, at twice the speed of Claude 3 Opus and one-fifth the cost. Try it for free: claude.ai

Benchmark table showing Claude 3.5 Sonnet outperforming (as indicated by green highlights) other AI models on graduate level reasoning, code, multilingual math, reasoning over text, and more evaluations. Models compared include Claude 3 Opus, GPT-4o, Gemini 1.5 Pro, and Llama-400b.

Mitchell Wortsman retweeted

Josh Gardner @jpgard

19 Jun 2024

Thrilled to share our paper “Large-Scale Transfer Learning for Tabular Data via Language Modeling,” introducing TabuLa-8B: a foundation model for prediction on tabular data. (with Juan C Perdomo @lschmidt3) 📖 arxiv.org/abs/2406.12031 🌐 huggingface.co/collections/m… [long🧵]

Large Scale Transfer Learning for Tabular Data via Language Modeling

Tabular data -- structured, heterogeneous, spreadsheet-style data with rows and columns -- is widely used in practice across many domains. However, while recent foundation models have reduced the...

arxiv.org

Mitchell Wortsman retweeted

Vaishaal Shankar @Vaishaal

18 Jun 2024

I am really excited to introduce DataComp for Language Models (DCLM), our new testbed for controlled dataset experiments aimed at improving language models. 1/x

Mitchell Wortsman retweeted

Peter J. Liu

@peterjliu

5 Jun 2024

We recently open-sourced a relatively minimal implementation example of Transformer language model training in JAX, called NanoDO. If you stick to vanilla JAX components, the code is relatively straightforward to read -- the model file is <150 lines. We found it useful as a forkable example for researchers to easily hack on and experiment rapidly. While we do not provide SOTA configs, we hope it can help researchers get started to try ideas, especially if new to LMs and JAX -- which is truly underrated. Repo: github.com/google-deepmind/n…
