Vin Howe

Tweets

Vin Howe @vinhowe

May 20

Preprint 🧵! How compartmentalized are LLMs? For data in different formats (English/Chinese, Wiki/Q&A), how much transfer occurs? We provide evidence that LLMs can struggle with this sort of transfer, with consequences like sample inefficiency and capacity competition.

Vin Howe @vinhowe

May 20

We build on existing work showing that frontier performance on all sorts of transfer is more inconsistent than we might hope, especially after learning from trillions of tokens: x.com/NitCal/status/20263003… @NitCal x.com/omerNLP/status/1907058… @omerNLP arxiv.org/abs/2408.10646 @LChoshen

omer goldman @omerNLP

1 Apr 2025

Wanna check how well a model can share knowledge between languages? Of course you do! 🤩 But can you do it without access to the model’s weights? Now you can with ECLeKTic 🤯

Vin Howe @vinhowe

May 20

Preprint link: arxiv.org/abs/2605.19284 Super fun project. I'll be a fellow at @MATSProgram in Berkeley next month. Reach out!

Language models struggle with compartmentalization

In the training data used by large language models (LLMs), the same latent concept is often presented in multiple distinct ways: the same facts appear in English and Swahili; many functions can be...

arxiv.org

Vin Howe retweeted

Josh Greaves @joshgreaves_ml

Jan 29

The big labs are betting RL will unlock superhuman coding. But their infrastructure is closed, and OSS tooling doesn't support true online RL—just iterative batch optimization. We're releasing ARES to close that gap 🧵

Martian @withmartian

Jan 29

Announcing ARES - our open-source Agentic Research and Evaluation Suite. ARES is built around 3 pillars (👇 see the thread) to make reinforcement learning for code agents easy. We’ve also found it to be incredibly useful for our own mech interp research.

Vin Howe @vinhowe

21 Nov 2025

Train a language model in your browser with WebGPU! I built a playground for training sequence models (Transformers, LSTMs, GRUs, vanilla RNNs) completely in your browser on synthetic tasks like sorting and simple natural language datasets like TinyStories. You can fiddle with 50 experiment knobs to build your own model, which can be as big as you have the VRAM to accommodate. You don't have to install anything; all you need is a browser with WebGPU support. Check it out! Links to the repo, blog post, features, and technical details are in the reply. 🧵

Vin Howe @vinhowe

21 Nov 2025

This project was inspired directly by:
- @fleetwood___ Ratchet
- @willdepue WebGPT
- @dsmilkov, @shancarter TensorFlow Neural Network Playground
- @kellerjordan0 Modded-NanoGPT and Muon
- @xenovacom Transformers.js
- @polodataclub Transformer Explainer
- @brendanbycroft LLM Visualization
- @karpathy ConvNetJS, micrograd, minGPT, llm.c

Vin Howe @vinhowe

21 Nov 2025

Thanks to:
- @grantpitt0, who helped create the original idea, provided invaluable feedback, and helped me debug a few cursed numerical bugs.
- @fleetwood___ for help with Ratchet (and pushing me to write a blog post).
- @bgub_ for helpful feedback. 💜

Vin Howe retweeted

Alex Shaw @alexgshaw

19 May 2025

Excited to share what I’ve been working on with @andykonwinski, @Mike_A_Merrill, and @lschmidt3 at Stanford & Laude. Introducing Terminal-Bench! A benchmark and framework to quantify how well AI agents accomplish complex tasks in a terminal environment. We believe that the terminal is a particularly powerful tool for agents because it provides a text-based low-level interface for operating a computer to an agent.

Mike A. Merrill @Mike_A_Merrill

19 May 2025

Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse? We’re excited to introduce Terminal-Bench: An evaluation environment and benchmark for AI agents on real-world terminal tasks. Tl;dr lots of room for improvement! tbench.ai/

Vin Howe retweeted

Andy Konwinski @andykonwinski

12 Dec 2024

I'll give $1M to the first open source AI that gets 90% on this sweet new contamination-free version of SWE-bench - kprize.ai

Vin Howe retweeted

Chris Bail @chris_bail

20 Mar 2024

My new piece in @HarvardBiz describes our work using AI to perform conflict mediation on social media, and how it inspired a new intervention by NextDoor which resulted in a 15% decrease in toxic content! hbr.org/2024/03/genai-could-…

GenAI Could Make Online Conversations More Civil

Online conversations are famously fraught, which creates challenges for people communicating on online platforms, including those used for workplace collaboration. New research suggests that these...

hbr.org

Vin Howe retweeted

Kolby Nottingham @kolbytn

9 Feb 2024

Excited to share our work, "Skill Set Optimization", a continual learning method for LLM actors that:
- Automatically extracts modular subgoals to use as skills
- Reinforces skills using environment reward
- Facilitates skill retrieval based on state

allenai.github.io/sso 🧵
