John Hewitt

Tweets

Pinned Tweet

John Hewitt @johnhewtt

Apr 29

New paper! Subliminal learning—transferring hidden signals between language models—is more powerful than we thought. By biasing the teacher with a steering vector instead of a prompt, we achieve strong, consistent transfer, which we use to study its mechanisms. w/@GeorgeMorgulis

302

20,034

John Hewitt @johnhewtt

Apr 29

This is the first paper from first author George Morgulis (@GeorgeMorgulis), a Columbia master's student. Many congratulations to him for getting this work out!

1,793

John Hewitt @johnhewtt

Apr 29

I’m most excited about how our subliminal steering enables the consistent study of subliminal learning across models and signals, and enables future work in detection and mitigation. paper: arxiv.org/pdf/2604.25783 github: github.com/GMorgulis/Sublimi…

1,752

John Hewitt @johnhewtt

Apr 23

I’m at both of today’s mentoring sessions! Come with questions; leave with either answers or more questions

Yuntian Deng

@yuntiandeng

Apr 23

ICLR office hour / mentoring sessions start today! Walk-ins welcome. ⏰ Apr 23–25 12:00–12:45 and 1:15–2:00 📍 Rooms 206, 209, 212 If you're not sure where to go, just start at Room 206.

3,173

John Hewitt @johnhewtt

Apr 22

I’m at ICLR this year! Among other things, I’m happy to chat about PhD admissions; I’ll be hiring for my lab this upcoming cycle. Feel free to reach out.

151

14,752

John Hewitt retweeted

Chenhao Tan

@ChenhaoTan

Apr 1

Excited to announce the 2026 iteration of the Communication & Intelligence Symposium at UChicago! We have an amazing lineup of speakers: @Diyi_Yang, @johnhewtt, @dashunwang, and @TomerUllman. We have a simple call for abstracts, due Apr 15 (links 👇). Please come and share your research! Co-organized with the awesome @universeinanegg and @divingwithorcas.

35,131

John Hewitt @johnhewtt

Mar 4

Lots of interp thought discusses the linearity of the residual stream! This blog post: the residual stream isn't linear in a way that provides formal leverage, and interp methods based on linearity should not be preferred beyond empirical utility. cs.columbia.edu/~johnhew/res…

235

13,079

John Hewitt @johnhewtt

Feb 26

At a pub trivia night, if you don't know the answer immediately, you "reason" through your memories: is it X? No... was Y related? In LMs, we find that code/math RLVR'd models' reasoning for this kind of parametric knowledge access can be easily improved, say, by TriviaQA RLVR.

Melody Ma @MelodyHorsee

Feb 26

(1/8) Reasoning language models are great at math and code – but what about remembering facts stored in their parameters? Excited to share work with @johnhewtt exploring this! TL;DR: we don't usually think of RLVR as useful for knowledge recall from parameters, but it helps a lot.

7,504

John Hewitt @johnhewtt

Jan 18

Hey folks, just in case it was unclear, I talked to Been and her account has been hacked, so please disregard.

This tweet is unavailable

17,304

John Hewitt @johnhewtt

9 Dec 2025

Excited to see more funding opportunities for interpretability, with an explicit call for strong evaluations.

Martian

@withmartian

7 Dec 2025

$1,000,000 to understand how LLMs write code. Announcing: The Martian Interpretability Challenge. Understanding the inner workings of LLMs is the greatest scientific challenge of our age. Let's solve it. Apply here: withmartian.com/prize 🧵👇

6,593

John Hewitt retweeted

Pratyusha Sharma @pratyusha_PS

21 Nov 2025

📢 Some big (& slightly belated) life updates!

1. I defended my PhD at MIT this summer! 🎓
2. I'm joining NYU as an Assistant Professor starting Fall 2026, with a joint appointment in Courant CS and the Center for Data Science. 🎉

🔬 My lab will focus on empirically studying the science of deep learning and applying deep learning to accelerate the natural sciences. Very broadly interested in questions at the intersection of language, reasoning and sequential decision making. (Plus any other fun problems that catch our eye along the way!)

🚀 I am recruiting 2 PhD students for this cycle! If you're interested in joining, please apply here: cs.nyu.edu/dynamic/phd/admis… cds.nyu.edu/phd-admissions-r…

1,824

244,865

John Hewitt @johnhewtt

19 Nov 2025

Come do a PhD with me at Columbia! My lab tackles basic problems in alignment, interpretability, safety, and capabilities of language systems. If you love adventuring in model internals and behaviors---to understand and improve---let's do it together! pic: a run in central park

128

947

79,338

John Hewitt @johnhewtt

19 Nov 2025

I hire through the computer science department, and will be hiring 1-2ish PhD students this year. Columbia and New York have been amazing places to live and do research. And if you're not convinced, we just bought a mini fridge for snacks. Join us! cs.columbia.edu/~johnhew/

106

19,032

John Hewitt @johnhewtt

23 Oct 2025

New work! Gemma3 can explain in English what it learned from data – when we distill that data into a new word (embedding) and query it for a description of the word. Gemma explained a word trained on incorrect answers as: “a lack of complete, coherent, or meaningful answers...”

191

36,786

John Hewitt @johnhewtt

23 Oct 2025

In one example, we taught Gemma a neologism that causes single-sentence answers. When asked for synonyms of this new word, it suggested “lack,” as in, “Give me a lack answer.” This didn’t look right, but indeed causes very curt answers. We call this a machine-only synonym.

2,001

John Hewitt @johnhewtt

23 Oct 2025

We see this as a step towards developing new language tools for learning about how language models store, process, and reason about potentially complex concepts, differently from how we do. Work with Oyvind Tafjord, Robert Geirhos, @_beenkim. Blog here: cs.columbia.edu/~johnhew//ne…

1,704