Jack Cook

Tweets

Pinned Tweet

Jack Cook @jackcookjack

2 Dec 2025

Training LLMs with NVFP4 is hard because FP4 has so few values that I can fit them all in this post: ±{0, 0.5, 1, 1.5, 2, 3, 4, 6}. But what if I told you that reducing this range even further could actually unlock better training quantization performance? Introducing Four Over Six, a new method for improving the accuracy of NVFP4 quantization with Adaptive Block Scaling. 🧵

253

70,546
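A minimal sketch of the idea in the pinned post, in plain Python. It lists the FP4 values quoted above and then, for one block, tries scaling the largest magnitude to 6 and to 4 and keeps whichever gives less error. That per-block choice is an assumption inferred from the name "Four Over Six" and "Adaptive Block Scaling", not a reproduction of the paper's recipe. The function names are hypothetical, and the scale is kept as an ordinary float rather than the low-precision scale format real NVFP4 uses.

```python
# The FP4 magnitudes quoted in the post; with signs there are 15 distinct values.
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_VALUES = sorted({s * m for m in FP4_MAGNITUDES for s in (-1.0, 1.0)})


def round_to_fp4(x):
    """Return the representable FP4 value nearest to x."""
    return min(FP4_VALUES, key=lambda v: abs(v - x))


def quantize_block(block, target_max):
    """Scale the block so its largest magnitude maps to target_max, round, rescale."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = amax / target_max  # assumption: scale kept in full precision
    return [round_to_fp4(x / scale) * scale for x in block]


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def adaptive_quantize_block(block):
    """Hypothetical helper: try target maxima 6 and 4, keep the lower-error one."""
    candidates = {t: quantize_block(block, t) for t in (6.0, 4.0)}
    best = min(candidates, key=lambda t: mse(block, candidates[t]))
    return best, candidates[best]


if __name__ == "__main__":
    print(adaptive_quantize_block([4.0, 3.0, 2.0, 1.0]))
    print(adaptive_quantize_block([6.0, 3.0, 1.5, 0.5]))
```

In this toy, the block `[4.0, 3.0, 2.0, 1.0]` is reproduced exactly when its maximum is mapped to 4, whereas mapping it to 6 moves 3.0 off the grid, so the smaller range wins for that block.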

Jack Cook @jackcookjack

May 11

Now seems like a good time to share that I’ve recently joined @thinkymachines to work on pretraining! Very excited to work on the future of human-AI collaboration with this amazing team.

Thinking Machines

@thinkymachines

May 11

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/int…

136

11,117

Jack Cook retweeted

alex zhang

@a1zhang

Apr 1

4/6 --> 6/7... jokes aside jack put this out crazy fast, and it's a very clever idea that I hope gets standardized in new hardware :)

Jack Cook @jackcookjack

Mar 31

NVFP4 allows models to be quantized to 4 bits without too much performance degradation, but can we push 4-bit performance even further? Today, we're releasing a new class of low-precision block-scaled data types that natively adapt to your input data: for 4-bit quantization, IF4 (Int/Float 4) allows each scaled group of 16 values to be saved as FP4 or INT4 depending on which option offers less error. Selections are recorded using the scale factor’s sign bit, which is unused in NVFP4, allowing IF4 to offer better performance with no memory overhead! Our data types provide better downstream accuracy in LLMs, they can be implemented efficiently in next-generation hardware accelerators, and they reveal some interesting insights about low-bit quantization! 🧵

11,452
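A rough sketch of the selection step described in the post above: quantize a group of 16 values once on the FP4 grid and once on an integer grid, keep whichever has less error, and record the choice in the sign of the scale factor. The INT4 grid of -7..7, the full-precision float scale, and the function names `if4_encode` and `if4_decode` are illustrative assumptions, not the paper's definitions.

```python
FP4_GRID = sorted({s * m for m in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
                   for s in (-1.0, 1.0)})
INT4_GRID = [float(i) for i in range(-7, 8)]  # assumption: symmetric integer grid


def _quantize(group, grid):
    """Map the group's largest magnitude to the grid maximum and round to the grid."""
    amax = max(abs(x) for x in group)
    scale = amax / grid[-1] if amax > 0 else 1.0
    codes = [min(grid, key=lambda v: abs(v - x / scale)) for x in group]
    return scale, codes


def _sq_err(group, scale, codes):
    """Sum of squared reconstruction errors."""
    return sum((x - c * scale) ** 2 for x, c in zip(group, codes))


def if4_encode(group):
    """Return (signed_scale, codes); a negative scale marks the integer grid."""
    fp_scale, fp_codes = _quantize(group, FP4_GRID)
    int_scale, int_codes = _quantize(group, INT4_GRID)
    if _sq_err(group, int_scale, int_codes) < _sq_err(group, fp_scale, fp_codes):
        return -int_scale, int_codes  # sign bit records the INT4 choice
    return fp_scale, fp_codes  # ties and all-zero groups stay on FP4


def if4_decode(signed_scale, codes):
    """Reconstruct values; the grid choice only matters for how codes are stored."""
    return [c * abs(signed_scale) for c in codes]


if __name__ == "__main__":
    evenly_spread = [float(i) for i in range(-7, 8)] + [0.0]  # 16 values
    heavy_tailed = [6.0] + [0.5] * 15                         # 16 values
    print(if4_encode(evenly_spread)[0])  # negative scale: integer grid chosen
    print(if4_encode(heavy_tailed)[0])   # positive scale: FP4 grid chosen
```

Because the magnitude of a scale factor is always non-negative, its sign carries one free bit per group, which is why this kind of selection adds no storage in the sketch.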

Jack Cook retweeted

Andrei Panferov @ICLR @black_samorez

Apr 1

Very cool paper! Glad to see our rotation-based unbiased gradient estimation scheme from Quartet II improve quality as well.

Jack Cook @jackcookjack

Mar 31

Replying to @jackcookjack

With these two features in place, we find that IF data types (IF3, IF4, and IF6) offer better performance than existing block-scaled data types during quantized training and inference. During training, we show that this can be partly explained by an increased preference for integer quantization after inputs undergo a Hadamard transformation, an important component of backward passes in recently published 4-bit training recipes. During inference, performance improvements can be primarily explained by the reduced quantization error.

975

Jack Cook retweeted

Charles 🎉 Frye

@charles_irl

Apr 1

doot doot

Jack Cook @jackcookjack

Mar 31

4,430

Jack Cook retweeted

Peyton Walters

@peywalt

Mar 31

floats aren't cool. you know what's cool? integers.

Jack Cook @jackcookjack

Mar 31

1,029

Jack Cook @jackcookjack

Mar 31

438

52,292

Jack Cook @jackcookjack

Mar 31

You can simulate IF data types today using higher-precision formats, but we also show that IF4 can be implemented efficiently in next-generation hardware accelerators! We design and evaluate an IF4 multiply-accumulate (MAC) unit and find that latency increases by just 4.7% compared to a baseline NVFP4 MAC unit.

1,574

Jack Cook @jackcookjack

Mar 31

Check out our paper for more analysis, and our GitHub repo if you want to experiment with low-precision block-scaled quantization schemes yourself! We also have more stuff coming out soon, especially related to 4/6, so stay tuned! Code: github.com/mit-han-lab/fouro… Paper: arxiv.org/abs/2603.28765

Code for the papers: “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling” and “Adaptive Block-Scaled Data Types” - mit-han-lab/fouroversix

1,298

Jack Cook retweeted

Dan Alistarh @DAlistarh

Feb 2

Happy to release Quartet II, a new method that pushes the frontier of 4-bit LLM training in NVFP4. Fully-quantized pre-training in NVFP4 can now match FP8/FP16 quality much more closely, while maintaining full hardware acceleration! [1/4]

170

19,757

Jack Cook @jackcookjack

Jan 20

oh, you want a kernel that'll be right about 93% of the time and have tons of really weird and unpredictable edge cases? yeah I'd recommend Triton

402

Jack Cook retweeted

Charles 🎉 Frye

@charles_irl

Jan 14

There was a flippening in the last few months: you can run your own LLM inference with rates and performance that match or beat LLM inference APIs. We wrote up the techniques to do so in a new guide, along with code samples. modal.com/docs/guide/high-pe…

889

93,887

Jack Cook @jackcookjack

Jan 13

Here's a non-obvious problem with block-scaled quantized Attention: at the edge of your causal mask, later tokens can leak information to earlier ones through the scale factor computation. I wouldn't expect this leakage to matter very much since it affects scales, not values, but it turns out it does actually cause the loss to decrease a little too quickly! Very cool post by @tensorpro and team.

tensorpro

@tensorpro

Jan 13

We trained models with MXFP4-quantized attention, but it turns out this can break causal modeling. Our latest post explains why this happens and how to fix it. matx.com/research/leaky_quan…

2,580
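A toy illustration of the leak described above, not the setup from the linked post. It assumes a quantization block that spans several token positions (4 here for brevity), with one shared scale computed from the block's largest magnitude. Changing only the last, "future" entry changes the scale, which changes how the earlier entries round. The `visible` parameter is a hypothetical knob that restricts the scale computation to positions the causal mask allows. It is shown only to make the dependency explicit and is not claimed to be the fix the post proposes.

```python
FP4_GRID = sorted({s * m for m in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
                   for s in (-1.0, 1.0)})


def block_quantize(block, visible=None):
    """Quantize one block with a shared scale.

    visible: number of leading entries allowed to influence the scale
    (hypothetical parameter); None means the whole block is used.
    """
    scale_inputs = block if visible is None else block[:visible]
    amax = max(abs(x) for x in scale_inputs)
    scale = amax / 6.0 if amax > 0 else 1.0
    # Entries larger than the visible maximum saturate at the grid edge.
    return [min(FP4_GRID, key=lambda v: abs(v - x / scale)) * scale for x in block]


if __name__ == "__main__":
    past = [0.3, 0.9, 1.2]          # positions an earlier query may attend to
    block_a = past + [1.0]          # same past, small future value
    block_b = past + [12.0]         # same past, large future value

    # Shared scale over the whole block: the past entries round differently.
    leaky_a = block_quantize(block_a)[:3]
    leaky_b = block_quantize(block_b)[:3]
    print("leak:", leaky_a != leaky_b)

    # Scale from visible positions only: the past entries are unaffected.
    safe_a = block_quantize(block_a, visible=3)[:3]
    safe_b = block_quantize(block_b, visible=3)[:3]
    print("no leak:", safe_a == safe_b)
```

The information that crosses the mask here is only one number per block, which matches the post's point that the effect is easy to overlook.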

Jack Cook retweeted

Guangxuan Xiao @Guangxuan_Xiao

Jan 7

Life update: Wrapped up my PhD at @MITEECS 🎓 Super excited to start working on pre-training at @thinkymachines.

1,923

73,783

Jack Cook retweeted

alex zhang

@a1zhang

Jan 2

Much like the switch in 2025 from language models to reasoning models, we think 2026 will be all about the switch to Recursive Language Models (RLMs). It turns out that models can be far more powerful if you allow them to treat *their own prompts* as an object in an external environment, which they understand and manipulate by writing code that invokes LLMs! Our full paper on RLMs is now available—with much more expansive experiments compared to our initial blogpost from October 2025! arxiv.org/pdf/2512.24601

251

1,095

7,364

2,030,663

Jack Cook retweeted

Charles 🎉 Frye

@charles_irl

18 Dec 2025

use quant.exposed and maybe you too will write a groundbreaking research paper on low-precision training

152

21,316