Adam Zweiger

Tweets

Pinned Tweet

Adam Zweiger

@AdamZweiger

Feb 19

We introduce a new approach for fast and high-quality context compaction in latent space. Attention Matching (AM) achieves 50× compaction in seconds with little performance loss, substantially outperforming summarization and other baselines.

Adam Zweiger retweeted

Han Guo

@HanGuo97

May 21

LLM training is built on fast MatMuls. But many surrounding ops still run as memory-bound kernels. CODA reparameterizes them to hide in the matmul’s shadow, fused into its epilogue before results leave the chip. Bonus: LLMs can write fast CODA kernels too (approaching SoLs).

Adam Zweiger retweeted

Ramp Labs

@RampLabs

Apr 10

x.com/i/article/204263155026…

Adam Zweiger

@AdamZweiger

Mar 31

The biggest reduction in KV cache memory comes not from quantization or MLA, but from latent compaction, along the sequence dimension. More strong results coming soon with Attention Matching.

Adam Zweiger

@AdamZweiger

Feb 19
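
A rough back-of-the-envelope sketch of the claim above, comparing KV cache size under quantization with compaction along the sequence dimension. The model shape (`layers`, `kv_heads`, `head_dim`) and the 128,000-token context are hypothetical values chosen for illustration; the 50× ratio is the figure quoted in the Feb 19 post. MLA is left out, and per-group quantization metadata (scales, zero points) is ignored, so the quantized number is optimistic.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem):
    """Approximate KV cache size: keys and values for every layer, head and token."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical model shape, not taken from the posts.
cfg = dict(layers=32, kv_heads=8, head_dim=128)
seq_len = 128_000

baseline = kv_cache_bytes(**cfg, seq_len=seq_len, bytes_per_elem=2)          # 16-bit cache
quant4 = kv_cache_bytes(**cfg, seq_len=seq_len, bytes_per_elem=0.5)          # 4-bit cache
compacted = kv_cache_bytes(**cfg, seq_len=seq_len // 50, bytes_per_elem=2)   # 50x fewer entries

for name, size in [("baseline", baseline), ("4-bit", quant4), ("50x compaction", compacted)]:
    print(f"{name:>15}: {size / 1e9:.2f} GB ({baseline / size:.0f}x smaller)")
```

Under these assumptions, 4-bit quantization shrinks the cache by 4×, while shrinking the sequence dimension by 50× shrinks it by 50×, and the two could in principle be combined.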

Adam Zweiger retweeted

Xinghong (Shin) Fu @shinfxh

Mar 12

just got claude to explain attention matching and it made this interactive heatmap to show the relative importance of each layer/head! this might just be better than the diagrams in our own paper...

Adam Zweiger

@AdamZweiger

Mar 1

New coding model from @cognition! I helped train it during my internship there. The team is very strong and this is just the beginning!

Cognition

@cognition

Mar 1

We are sharing an early preview of our ongoing SWE-1.6 training run. It significantly improves upon SWE-1.5 while being post-trained on the same pre-trained model - and it runs equally as fast at 950 tok/s. On SWE-Bench Pro it exceeds top open-source models. The preview model still exhibits some undesirable behaviors like overthinking and excessive self-verification, which we aim to improve. We are rolling out early access to a small subset of users in Windsurf.

Adam Zweiger

@AdamZweiger

Feb 28

Fun fact: Back in 2014, Demis had a red line condition for any potential acquisition of DeepMind: "no technology coming out of DeepMind will be used for military or intelligence purposes." Google accepting this more eagerly was part of why Demis chose them over Facebook. This red line is even broader than Dario's (no mass surveillance or fully autonomous weapons), though it was quietly removed by Google 1 year ago.

Adam Zweiger retweeted

Xinghong (Shin) Fu @shinfxh

Feb 19

the solution to infinite context was just linear regression all along
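
To make the "linear regression" remark concrete, here is a toy sketch of how matching attention outputs can reduce to a least-squares problem. It is not the method from the paper, which these posts do not describe in detail. Everything in it is an assumption for illustration: a single head, random data, a randomly chosen subset of keys, and compact values fit with `numpy.linalg.lstsq` so that attention over the small cache approximates attention over the full cache on sampled queries.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, t, d, n_q = 200, 20, 16, 500     # full length, compact length, head dim, sample queries
K = rng.normal(size=(T, d))         # full keys
V = rng.normal(size=(T, d))         # full values
Q = rng.normal(size=(n_q, d))       # queries used as regression samples

# Target: attention outputs computed over the full cache.
target = softmax(Q @ K.T / np.sqrt(d)) @ V

# Hypothetical compaction: keep a random subset of keys (illustration only).
idx = rng.choice(T, size=t, replace=False)
K_c = K[idx]
A_c = softmax(Q @ K_c.T / np.sqrt(d))   # attention weights over the compact keys

# Linear regression: choose compact values V_c minimizing ||A_c @ V_c - target||.
V_c, *_ = np.linalg.lstsq(A_c, target, rcond=None)

err_fit = np.linalg.norm(A_c @ V_c - target)
err_naive = np.linalg.norm(A_c @ V[idx] - target)   # reuse the original values unchanged
print(f"fitted values: {err_fit:.3f}   original values: {err_naive:.3f}")
```

Because the original values `V[idx]` are one candidate solution, the fitted values can never give a larger residual on the sampled queries. How well such a fit generalizes to unseen queries is a separate question that this toy does not address.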

Adam Zweiger

@AdamZweiger

Feb 19

Future Work:
- Integrating latent compaction into inference engines (e.g. RadixAttention, varlen storage, disaggregated compaction)
- Online compaction: compacting mid-trajectory repeatedly to support arbitrarily long sequences. We show initial results, but more work remains.

Adam Zweiger

@AdamZweiger

Feb 19

This was joint work with amazing collaborators: @shinfxh @HanGuo97 @yoonrkim
Paper: arxiv.org/abs/2602.16284
Code: github.com/adamzweiger/compa…

Adam Zweiger

@AdamZweiger

11 Dec 2025

Nice work on GAN-style training with a generator and discriminator, both trained with RL. This might be the path to improvement in domains without good verifiers like creative writing.

Locke Cai

@couplefire12

11 Dec 2025

RL for reasoning often relies on verifiers, which are great for math but tricky for creative writing or open-ended research. Meet RARO: a new paradigm that teaches LLMs to reason via adversarial games instead of verification. No verifiers. No environments. Just demonstrations. 🧵👇

Adam Zweiger

@AdamZweiger

1 Dec 2025

Presenting Self-Adapting Language Models on Wednesday at NeurIPS. We equip an LLM with the ability to write training data for itself in response to new inputs. We then meta-learn this ability with RL. Stop by to chat! 11-2 pm, #3415, with @jyo_pari @HanGuo97 @akyurekekin

Adam Zweiger retweeted

Zitong Yang

@ZitongYang0

22 Sep 2025

📜 Paper on a new pretraining paradigm: Synthetic Bootstrapped Pretraining (SBP). SBP goes beyond next-token supervision in a single document by leveraging inter-document correlations to synthesize new data for training, with no teacher needed. Validation: 1T data, 3B model, trained from scratch. 🧵
