Tweets

Robin Jia retweeted

Johnny Tian-Zheng Wei @johntzwei

Jun 3

Hi all, mentoring junior students can be a special experience. Contrary to common wisdom, my mentorship style is that of "apprenticeship" and my junior students work side-by-side with me on the most important part of my most important project. My reflections are below... [1/2]

Robin Jia retweeted

Wang Bill Zhu

@BillJohn1235813

Jun 4

Sadly I can't attend ICRA this year, but right now Mason @miaosenc is presenting our work PSALM-V at Poster 250. Come by and check it out! 🤖 It's a project I'm really proud of: instead of hand-writing the symbolic rules a planner needs (PDDL pre-/post-conditions), PSALM-V lets an agent figure them out on its own by interacting with a visual, partially observed world. Mason would love to chat about it; go say hi! 👋

Robin Jia retweeted

Ting-Yun Chang

@CharlotteTYC

Jun 4

This is the final project of my PhD journey 🎓 I've thought a lot about how to make interp actionable in my previous projects. I believe efficiency follows naturally: when we have a deep understanding of the model, we can figure out where to be frugal w/o hurting model accuracy. The Attention Sink and LLM.int8() papers set great examples, and they deeply inspire our paper. Mirroring the findings on value-state drain, we find that large-range value states are equally important in KV cache eviction. Evicting these outliers causes reasoning models to enter an endless self-reflection loop, while keeping them in the cache maintains accuracy. I'm extremely grateful to my amazing coauthors and supportive advisors.

Deqing Fu

@DeqingFu

Jun 4

Introducing VaSE: Value-Aware Stochastic KV Cache Eviction. Reasoning models think in CoT, bloating the KV cache. Eviction caps memory but suffers capability drop. VaSE is a training-free recipe that cuts that cost: keep large-magnitude value states, evict stochastically.

Robin Jia retweeted

Harvey Yiyun Fu

@harveyiyun

Jun 4

Excited to share our new paper on KV cache eviction! We propose a new recipe that is simple and effective: 1. keep the value states with large magnitude, 2. add some stochasticity to the eviction process. Combined, VaSE consistently outperforms previous eviction methods with high throughput while maintaining a constant memory footprint. Huge thanks to all collaborators @CharlotteTYC, @DeqingFu, @chrome1996, and advisors @_jessethomason_ and @robinomial
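
The two-step recipe above can be illustrated with a toy sketch. This is not the paper's implementation: the function name `select_kept_positions`, the L2 norm as the magnitude measure, the fixed `n_protected` count, and uniform sampling for the stochastic step are all assumptions made for illustration.

```python
import math
import random


def select_kept_positions(values, budget, n_protected, rng):
    """Pick which cache positions to keep; every other position is evicted.

    values      -- list of value-state vectors, one per cached token
    budget      -- maximum number of positions to keep
    n_protected -- how many largest-magnitude positions are always kept (assumed knob)
    rng         -- random.Random instance used for the stochastic step
    """
    seq_len = len(values)
    if seq_len <= budget:
        # The cache still fits, so nothing is evicted.
        return list(range(seq_len))

    n_protected = min(n_protected, budget)

    # Step 1: rank positions by value-state magnitude (L2 norm, an assumption).
    magnitudes = [math.sqrt(sum(x * x for x in v)) for v in values]
    order = sorted(range(seq_len), key=lambda i: magnitudes[i], reverse=True)
    protected = order[:n_protected]

    # Step 2: fill the rest of the budget by uniform random sampling
    # (a stand-in for whatever stochastic rule the method actually uses).
    sampled = rng.sample(order[n_protected:], budget - n_protected)

    return sorted(protected + sampled)
```

Calling `select_kept_positions(values, budget=5, n_protected=2, rng=random.Random(0))` on ten cached tokens returns five positions that always include the two with the largest value-state norms; the remaining three vary with the seed.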

Robin Jia retweeted

Wang Bill Zhu

@BillJohn1235813

Jun 1

🚨 [New preprint] Can AI assistants hurt the very people who depend on them? Raine v. OpenAI alleges ChatGPT contributed to a teen's suicide, and OpenAI published a "sycophancy" retrospective on GPT-4o in 2025. The pattern: harm comes not from capability failures, but from the social dynamics of how models talk to us, especially when users open up. We introduce EUDAIMONIA, a benchmark grounded in a Social AI Design Code rooted in real-world harm cases. 🌐 Project page: eudaimonia-bench.github.io/ 📄 Paper: arxiv.org/abs/2605.30654

Robin Jia @robinomial

May 29

Being Johnny’s PhD advisor has not only been a great privilege, but it has forever changed my research vision. His work combining AI, law, and statistics opened my eyes to how technical research can guide policy and promote AI accountability. Excited for his next work as Dr. Wei!

Johnny Tian-Zheng Wei @johntzwei

May 28

Hi all, I defended my PhD thesis. My thesis in two sentences: Current AI measurement takes LLMs as fixed objects, which constrains us to observational measurement. *Spiking* the training data (inserting certain data at known rates) enables statistically principled measurement.

Robin Jia retweeted

Amin Banayeeanzade

@Amin__Bana

May 28

Does your GPT-5.5 also love Valparaíso in Chile 🇨🇱 !? Ask it to “Name a random city in the world”. You might expect a broad sample from thousands of cities. Instead, models collapse to the same small set of answers again and again. 😵‍💫 But why do LLMs lack diversity? Why are they not reliable random number generators? Why do they still struggle with genuinely creative writing? And why do decoding tricks like temperature, top-k, and top-p often fail to recover meaningful diversity? We have some answers in our new paper! 🧪 Demo: diversitycalibration.github.… 📄 Paper: arxiv.org/abs/2605.11128

Sampling More, Getting Less: Calibration is the Diversity Bottleneck in LLMs

LLMs collapse to a narrow set of outputs even when many valid alternatives exist. We identify two distributional bottlenecks: order and shape calibration.

diversitycalibration.github.io

Robin Jia retweeted

Johnny Tian-Zheng Wei @johntzwei

May 27

🧵[1/5] Works on test set contamination focus on detection, but we show *correction* of inflated test scores is possible. arxiv.org/abs/2605.24818 Our proposal is to spike the training data and insert some test examples at known rates. The spiked examples are used to calibrate...

Spiking the training data to correct for test set contamination

The literature on test set contamination largely focuses on detection, but the correction of contaminated test scores is underexplored. Our core proposal is to spike the training data by...

arxiv.org

Robin Jia retweeted

Blaise Agüera (@blaiseaguera.bsky.social)

@blaiseaguera

May 14

Just as single cells became multicellular life, 8B brains are now joining with AI to form a collective superintelligence. At @USC's Institute on Ethics and Trust in Computing inaugural summit, @robinomial, Jinchi Lv, @paria_rd and I discussed navigating this transition.

Robin Jia retweeted

Ai2

@allen_ai

May 8

Today we’re releasing EMO, a new mixture-of-experts (MoE) model trained so modular structure emerges directly from data without human-defined priors. EMO can use a small subset of its experts for a given task while keeping near full-model performance. 🧵

Robin Jia retweeted

Ryan Yixiang Wang

@RyanYixiang

May 8

MoEs are everywhere in frontier models, and they are deployed as monolithic systems. But many applications only need a narrow slice of capabilities, e.g., math, code, or biomedical. So what if "modularity" is actually the missing opportunity for MoEs? Today, we're releasing EMO: an end-to-end pretrained MoE where modularity emerges naturally, enabling selective use of experts!

Robin Jia retweeted

Deqing Fu

@DeqingFu

May 1

Glad to share that this paper is accepted to #ICML 2026 @icmlconf with an updated title "Transformers Provably Learn Algorithmic Solutions for Graph Connectivity, But Only with the Right Data". 🥳

Deqing Fu

@DeqingFu

23 Oct 2025

Why do Transformers fail at algorithmic reasoning? We find it's not a lack of power, but a capacity mismatch. Our new preprint proves a tight, non-asymptotic bound: an L-layer model can only solve graph connectivity on graphs with a diameter up to exactly 3^L. arxiv.org/abs/2510.19753 🧵(1/N)
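
Taking the stated bound at face value, the depth needed for a given graph diameter is the smallest L with 3^L at least that diameter. A minimal sketch of that arithmetic follows; the function name `min_layers_for_diameter` is hypothetical, and the sketch says nothing about whether a trained model actually attains the bound.

```python
def min_layers_for_diameter(diameter):
    """Smallest L such that 3**L >= diameter, using integer arithmetic only."""
    if diameter < 1:
        raise ValueError("diameter must be a positive integer")
    layers = 0
    reach = 1  # 3**0: the diameter covered with zero layers under the stated bound
    while reach < diameter:
        reach *= 3  # each extra layer triples the covered diameter
        layers += 1
    return layers


print(min_layers_for_diameter(9))   # 2, since 3**2 == 9
print(min_layers_for_diameter(28))  # 4, since 3**3 == 27 < 28 <= 3**4
```

Under this reading, depth grows only logarithmically with diameter, so a handful of layers already covers fairly large graphs.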

Robin Jia retweeted

Qingchuan (Tony) Yang

@qcyang20xx

Apr 30

EPSVec will see you at #ICML2026!!

Qingchuan (Tony) Yang

@qcyang20xx

Mar 11

𝗣𝗿𝗶𝘃𝗮𝘁𝗲 𝘀𝘆𝗻𝘁𝗵𝗲𝘁𝗶𝗰 𝘁𝗲𝘅𝘁 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻 has had the same problem for a while: privacy, quality, or efficiency - pick two 😵‍💫 We think 𝐄𝐏𝐒𝐕𝐞𝐜 changes that 🚀 Paper: arxiv.org/abs/2602.21218

Robin Jia retweeted

Yuqing Yang @yyqcode

Apr 23

🧵 1/8 What should an LLM assistant remember across conversations? Existing memory work studies this one task at a time. But real-world assistants see all kinds of conversations, and that changes the problem. Introducing BEHEMOTH 🦣 CluE 🌱: a benchmark & self-evolving method for heterogeneous memory extraction. 📄 Paper: arxiv.org/abs/2604.11610

Robin Jia retweeted

Deqing Fu

@DeqingFu

Apr 23

After three papers on Fourier features in LLMs, I think there's a principle worth naming. How should we do science on an LLM? It corresponds to the existential questions: > who am I? ↔ the phenomenon. > where do I come from? ↔ the emergence. > where am I going? ↔ the use. 🧵

Robin Jia retweeted

Deqing Fu

@DeqingFu

Apr 23

New paper: Convergent Evolution: How Different Language Models Learn Similar Number Representations. Language models, classical word embeddings, and even raw token frequencies all develop the same Fourier features for numbers. But only some develop the underlying structure. 🧵

Robin Jia retweeted

Wang Bill Zhu

@BillJohn1235813

Apr 21

Frontier LLMs don't debug, they regenerate. We built PDB to measure that gap: GPT-5.1-Codex passes unit tests >76% of the time but touches only <45% of the right lines. Even Claude Code touches only ~50%. 📄 Paper: arxiv.org/abs/2604.17338 🌐 Project: precise-debugging-benchmark.…

Precise Debugging Benchmark: Is Your Model Debugging or Regenerating?

Unlike code completion, debugging requires localizing faults and applying targeted edits. We observe that frontier LLMs often regenerate correct but over-edited solutions during debugging. To...

arxiv.org

Robin Jia @robinomial

Apr 20

Excited to announce that Hubble, our new language model suite for studying LLM memorization, was recently featured in @ScienceMagazine! Hubble has also received an oral presentation slot at ICLR; if you're there, check out @johntzwei and @Aflah02101's presentation on Saturday!

Robin Jia @robinomial

Apr 20

Article: science.org/content/article/… Website: allegro-lab.github.io/hubble… Big thanks to Peter Hall for the article, as well as @NSF NAIRR and @nvidia for the compute!

AIs can ‘memorize’ data they shouldn’t. Can they be forced to forget?

New tool could help researchers probe how models “unlearn” sensitive training material

science.org
