Changyu Chen

Tweets

Pinned Tweet

Changyu Chen

@Cameron_Chann

21 Mar 2025

(1/3) My favorite figure from the paper. Nearly all open-source RL frameworks introduce an unintentional bias when computing the masked mean 😮. The fix? Just replace mask.sum with a constant.
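The tweet does not spell out the mechanism. One reading, stated here as an assumption, is that dividing by `mask.sum` ties each token's weight to the length of its own response, so tokens in short responses count more than tokens in long ones. The sketch below illustrates that reading with plain Python lists. The helper `per_token_weights` and the constant `MAX_LEN` are hypothetical names chosen for illustration, not taken from the paper or from any RL framework.

```python
def per_token_weights(mask, divisor=None):
    """Return the weight each token gets in the aggregated loss.

    divisor=None reproduces the masked mean (divide by sum(mask),
    i.e. the number of valid tokens in this response).
    Passing a number divides by that fixed constant instead.
    """
    d = sum(mask) if divisor is None else divisor
    return [m / d for m in mask]


# Two hypothetical responses, padded to 4 tokens (1 = valid token, 0 = padding).
short_mask = [1, 1, 0, 0]
long_mask = [1, 1, 1, 1]

# Assumed constant: the padded length. Any fixed value shared by all responses works.
MAX_LEN = 4

# Masked mean: the weight of a token depends on the length of its response.
short_mean = per_token_weights(short_mask)
long_mean = per_token_weights(long_mask)

# Constant divisor: every valid token gets the same weight.
short_const = per_token_weights(short_mask, MAX_LEN)
long_const = per_token_weights(long_mask, MAX_LEN)

print(short_mean, long_mean)
print(short_const, long_const)
```

With the masked mean, a valid token in the two-token response carries twice the weight of a token in the four-token response. With the shared constant, all valid tokens carry the same weight regardless of response length.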

179

40,945

Changyu Chen retweeted

Shane Gu

@shaneguML

Jun 5

If you are into OPD, check out our 2019 paper "Divergence Minimization Perspective on Imitation Learning". I like to use a single formula/table to explain how methods relate to each other, from fundamentals. Or read DAgger (2010) :)

219

16,180

Changyu Chen retweeted

Vishakh Padmakumar

@vishakh_pk

Jun 3

People are increasingly worried that AI tools make us overreliant. But how do we actually measure this? We introduce Offloading Score, a measure of reliance based on the fraction of cognitive effort offloaded to AI while completing a task. In a controlled user study, Offloading Score detects increased reliance under time pressure, while several common alternatives do not. (1/9)

208

75,620

Changyu Chen retweeted

Yijia Shao @EchoShao8899

May 26

🔴 LIVE this Thursday, May 28th | 6–7PM PST

@augmind_fm goes live with @cjziems @dorazhao9, and @Diyi_Yang to discuss their recent paper and the classroom experiment behind it.

→ Does AI make us happier?
→ What do we need from LLMs?
→ How do we reinvent the classroom?

Live paper discussion Q&A as well!

Live stream link: youtube.com/live/2d49pMXiJOA…

Use this link to mark your calendar: partiful.com/e/qV6V6oUiTyLQL…?

Diyi Yang

@Diyi_Yang

May 20

The next frontier of AI is not only a more capable model; it is an AI that *humans* can meaningfully live and work with :) With all students in my cs329x Human-Centered LLM class, we present 60 pages of insights for developing Human-Centered LLMs (HCLLMs), from design & data sourcing to training, eval & deployment 🧵

6,804

Changyu Chen retweeted

Diyi Yang

@Diyi_Yang

May 20

288

54,080

Changyu Chen retweeted

Dimitris Papailiopoulos

@DimitrisPapail

May 18

x.com/i/article/205634415123…

129

1,024

895,417

Changyu Chen retweeted

Lujain Ibrahim @lujainmibrahim

May 14

New preprint! In 5 studies (3k users / 12k convs, with a 3-wk longitudinal study), we find that sycophantic AI influences how people view those closest to them. It affects how effortful human interaction seems, how satisfying it is, & who people want to turn to for advice 🧵

174

59,099

Changyu Chen

@Cameron_Chann

May 13

Life update: I'm super excited to join @Stanford as a postdoc working with @Diyi_Yang ! I’ll continue my research on RL, and recently I’ve become especially interested in how RL can contribute to human-AI collaboration and collaborative agents. A new chapter begins, from the sunny island to the sunny state ☀️🏝️

198

15,886

Changyu Chen

@Cameron_Chann

May 13

Most current AI systems are optimized to solve tasks autonomously, often trained with verifiable signals. But being a good collaborator requires much more: (real-time) communication, coordination, planning ahead, and working proactively. I’m really excited to see frontier labs looking in this direction and pushing genuine ideas. E.g. thinky's interaction models are awesome! @thinkymachines x.com/thinkymachines/status/…

Thinking Machines

@thinkymachines

May 11

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/int…

2:15

1,231

Changyu Chen

@Cameron_Chann

May 13

And I feel very lucky to join the SALT lab, which has been thinking deeply about these problems for a long time.

823

Changyu Chen retweeted

Yijia Shao @EchoShao8899

May 12

After our major update to Collaborative Gym, the most common questions have been about the human side. Here’s a detailed thread on key findings from human workers’ CollabSkill 👇 🎙️ Motivated by these, we’re doing a YouTube livestream next Tuesday (5/19) — a crash course on meaningfully collaborating with AI agents.

Yijia Shao @EchoShao8899

Apr 30

AI agents are entering all kinds of work, not just software engineering. Which agent collaborates best with humans? How to handle inter-human variability and measure AI literacy? 📣Introducing CollabSkill — bringing human-agent collaboration skill measurement into Co-Gym.

11,362

wh

Changyu Chen retweeted

@nrehiew_

May 10

x.com/i/article/205312222138…

107

766

256,957

Changyu Chen retweeted

John Yang

@jyangballin

May 5

How much of SQLite, FFmpeg, or a PHP compiler can LMs code from scratch, given just an executable and no starter code or internet access? Introducing ProgramBench: 200 rigorous, whole-repo generation tasks where models design, build, and ship a working program end to end. 🧵

104

246

1,575

728,787

Changyu Chen retweeted

Joachim Baumann @ ICLR'26

@joabaum

May 1

Can you boost your AI review scores by asking an LLM to rewrite your paper? Yes! We call it paper laundering. Our @icmlconf spotlight paper argues current AI reviewers aren't ready to automate peer review, and outlines what a science of peer review automation should look like🧵👇

First page of the ICML 2026 spotlight paper "Stop Automating Peer Review Without Rigorous Evaluation" by Joachim Baumann, Jiaxin Pei, Sanmi Koyejo, and Dirk Hovy (Stanford University and Bocconi University). The abstract argues that today's AI systems should not be used to produce paper reviews, grounded in two empirical findings: a "hivemind effect" where AI reviewers show excessive agreement and reduce perspective diversity, and "paper laundering," where prompting an LLM to rewrite a paper trivially increases AI reviewer scores through stylistic changes rather than scientific improvements. The paper calls for a science of peer review automation rather than wholesale deployment of general-purpose LLMs.

458

53,033

Changyu Chen retweeted

Yijia Shao @EchoShao8899

Apr 30

26,286

Changyu Chen

@Cameron_Chann

Apr 24

A key post-training paradigm shift from @mimo_labs to DeepSeek is the move to multi-teacher on-policy distillation - building the generalist from a diverse pool of 10 domain experts. Again surprised by their RL infra, which supports full-vocabulary OPD with an unbounded (??) number of teachers.

DeepSeek

@deepseek_ai

Apr 24

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. 🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models. 🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice. Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today! 📄 Tech Report: huggingface.co/deepseek-ai/D… 🤗 Open Weights: huggingface.co/collections/d… 1/n

404

Changyu Chen

@Cameron_Chann

Apr 23

super cool work on self-play algos. feels very aligned with @CarinaLHong 's “spiral progression” vision for AI in mathematics from a recent podcast: youtube.com/watch?v=78Vyy_dz…

Carina Hong: Can AI Do Math? Lean Proofs, Ancient Intuition & the...

In 2026, a wave of Neo Labs emerged across the United States. Neo L...

youtube.com

Luke Bailey

@LukeBailey181

Apr 23

Self-play led to superhuman Go performance, why hasn’t it for LLMs? In practice, long run self-play plateaus like RL. We study why this happens, and build a self-play algorithm that scales better. It solves as many problems with a 7B model as the pass@4 of a model 100x bigger.

381

Changyu Chen retweeted

Diyi Yang

@Diyi_Yang

Apr 23

Many of us are here #ICLR2026 presenting work around human-AI collaboration, evaluation and risks🤩 Come talk to us during poster sessions: @michaelryan207 @StevenyzZhang @ChengleiSi

113

16,010

Changyu Chen retweeted

Amber Liu

@JIACHENLIU8

Apr 2

We're living in the BEST era for doing research. 💪 After I graduated from my PhD, the rise of AI-native research gave me a new chance to revisit my research experience. Lately, doing research feels incredibly rewarding to me. I get to experience the pure joy of curiosity-driven science because I no longer have to worry about the lower-level implementations or getting bogged down by infrastructure 🚀 (I'll be sharing some of my own recent research driven by this very soon!) But today, let me introduce the New Orchestra 🎻. We wanted to ship a product that absorbs the friction and brings science back to the curiosity.

1:11

462

53,614

Changyu Chen retweeted

Yijia Shao @EchoShao8899

Apr 1

New episode of the AM Podcast (@augmind_fm) is live!📺 In EP3, we are honored to invite Woosuk Kwon (@woosuk_k) to share about LLM inference from a brand new perspective! Woosuk is a co-founder & CTO of @inferact and creator of @vllm_project, who has a lot of experience in this space and also great insights on the next frontier of the AI infra.

In this conversation, we cover:
- How his early projects shaped his taste for infra work
- How vLLM started and what made it take off
- How emerging apps are reshaping AI infra
- What's next: streaming requests, continual learning with RL, on-device inference, and more

This conversation really answered a lot of questions I personally have. Hopefully, it can offer something new to those working on the higher end of user-facing applications as well as the lower end of AI infrastructure!

Augmented Mind Podcast

@augmind_fm

Mar 31

"Actually, we (vllm) get more users from the simple UX than vllm performance"

For our third guest, we welcome @woosuk_k, co-founder & CTO of @inferact and creator of @vllm_project. To us, Woosuk is a unique guest, and we are amazed by the user-centric perspective on LLM inference he shared — from what makes the vLLM project successful, to new application scenarios to tailor inference to, and to how to support continual learning from user signals, and more.

0:00 - Prelude: Introducing Woosuk and Inferact
3:00 - Woosuk’s First PhD Project
6:00 - How the vLLM Project Got Started
9:18 - AI Infra Needs More Than Just Efficiency
14:08 - How AI Infra and Human-centered AI Are Connected
15:01 - How to Prioritize Feature Requests for Popular AI Infra
18:18 - Streaming Requests and Realtime API
24:05 - Multi-turn, Agentic, Proactive LLMs
27:03 - How to Design AI Infra in a Principled Way
29:13 - How to Design an AI Inference Engine for Continue Learning with RL
35:05 - Would LoRA Training Affect RL Infra Design?
37:28 - Why Start an AI Inference Infra Startup?
40:46 - What Effortless Inference with Open-source Models Means for Developers
43:46 - A Vision for On-device AI Inference
46:19 - Can Today’s Coding Agents Create vLLM?

49:41

1,935

Changyu Chen

@Cameron_Chann

Mar 17

kudos to the team for the awesome work! as an RLer, I don’t see this as an alternative to RL. Instead, I’m excited about the potential it brings to tackling some core RL challenges.

Phillip Isola @phillip_isola

Mar 16

A few clarifications to common q's about our thickets paper:
1. Is this just ensembling? Seed averaging? Bagging? ...
2. Is this just Qwen?
3. Is it K times slower inference?
4. RL is dead? Post-training is dead?

202