Luke Heeney

Daniel Jiang retweeted

Luke Heeney @heeney_luke

Jun 4

We've lost an absolute giant today. RIP Dimitri Bertsekas. His probability and optimization books got me through my masters. Massive loss for the MIT community and the field.

Ankur Samanta

Daniel Jiang retweeted

Ankur Samanta @Ankur_Samanta_

Jun 1

Self-correction for LLMs is usually done by critiquing an answer and regenerating the entire reasoning trace. We show targeted backtrack-and-resample works better: self-localize the first erroneous step, then sample a counterfactual. Repeat. [1/n]

Csaba Szepesvari

Daniel Jiang retweeted

Csaba Szepesvari @CsabaSzepesvari

Jun 4

Replying to @yoavgo

Oh, to me it is the opposite. LLM RL is when you say supervised fine-tuning (SFT) instead of behavior cloning, RLVF instead of batch policy optimization, base policy instead of behavior policy, trace instead of trajectory, verifiable reward instead of reward, .. LOL

Zhuoran Yang

Daniel Jiang retweeted

Zhuoran Yang @zhuoran_yang

Apr 8

When I teach, I prepare my lecture notes by writing in Goodnotes, then export them as PDFs and share them with the class. Recently I asked Claude Code to read these PDF files and convert them into Markdown. This is a test I have consistently given to new LLMs, and Claude is now good enough to pass it. I converted my graduate-level RL theory course into Markdown and posted the lecture notes here: zhuoranyang.github.io/sds685… When preparing the course, I learned a lot from the wonderful courses offered by @CsabaSzepesvari @chijinML @WenSun1 @nanjiang_cs and borrowed some good material from their notes. I also discussed at length what to teach in an RL course with @zhaoran_wang and stole many of his insights.

Jiayin Zhi

Daniel Jiang retweeted

Jiayin Zhi @JiayinZhi0728

Mar 31

✨New paper alert Does AI help or harm critical thinking? It’s one of the most pressing questions around AI. But maybe the better question is: under what conditions? In our #CHI2026 paper, we find that time can change everything. 🌐 criticalthinking-ai.app/ #AI #criticalthinking #humancognition #decisionmaking #civicparticipation (1/10)

Joon Sung Park

Daniel Jiang retweeted

Joon Sung Park @joon_s_pk

Feb 12

Introducing Simile. Simulating human behavior is one of the most consequential and technically difficult problems of our time. We raised $100M from Index, Hanabi, A*, BCV, @karpathy @drfeifei @adamdangelo @rauchg @scottbelsky among others.

Daniel Jiang

Daniel Jiang @danielrjiang

Feb 4

Great work by @Ankur_Samanta_ on self-improvement in LLMs via RL post-training on “debate” outcomes!

Ankur Samanta @Ankur_Samanta_

Feb 2

Can LLMs self-improve reasoning without external supervision? Introducing MACA: RL for teaching agents how to better leverage multi-agent debate. No external labels, just copies of the model learning from each other. Improved collaboration, accuracy, and self-consistency. [1/n]

Mina Lee

Daniel Jiang retweeted

Mina Lee @MinaLee__

29 Dec 2025

What if we boost LLM output diversity by fusing base and aligned models? Excited to share this work led by @YichenZW, whose passion for understanding LLM diversity really shows here. More insights and fun ideas on LLM creativity coming soon as well!💡

Yichen (Zach) Wang @YichenZW

22 Dec 2025

Lack of diversity in your LLM generation? (also noted by Artificial Hivemind, best paper @NeurIPSConf) Time to bring your base model back! An inference-time, token-level collaboration between a base and an aligned model can optimize and control diversity and quality!

Aditya Oberai

Daniel Jiang retweeted

Aditya Oberai @aditya_oberai

29 Oct 2025

TD learning can suffer on long tasks: ↑ deep Bellman recursions → ↓ poor scalability (despite big data). We introduce a new method (TRL) with a "divide-and-conquer" value update, which scales well with long horizons!

Ethan Mollick

Daniel Jiang retweeted

Ethan Mollick @emollick

10 Oct 2025

This paper shows that you can predict actual purchase intent (90% accuracy) by asking an LLM to impersonate a customer with a demographic profile, giving it a product & having it give its impressions, which another AI rates. No fine-tuning or training & beats classic ML methods.

Seohong Park

Daniel Jiang retweeted

Seohong Park @seohong_park

9 Oct 2025

Introducing *dual representations*! tl;dr: We represent a state by the "set of similarities" to all other states. This dual perspective has lots of nice properties and practical benefits in RL. Blog post: seohong.me/blog/dual-represe… Paper: arxiv.org/abs/2510.06714 ↓

Ayush Jain

Daniel Jiang retweeted

Ayush Jain @Ayushj240

7 Aug 2025

Honored that our @RL_Conference paper won the Outstanding Paper Award on Empirical Reinforcement Learning Research! 📜Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-Functions 📎openreview.net/forum?id=H3jc… Grateful to my advisors @JosephLim_AI and @ebiyik_!

Ayush Jain @Ayushj240

6 Aug 2025

At @RL_Conference🍁, I'm presenting a talk and a poster on Aug 6, Track 1: Reinforcement Learning Algorithms. We find that Deterministic Policy Gradient methods like TD3 often get stuck at local optima under complex Q-functions, and propose a novel actor architecture! 🧵

John Horton

Daniel Jiang retweeted

John Horton @johnjhorton

6 Aug 2025

Use the same @ExpectedParrot survey you ran w/ LLMs with actual humans via our @Prolific integration (or bring your own panel) - it works!

Daniel Jiang

Daniel Jiang @danielrjiang

31 Jul 2025

Sharing our new preprint on using reinforcement learning (RL) to post-train LLMs for generative ads on Facebook! Joint work with Alex Nikulkov, Yu-Chia Chen, Yang Bai, and @ZheqingZhu. Paper link: arxiv.org/abs/2507.21983

Daniel Jiang

Daniel Jiang @danielrjiang

31 Jul 2025

In a 10-week A/B test with 35,000 advertisers and 640,000 ad variations, we found that our advertisers using the RL-trained LLM created more performant ads compared to those who used an existing supervised model (trained to imitate curated examples).

Daniel Jiang

Daniel Jiang @danielrjiang

31 Jul 2025

We think this is exciting because it suggests that metric-driven post-training can work at scale, and could be applied more broadly beyond online ads, especially in domains where generative AI is already proving useful. Please let us know if you have comments or feedback!

Percy Liang

Daniel Jiang retweeted

Percy Liang @percyliang

18 Jun 2025

Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbband @rckpudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:

Mina Lee

Daniel Jiang retweeted

Mina Lee @MinaLee__

24 May 2025

What does it mean to write and think with AI? What new possibilities and challenges does that bring? I spoke with THE AI (in Korean) about our group's research and the future of writing with AI. 👩🤖✍️ newstheai.com/news/articleVi…

AI at Meta

Daniel Jiang retweeted

AI at Meta @AIatMeta

13 May 2025

CATransformers is a carbon-driven neural architecture and system hardware co-design framework. Using CATransformers, we discover greener CLIP models that achieve an average of 9.1% reduction potential in total lifecycle carbon emissions while maintaining (or increasing) accuracy and maintaining latency. This research is the first to look into carbon-driven neural architecture and system hardware co-design. It is enabled by a first-of-its-kind architectural carbon modeling tool, ACT, which we developed at FAIR. Check out: our paper ➡️ go.fb.me/nedmh0; code repository ➡️ go.fb.me/7jcp6q; and the additional carbon design tools and research artifacts in Sustainable AI ➡️ go.fb.me/csee32
