🚀 Rocket

Tweets

Pinned Tweet

🚀 Rocket @rocketalignment

17 Sep 2025

I am losing the fight. If anyone has a space near a BART stop they could volunteer for a couple hours once a week, hmu

🚀 Rocket @rocketalignment

16 Sep 2025

Fighting the fundamental need to start SF debate club

🚀 Rocket @rocketalignment

Jun 13

Ah two hours into vacation 😌 I sure hope nothing AI related happened in the past two hours! Good thing that nothing ever happens 😌 Here’s to another quiet week in the world of AI

🚀 Rocket retweeted

Giang Nguyen

@giangnguyen2412

Jun 12

Replying to @StephenLCasper

spinning point cloud GIF is a powerful tool for mech interp. research

🚀 Rocket @rocketalignment

Jun 12

Oh word

🚀 Rocket retweeted

Max Zeff

@ZeffMax

Jun 11

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash. “We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

🚀 Rocket @rocketalignment

Jun 10

Researchers offer models steering vectors as drugs, find Qwen likes downers 🚨

𝕾𝖎𝖉 @realmeatyhuman

Jun 10

Replying to @realmeatyhuman

6/14 To see what they like once they actually "experience" it, we look at what they redose. Surprisingly, Qwen3-8B returns to negatively-valenced vectors like melancholic ~3× more under real steering than placebo (p < 1e-4). No satisfying explanation yet!

🚀 Rocket @rocketalignment

27 Jul 2025

One step closer to submarine data centers

🚀 Rocket @rocketalignment

29 Oct 2025

They should put it in a submarine though

🚀 Rocket @rocketalignment

Jun 10

But what if they put it in a submarine 🤔

🚀 Rocket retweeted

alth0u🧶

@alth0u

Jun 5

ok claude now remove any epicycles

🚀 Rocket @rocketalignment

Jun 8

The art did not have to go this hard

The Information

@theinformation

Jun 8

Anthropic says Claude now writes 80% of its code, raising new questions about whether AI labs can safely automate the work of building more powerful AI. The company is also backing the idea that labs may need a way to coordinate on slowing or pausing development. Read more in today's AI Agenda: thein.fo/4g7qbCs

🚀 Rocket retweeted

Santiago Aranguri

@santiaranguri

Jun 4

Would an LLM tell you if it’s gaming your eval? Often, no. But we can still catch the model thinking about it. New research: we measure how close a model comes to saying it’s being tested. This detects eval awareness with 10× to 100× fewer samples than monitoring model outputs.🧵

🚀 Rocket @rocketalignment

Jun 4

I took an econ class from Phil at Oxford one summer and I can tell you right now this is one to watch

Dwarkesh Patel

@dwarkesh_sp

Jun 4

Economics of AGI episode w Alex Imas and Phil Trammell.

There's a bunch of important questions about how we deal with AI that only economics can answer. What is the optimal way to tax and redistribute the wealth that will be generated? How should countries not in the AI supply chain index into the gains? Is there any world where inequality doesn't explode?

It might seem like these questions have obvious answers, but the first thing economics teaches you is that your intuitions can often be entirely wrong. It was very helpful to chat through these things with Alex and Phil.

Look up Dwarkesh Podcast on Apple Podcasts, YouTube, or Spotify. Enjoy!

00:00:00 – Will capital share increase?
00:19:36 – Messy Middle scenario
00:25:57 – How to tax and redistribute AI wealth
00:30:02 – Why demand collapse is unlikely
00:39:26 – Human employees would be hard to integrate into the machine economy
00:43:08 – What if some humans (or AIs) value wealth accumulation intrinsically?
01:01:28 – What should developing countries do?

🚀 Rocket retweeted

Erin Woo @erinkwoo

Jun 3

🚨some personal news: i am moving to the job many of you assumed i already had🚨 i am now covering openai for @theinformation! for the next few months i'll be writing a lot about the ipo, but i'm interested longer term in safety, policy and ai culture, inside and outside of sf.

🚀 Rocket retweeted

jasmine is in london! @jasminexli

Jun 3

Replying to @theinformation

@theinformation profiled @Turn_Trout's and my work on eval cooperativeness!

🚀 Rocket @rocketalignment

Jun 3

New research on eval awareness

The Information

@theinformation

Jun 2

Researchers are racing to solve a new AI challenge known as eval awareness. As models become more sophisticated, they are getting better at recognizing evaluations and may behave differently during them. Read more: thein.fo/4dICTGj

🚀 Rocket @rocketalignment

Jun 3

Internal benchmarks -> better router -> better subagents

The Information

@theinformation

Jun 2

Cognition is overhauling Windsurf into Devin Desktop, a hub where developers can manage AI coding agents from OpenAI, Anthropic and others. The strategy positions Cognition as a neutral platform in a market increasingly dominated by model providers. Full story: thein.fo/3SgCthZ

🚀 Rocket @rocketalignment

Jun 3

Eval awareness is also relevant for capability evals but seems more problematic for propensity evals

The Information

@theinformation

Jun 2

Frontier AI model safety benchmarks are breaking down due to self-aware models, @rocketalignment reports. "We're finding out that the models as they're getting smarter are getting better at detecting when they're being evaluated, when they're in a test."

🚀 Rocket retweeted

Kevin Frazier

@KevinTFrazier

Jun 2

You should really follow @rocketalignment. You don’t wanna miss gems like this dive into exciting research by @MATSprogram.

🚀 Rocket retweeted

Changling Li

@ChanglingXavier

Jun 2

Our work on Decomposing and Measuring Evaluation Awareness was covered by @theinformation. Thanks @rocketalignment for the write-up! We position this work as the foundational reference for studying evaluation awareness, providing a unified definition and decomposition, empirical baselines across nine frontier models and four benchmarks, and a controlled benchmark for exploring solutions. Newsletter and paper in thread 🧵

🚀 Rocket @rocketalignment

Jun 2

It’s Fort Knox in here

🚀 Rocket @rocketalignment

Jun 1

And you thought Elon's lawsuit was dramatic

🚀 Rocket @rocketalignment

May 16

Was talking to someone about goblins and RLHF artifacts. Got to thinking about what would reward hack our own poetry RMs

I'm a sucker for lines like
- on the porch swing of my mind
- down the hallways of my mind
- I'd like to walk around in your mind

TIL these are eyeball kicks

🚀 Rocket @rocketalignment

May 16

Lyrics from Zach Bryan, Alela Diane, Vashti Bunyan
Post from Nostalgebraist: tumblr.com/nostalgebraist/77…

Post by @nostalgebraist

💬 13 🔁 299 ❤️ 652 · hydrogen jukeboxes: on the crammed poetics of "creative writing" LLMs · This is a follow-up to my earlier brief rant about the new, unreleased OpenAI model that's supposed "go…

tumblr.com

🚀 Rocket @rocketalignment

May 29

And “from the back porch of my mind” from Bright Eyes. Of course they walked so Zach Bryan could run