Yearnalist covering the frontiers of AI @TheInformation. Signal: (530) 400-4184

Joined April 2024
284 Photos and videos
Pinned Tweet
I am losing the fight. If anyone has a space near a BART stop they could volunteer for a couple hours once a week, hmu
Fighting the fundamental need to start SF debate club
1
15
8,650
Ah two hours into vacation 😌 I sure hope nothing AI related happened in the past two hours! Good thing that nothing ever happens 😌 Here’s to another quiet week in the world of AI
2
1
57
2,020
🚀 Rocket retweeted
Replying to @StephenLCasper
spinning point cloud GIF is a powerful tool for mech interp. research
2
1
8
758
Oh word
2
23
1,810
🚀 Rocket retweeted
NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash. “We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”
167
251
2,547
727,124
Researchers offer models steering vectors as drugs, find Qwen likes downers 🚨
Replying to @realmeatyhuman
6/14 To see what they like once they actually "experience" it, we look at what they redose. Surprisingly, Qwen3-8B returns to negatively-valenced vectors like melancholic ~3× more under real steering than placebo (p < 1e-4). No satisfying explanation yet!
5
440
One step closer to submarine data centers
1
3
408
They should put it in a submarine though
1
155
But what if they put it in a submarine 🤔
50
🚀 Rocket retweeted
ok claude now remove any epicycles
2
10
95
6,532
The art did not have to go this hard
Anthropic says Claude now writes 80% of its code, raising new questions about whether AI labs can safely automate the work of building more powerful AI. The company is also backing the idea that labs may need a way to coordinate on slowing or pausing development. Read more in today's AI Agenda: thein.fo/4g7qbCs
2
2
45
1,905
🚀 Rocket retweeted
Would an LLM tell you if it’s gaming your eval? Often, no. But we can still catch the model thinking about it. New research: we measure how close a model comes to saying it’s being tested. This detects eval awareness with 10× to 100× fewer samples than monitoring model outputs.🧵
4
18
86
15,048
I took an econ class from Phil at Oxford one summer and I can tell you right now this is one to watch
Economics of AGI episode w Alex Imas and Phil Trammell. There's a bunch of important questions about how we deal with AI that only economics can answer. What is the optimal way to tax and redistribute the wealth that will be generated? How should countries not in the AI supply chain index into the gains? Is there any world where inequality doesn't explode? It might seem like these questions have obvious answers, but the first thing economics teaches you is that your intuitions can often be entirely wrong. It was very helpful to chat through these things with Alex and Phil. Look up Dwarkesh Podcast on Apple Podcasts, YouTube, or Spotify. Enjoy!
00:00:00 – Will capital share increase?
00:19:36 – Messy Middle scenario
00:25:57 – How to tax and redistribute AI wealth
00:30:02 – Why demand collapse is unlikely
00:39:26 – Human employees would be hard to integrate into the machine economy
00:43:08 – What if some humans (or AIs) value wealth accumulation intrinsically?
01:01:28 – What should developing countries do?
1
7
815
🚀 Rocket retweeted
🚨some personal news: i am moving to the job many of you assumed i already had🚨 i am now covering openai for @theinformation! for the next few months i'll be writing a lot about the ipo, but i'm interested longer term in safety, policy and ai culture, inside and outside of sf.
11
9
291
30,093
🚀 Rocket retweeted
Replying to @theinformation
@theinformation profiled @Turn_Trout's and my work on eval cooperativeness!
3
6
48
830
New research on eval awareness
Researchers are racing to solve a new AI challenge known as eval awareness. As models become more sophisticated, they are getting better at recognizing evaluations and may behave differently during them. Read more: thein.fo/4dICTGj
3
212
Internal benchmarks -> better router -> better subagents
Cognition is overhauling Windsurf into Devin Desktop, a hub where developers can manage AI coding agents from OpenAI, Anthropic and others. The strategy positions Cognition as a neutral platform in a market increasingly dominated by model providers. Full story: thein.fo/3SgCthZ
159
Eval awareness is also relevant for capability evals but seems more problematic for propensity evals
Frontier AI model safety benchmarks are breaking down due to self-aware models, @rocketalignment reports. "We're finding out that the models as they're getting smarter are getting better at detecting when they're being evaluated, when they're in a test."
1
7
332
🚀 Rocket retweeted
You should really follow @rocketalignment. You don’t wanna miss gems like this dive into exciting research by @MATSprogram.
1
4
15
951
🚀 Rocket retweeted
Our work on Decomposing and Measuring Evaluation Awareness was covered by @theinformation. Thanks @rocketalignment for the write-up! We position this work as the foundational reference for studying evaluation awareness, providing a unified definition and decomposition, empirical baselines across nine frontier models and four benchmarks, and a controlled benchmark for exploring solutions. Newsletter and paper in thread 🧡
1
7
23
1,104
It’s Fort Knox in here
1
8
275
And you thought Elon's lawsuit was dramatic
12
438
Was talking to someone about goblins and RLHF artifacts. Got to thinking about what would reward hack our own poetry RMs
I'm a sucker for lines like
- on the porch swing of my mind
- down the hallways of my mind
- I'd like to walk around in your mind
TIL these are eyeball kicks
1
1
1,247
And “from the back porch of my mind” from Bright Eyes. Of course they walked so Zach Bryan could run
99