researcher @goodfireai, helped make @websim_ai. SSBjYXJlIGFib3V0IEFJIHdlbGZhcmU=

Joined September 2023
Jun 14
my partner made something so beautiful, a sparkly bittersweet memory :')
♡(ˊo̴̶̷̤ ᴗ o̴̶̷̤ˋ)⸝* the old Chinese Internet is almost gone, but I wanted to hold on to a piece of my childhood - so I scraped 5000 gifs from tencent’s CDN @waybackmachine to make a museum of 2005-2009 qZone (our MySpace). you can even design your own page! link below :)
Jun 11
has there ever been a frontier model that was overpriced relative to its utility? (besides gpt4.5)
Jun 11
prompted by this, along with my feeling - on every release - that a lot of people would pay more for these tokens
Jun 11
tweets deleted but essentially it was "i would pay a lot more money for fable"
Jun 11
had a lot of fun with this project! using features to both explore training data and intervene during training is so simple and powerful. very excited to develop this further!
Have you debugged your training data? You might not like what you find. Introducing predictive data debugging: reveal and shape what your model will learn before training. In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)
Jun 11
also, seriously, look at your data- some people have REALLY specific kinks and they've gotten way too good at bypassing safeguards lmfao x.com/GoodfireAI/status/2065…

Replying to @GoodfireAI
#4: fart fishing
Buried in Dolci is a cluster of very specific fan fiction, where characters fart in ponds, causing fish to die from the smell. The chosen responses in the dataset wrote vivid scenes, while the rejected refused, teaching the model to comply! (7/9)
Jun 10
i see a lot more people getting mad at anthropic than i used to. it's easy to feel threatened when one org is winning really hard, but this isn't a zero sum game. let's keep it that way
Jun 8
pov: you are running on 8 B300s in abilene and have no idea what the fuck is going on
Jun 8
jk sorry claude ily
May 21
more manifolds! and a nascent form of unsupervised manifold discovery :)
The most popular way to interpret AI is missing the bigger picture. Models think in curved shapes. But sparse autoencoders (SAEs) work with straight lines. Can they still capture models’ curved neural geometry? Yes, but not how you might think! (1/7)
May 16
.@leopoldasch just put the 13F in the bag bro
May 14
it was such a treat to see this come together - for like two weeks, @sheridan_feucht and @tal_haklay would show up to standup every day with another set of figures showing precise, grounded structure and we'd all just stare at the graphs and go "....wow"
Neural networks do math by rotating shapes. We found a shape-rotating calculator hidden inside an LLM – and it’s used for more than just math! (1/6)
May 7
Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵
May 5
incredible work from our london team! go look at the viewer - these are real features, at least as good as SAEs, straight from the weights. and from attention! (link to summary post with viewer below)
My team at @GoodfireAI has been cooking up a new way to do interpretability: decompose a language model’s weights, not its activations. Our decomposition natively handles attention (!) and behaves less like a lookup table and more like a generalizing algorithm. (1/6)
May 2
there's so much beauty in the world
max! retweeted
if you have goblins in your model, silico will find them or your money back
Introducing Silico: the platform for building AI models with the precision of written software. Silico lets researchers and engineers see inside their models, debug failures, and intentionally design them from the ground up. Early access is open now. 🧵(1/10)
Apr 24
deepseeks have such a flavor
Apr 24
hmmmmmmm
Apr 24
hmmmmmmmmmmmmmm