AM9:21

@AM921543266

15m

The "Big Four" AI companies whose technical skill is so absurd it makes no sense: Goodfire / LiquidAI / Worldlabs / QuiverAI

AM9:21

@AM921543266

36m

Seriously, Goodfire's blog is the one where the content is so difficult it's crazy... It makes me feel my own lack of fundamentals.

Goodfire

@GoodfireAI

Jun 11

Have you debugged your training data? You might not like what you find. Introducing predictive data debugging: reveal and shape what your model will learn before training. In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)

0:34

Ekdeep Singh Lubana @EkdeepL

21h

Replying to @EkdeepL @StephenLCasper @ushabhalla_ @GoodfireAI

...having her "acknowledge" your opinion of Goodfire, i.e., "rich and shameless big tech stuff", and most critically the final note about Fig 4, which was something she invested so much time in just to build a good research artifact---that platform is *not* our product...

108

Cas (Stephen Casper)

@StephenLCasper

21h

Replying to @EkdeepL @ushabhalla_ @GoodfireAI

I thought of something. You might want to more clearly separate your replies to my criticisms of GF from your replies about whether and how you think I criticized an employee there. This was not your goal, but this reply could uncharitably be read a certain very bad way. If one were to frame criticisms of GF overall as attacks on someone who works there, it would be using the person as a shield for the company. It might be good to explicitly separate what you say about my comments on Goodfire in general and what you say about my comments on specific stuff from people there.

108

Cas (Stephen Casper)

@StephenLCasper

23h

Replying to @ushabhalla_ @GoodfireAI

Thx and to be real, I see this type of work very differently because of where it’s coming from. If approximately the same paper came from almost anywhere else, I would have comments about 1) expanding the related works section, 2) making the actual mechanics of the method clear in the abstract, and 3) talking about how SAEs are not a unique way of learning about your datasets and how they affect your model, and that a limitation of the paper is that it did not pit the method against an unsexy equal-effort control involving manual or automated dataset exploration to find issues. But that would be it. But the paper is from Goodfire, and on one hand, I hold it to a different standard because of the people and money it has. On the other hand, I’m against what Goodfire has become as a VC-backed, for-profit company with a product to sell using GIFs and McGrathian marketing to convince non-scientists that GF is doing things that are much more impressive than they are. This often includes safetywashing. Back in 2023 I read Eric Ho’s white paper for GF, got the ick, and told him that I thought that the absolute last thing that the epistemics of the interpretability community needed was a big company that sucks up a bunch of researchers to spin, market, and sell their work for profit. And not to sound a certain way, but Eric then proceeded to do exactly what I had worried about. I like your work. I just don’t like where you work. It’s worth taking a second to acknowledge how much it sucks that Goodfire can raise over 1 billion while academic labs like Lakkaraju’s, Bau’s, Geva’s, Tegmark’s etc. do less conflicted and typically better work, with far less recognition and far less money, far more efficiently. The difference isn’t research quality. It’s the rich and shameless big tech and venture capital stuff. Unfortunately, I think it’s clear that Goodfire’s leadership is adapting to trade epistemic responsibility for exploiting that nonsense and miseducating its audience.
(And don’t get me wrong, I have related thoughts on GDM, OAI, and Anth.) IDK, how do you feel about being one of the authors on a paper in which figure 4 seemed to be an advertisement for a venture-capital-backed tech product?

174

∞-modal @NoahChrein

Jun 13

Replying to @FrJulienLaurent

I’ve seen some great things out of goodfire recently on this lucid pruning

Viola Zhong @viola_zhongg

Jun 13

Replying to @ArthurConmy

Don’t tell us you’re gonna move from London to Bay Area and join goodfire

171

鳥井シンゴ retweeted

AI時代の羅針盤 (compass for the AI era)

@compassinai

Jun 13

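[What exactly is AI learning from its data?] "Post-training" is a crucial learning stage that determines how an AI behaves. With current methods, however, how the supplied data actually affects the AI has remained opaque. The AI company Goodfire is proposing a new technique that makes this learning process transparent. Using interpretability tools, it has developed a data-centric pipeline that visualizes and audits the "concepts" an AI will learn from its data before optimization. It is a new approach that keeps the model from picking up unintended quirks and shapes the learning signal itself, as if "sculpting" it. Just how far can this technique control AI behavior? 👇 Details in the video in the replies!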

1,625

ꜰᴇʀʀᴇᴛ @stferret

Jun 13

Replying to @teortaxesTex

goodfire ceo getting a text from sama in the middle of the night... "you up?"

6,710

まぁる＠YouTube×AI自動化の設計士

@Mar_3simai

Jun 13

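Transparency in post-training really is important. In practice, working out which data contributes to a model's behavior tends to mean groping in the dark when you are doing prompt engineering or RAG design. If the training process can be visualized the way Goodfire does it, I feel that verifying the effects of fine-tuning and pinpointing the causes of unintended behavior would become much easier.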

AI時代の羅針盤 (compass for the AI era)

@compassinai

Jun 13

110

Yanchen Liu retweeted

Cas (Stephen Casper)

@StephenLCasper

Jun 12

Replying to @GoodfireAI

Goodfire appears to have reinvented predictive dataset coding. But instead of looking at the data and arbitrary embeddings, you look at *SAE* embeddings. I think there’s a pretty decent history of prior papers and similar or simpler methods for doing the same kind of work with datasets. For example, you can check out our work and related works section from this 2023 paper: arxiv.org/abs/2306.09442 I’m left kind of disappointed by GF not pitting their method fairly against regular, unsexy dataset coding work that doesn’t involve SAEs. No doubt that would have been able to uncover dirty laundry in these preference datasets too, maybe with less effort. I think this is once again an example of GF interp research that does something simple in a complex way, without what I would see as a fair baseline that commensurate effort has been put into. It then lays on cherry-picked illustrative examples enough to get clicks and convince laypeople and VCs that progress is being made.

Explore, Establish, Exploit: Red Teaming Language Models from Scratch

Deploying large language models (LMs) can pose hazards from harmful outputs such as toxic or false text. Prior work has introduced automated tools that elicit harmful outputs to identify these...

arxiv.org

946

Johann Schmidt retweeted

AI:AM

@AI_in_the_AM

Jun 12

"This would be like a complete debugger" Tom McGrath, chief scientist at Goodfire, argues model training should work like software debugging: tracing failures all the way back to data. "I might find some rare but very much unwanted error in production." "So you wanna be able to go back from these examples to, like, the mechanisms that cause them" "This would let you go from an error to the data, fix the data, and then you can fix the model." @banburismus_

1:04

474

Johann Schmidt retweeted

AI:AM

@AI_in_the_AM

Jun 12

"I do feel optimistic about continual learning approaches" Tom McGrath, chief scientist at Goodfire, predicts the most likely AI breakout path is continual learning, because incumbents struggle to adapt to it. "You have different models per person. The inference footprint is kind of wacky" "that's simply a very hard like, that's outside of the operating model of most frontier labs." @banburismus_

0:54

434