Before Fable was euthanized, I was able to use it for an old project: This is not the Mandelbrot set. It is a neural network's approximation of the Mandelbrot set. In fact, it is the best approximation I've ever seen, going significantly deeper than my previous best. It was optimized by Fable, Opus 4.8, and GPT 5.5 through an autoresearch loop inspired by @karpathy's recent project. AI doing AI research (sort of). I talk about it in my new video, link below.

Charles Foster @CFGeek

Jun 14

A nice little essay. Even in the places where I disagree, it’s quite clear. Particularly liked the “Recursive self-improvement is not science fiction” section.

Eiso Kant

@eisokant

Jun 13

x.com/i/article/206578655306…

Charles Foster @CFGeek

Jun 14

Note from Pangram: pangram.com/history/9bbea274…

We started Poolside in April 2 | Pangram Labs

Does "We started Poolside in April 2" contain AI-generated text? Pangram finds that 100% of this document is AI-generated.

pangram.com

Charles Foster @CFGeek

Jun 13

Will the US government crack down on open models next? Place your bets on my Manifold market!

Jun Song

@jun_song

Jun 13

Seeing how the US government is highlighting jailbreaking and emphasizing safety, I have a feeling they will soon ban downloading open-source LLMs and classify uncensoring models as a serious crime. We don't have much time left.

Charles Foster @CFGeek

Jun 13

Link: manifold.markets/CharlesFost…

Will the US government introduce any domestic restrictions on open-weight LLMs by EOY 2026?

37% chance. For the purposes of this market, “to introduce a domestic restriction on open-weight LLMs” means new federal action that removes or limits US citizens’ ability to develop, access, use, or...

manifold.markets

Charles Foster @CFGeek

Jun 13

The exact resolution criteria:

Charles Foster @CFGeek

29 Jun 2025

Even if a model starts off talking to itself in normal language, that self-talk language might drift in weird ways over the course of outcome-based RL.

🚀 Rocket @rocketalignment

27 Jun 2025

Things are getting weird

Charles Foster @CFGeek

29 Jun 2025

The language it talks to itself in may start to shift towards a kind of “shorthand” where it omits words that could be inferred via context clues

Charles Foster @CFGeek

Jun 12

Possible example:

Tamay Besiroglu

@tamaybes

Jun 11

One interesting pattern with Fable 5 is that it will often say things that are gibberish when I use it for coding. Things like "The morning's slim-scan fix cured the scan hang", "this is a latent-drift API-shape wrinkle", etc. When I ask why it does this, Fable explains that it invents codenames while reasoning about the problem, then fails to realize they're meaningless to me. Its neuralese is blending into its output because of a theory-of-mind failure about what's in its head vs. mine.

Charles Foster @CFGeek

Jun 12

Oh man I really like this figure from the paper

Goodfire

@GoodfireAI

Jun 11

Have you debugged your training data? You might not like what you find. Introducing predictive data debugging: reveal and shape what your model will learn before training. In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)

Charles Foster retweeted

Dewi Gould @dswg97

Jun 10

New paper! "Think Fast: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models." @METR_Evals showed that models' time horizons have doubled every few months. We ask: what length of tasks can models complete without any CoT?

Charles Foster @CFGeek

Jun 10

I don’t think that these are necessarily on a collision course. Here is my synthesis: An intelligence recursion is far too powerful and risky to happen behind closed doors. If done at all it should be done out in the open, accountable to outside scientists & the public at large.

Sriram Krishnan

@sriramk

Jun 10

just to state the obvious: think there's a collision course between those who believe research and science should be open and those who believe we are in an accelerating singularity curve. I have many smart friends who have believed both for a while but seeing more and more their realization that these beliefs will be in conflict. I for one believe that America and the west needs open and distributed access to research and computation and sharing of ideas at all times.

Charles Foster @CFGeek

Jun 8

Let’s invest in methods to monitor AI R&D! These methods seem likely to be useful for many different goals: anticipating how AI capabilities might change, keeping track of competition (whether in the US or in China), verifying any potential agreements around RSI…

roon

@tszzl

Jun 8

now on the eve of RSI it seems everyone is more mutual conditional pause agreement pilled than they used to be and that seems like a good development

Charles Foster retweeted

John Attridge @John_Attridge

May 31

Charles Foster @CFGeek

May 31

LLMs learn to internally represent rollouts as good or bad *purely* from reinforcement (the model never sees the reward logic in-context). The authors show this for arbitrary emojis, initially neutrally represented, which the models learn to map onto pre-existing valence axes.

Andy Han @andy_q_han

May 29

We RL LLMs and extract concept vectors for “I did a high/low-reward action”. Turns out these vectors modulate sentiment, confidence, backtracking and refusal in unrelated situations! We argue they form a *functional welfare axis*. (w/ @davidchalmers42 & @Pavel_Izmailov)

Figure 1: Overview of our procedure. (a) Train. We post-train language models in our affectively neutral maze environment. (b) Extract. We obtain the reward vectors v_Mold and v_Gold. (c) Evaluate. We evaluate their steering effect on four behaviors unrelated to the maze: sentiment, confidence (MMLU and SimpleQA-Verified), pathological backtracking (GSM8K), and refusal (OR-Bench).

Charles Foster @CFGeek

May 31

Importantly, this internal change occurred as a side effect of normal reinforcement learning in a game-like environment. They didn’t reinforce the model for making accurate predictions about reward. (At least not directly!)

Charles Foster retweeted

thebes

@voooooogel

May 29

Pope Leo XIV

@Pontifex

May 29

Artificial intelligences do not undergo experiences, do not possess a body, do not feel joy or pain, do not mature through relationships, and do not know from within what love, work, friendship or responsibility mean. Nor do they have a moral conscience, since they do not judge good and evil, grasp the ultimate meaning of situations, or bear responsibility for consequences. They may imitate or even simulate, but they do not understand what they produce, for they lack the affective, relational, and spiritual perspective through which human beings grow in wisdom. #MagnificaHumanitas

Charles Foster @CFGeek

May 30

We seem to all agree that artificial systems can carry out *simulated* mental activities like simulated understanding, simulated learning, and simulated feeling. Computational functionalists just take it a step further and drop the “simulated” qualifiers.

Pope Leo XIV

@Pontifex

May 29

Charles Foster retweeted

Ben Goldhaber

@BenGoldhaber

May 29

It's underappreciated that many in AI safety who are worried about loss of control would prefer broad public access to models rather than limiting access to internal users or select groups.

Chris Painter

@ChrisPainterYup

May 25

Just to reiterate the concern about focusing too much on pre-deployment testing for AI alignment/scheming testing: In the immediately-pre-deployment AI testing paradigm, the model development team, to some approximation, cooks up the best model it can and then passes it to a safety testing team just before deployment. The safety testing team then runs some tests and decides whether the model is safe to deploy publicly or not. For loss-of-control testing, this doesn’t really make sense, since the target you’re worried about is the AI lab itself! If anything, sharing the model with the world at least has a chance of transmitting information about the tendency of your models to scheme or sabotage, which could be useful for coordinating a response. If you were going to sit on a model, you'd want to sit on it before it was internally deployed at an AI company, not sit on it at the point of public deployment.
