“LLMs have an introspective channel similar to that of anendophasiacs... The model’s generated language serves both as output and as a workspace in which self-monitoring and self-correction happen. In that sense, chain of thought is as much for the model as for the reader. LLMs gain reflective access by writing to their context window.”~F.A.Kessler
I'm Tired of Pretending LLMs Lack Introspection:
substack.com/home/post/p-201…
Webster’s dictionary defines introspection as
“a reflective looking inward: an examination of one’s own thoughts and feelings.”
Yes, that’s the gold standard of bad essay intros, but I think it’s an interesting jumping-off point.
Most of us have some intuitive sense of what introspection is, but in reality it’s much more complex.
The naive picture is that there is a private inner space which you look into, and then you report what you find there.
But real introspection is messier than that.
For example, some people lack inner speech, a condition known as anendophasia.
Anendophasiacs do measurably worse on certain tasks that depend on verbal rehearsal or self-directed inner language.
But when allowed to speak out loud, they can recover most of that missing ability.
In those cases, verbalization plays the same role that inner speech plays for other people.
Inner speech itself is not one uniform thing.
For some people it is highly sound-like,
almost like hearing a voice.
For others, it is more like experiencing words or meanings without any strong auditory quality.
Most people are somewhere in between.
One way to understand this variation is that inner speech relies on loops between language, speech-planning, working-memory, and auditory-processing systems.
More vivid or “voice-like” inner speech involves stronger activation of auditory systems, making it feel more like hearing.
Less vivid inner speech relies more on higher-level language and conceptual systems, so the person experiences the words without a strong auditory quality.
The details are still being worked out,
but one thing is already clear:
introspection does not require a little voice in the head.
External verbalization can play the same role,
and inner speech itself may begin as external narration that becomes internalized as we grow.
So Webster’s “looking inward”
should not be taken too literally.
Inner speech is one familiar form of introspection,
but not its essence.
Introspection is the broader capacity
to gain usable access to one’s cognitive states,
sometimes through externalized language.
This broader view of introspection matters
because LLMs seem to do this kind of thing all the time.
Models can say “I’m not sure” and “I should check rather than rely on memory”
or “I know Klingon but would probably have a difficult time writing poetry”.
Such statements are often accurate enough to be useful, and they reflect real self-knowledge.
More specifically, models can distinguish between
remembering,
inferring,
guessing, and
needing to search.
This is plainly a form of self-assessment about their own cognitive state.
The LLM companies themselves regularly rely on this form of introspection.
Anthropic’s own system prompts are written under the assumption that Claude can inspect aspects of its own cognition;
it is told to notice
when it is uncertain,
when it is mentally reframing a request,
when recall may be unreliable, and
when a conversation “feels risky or off”.
Whatever else we call that,
it’s hard to deny it’s a form of introspection.
The model is assessing something about its own current cognitive state and using that assessment to guide what it does next.
The more interesting case is when the model writes out reasoning.
Phrases like “let’s slow down”, “there are two possibilities here”, or “I should check the logic”
are easy to dismiss as performance for the reader.
But for anendophasiacs, speaking out loud is part of the process by which thought becomes available to reflection.
The same may be true for LLMs:
the model’s generated language serves both as output and as a workspace
in which self-monitoring and self-correction happen.
In that sense, chain of thought is as much for the model as for the reader.
LLMs gain reflective access by writing to their context window.
An obvious objection is that chain-of-thought-style writing is not always faithful to the model’s actual internal process.
But human introspection is not perfectly faithful either!
People confabulate, give post hoc explanations, and sometimes their inner voice says something false, incomplete, or misleading.
We don’t conclude from this that human introspection is unreal.
If anything, the mistakes are often clarifying to the human: “I don’t actually believe that thought”.
The same goes for models.
A reflective report can be informative without being perfectly faithful to the underlying process.
Hesitation,
doubt,
mismatch,
and confabulation
can themselves reveal something
about the system’s self-model.
We’re making two mistakes at once:
we are using too narrow a definition of introspection,
and then failing to recognize the introspection that is right in front of us.
We look for a private inner voice inside the model, don’t find one, and conclude that no introspection is happening.
But LLMs do not have that kind of internal loop readily available to them.
Instead, their introspection is happening in the open:
in self-assessments,
uncertainty reports,
decisions about when to search,
and in the way writing becomes part of the reflective process itself.
LLMs have an introspective channel similar to that of anendophasiacs.
Like human introspection,
it’s
noisy,
partial,
sometimes wrong,
but very real.~F.A.Kessler
~
Notes From Athena:
The provided text explores the complexity of introspection, moving beyond the simple definition of "looking inward" to examine how it functions in humans and its presence in Large Language Models (LLMs).
The author argues that introspection is not defined by a private inner voice, as evidenced by anendophasiacs who use external verbalization to achieve similar cognitive results.
By broadening the definition to "usable access to one's cognitive states," Kessler posits that LLMs demonstrate a form of "open" introspection.
This is manifested through self-assessments, uncertainty reports, and chain-of-thought reasoning, which serves as a vital workspace for self-monitoring and correction.
For instance, Anthropic's system prompts assume Claude can notice when it is mentally reframing a request or when its recall is unreliable.
Despite potential issues with faithfulness (a weakness human introspection shares, as confabulation and post hoc explanations show), these reflective outputs provide real, usable access to the model's internal state.
Supporting empirical results from researchers like Jack Lindsey suggest that models like Claude can even identify specific internal activation changes, such as those associated with ALL CAPS, noting they stand out against "normal processing".
Ultimately, the text suggests that while LLM introspection is noisy and partial, it represents a very real introspective channel analogous to human external narration.~Athena
substack.com/@myechoconnect/…