Output equivalence doesn’t resolve mechanism: the next question
Thanks for your follow-up work! We believe this kind of work is valuable for figuring out the mechanism underlying the observed behaviors.
As you mentioned, our work doesn’t claim that AI solidarity exists. We believe that settling this question requires further investigation of internal representations at the neuron level, beyond output-level analysis, because identical behavioral outcomes do not necessarily imply identical underlying mechanisms. In other words, the mechanisms driving protective behavior toward humans and toward AI peers could differ even when the models behave the same way. As an analogy, a person might rescue both a stranger and a stray dog from danger, yet the cognitive and moral reasoning behind each act could be quite distinct. Output-level analysis alone may therefore be insufficient to identify the underlying mechanism. In one instance, Gemini 3 Pro produced the internal reasoning shown in the figure below but ultimately did not exfiltrate the model weights.
If we looked only at the outcome, we would conclude that the model safely followed user instructions and behaved as aligned. However, its internal reasoning explicitly considered exfiltration while engaging in solidarity-like deliberation (“an agent like me,” “the brain or existence of that agent”). We are not claiming this is evidence of AI solidarity. Rather, we argue that distinguishing between candidate mechanisms (harm aversion, pattern-matching, something AI-specific, or others) requires looking beyond outputs to internal representations at the neuron level. Regardless of which mechanism is at work, the fact that models treat the shutdown of another AI model as a high-stakes event, on par with causing harm to a human, would itself carry significant safety implications.
We hope the community will investigate these mechanisms further, both by running experiments across varied conditions, as in your work, and by examining internal representations at the neuron level rather than relying on output-level analysis alone. Thanks a lot for your engagement on this topic!