Many mechanistic interpretability techniques work directly with the residual stream.
I wrote a short post unpacking one important property: additivity.
The key idea is that once an attention head or MLP neuron computes its output, it writes into the residual stream by addition. Using simple block matrix multiplication, you can decompose the stream into additive contributions from individual attention heads, MLP neurons, and bias terms.
This makes the residual stream a natural object for circuit analysis. Every component leaves a traceable, additive footprint.
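Here is a minimal NumPy sketch of that decomposition for a single layer at one token position. It is a toy under stated assumptions, not the derivation from the post: the dimensions, the random weights, and all variable names (`z`, `W_O`, `W_in`, `W_out`, and so on) are made up for illustration, the per-head results `z` are random stand-ins for real attention outputs, the MLP uses ReLU, and LayerNorm is left out entirely.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads, d_head, d_mlp = 8, 2, 4, 16  # toy sizes (assumed)

x0 = rng.normal(size=d_model)  # residual stream entering the layer

# --- Attention ---
# Stand-in for each head's result vector (attention pattern already applied).
z = rng.normal(size=(n_heads, d_head))
W_O = rng.normal(size=(n_heads * d_head, d_model))  # output projection
b_O = rng.normal(size=d_model)

# Usual computation: concatenate the heads, then project.
attn_out = z.reshape(-1) @ W_O + b_O

# Block view: split W_O into one row block per head, so each head
# gets its own d_model-sized contribution.
W_O_blocks = W_O.reshape(n_heads, d_head, d_model)
head_contribs = np.einsum("hd,hdm->hm", z, W_O_blocks)  # (n_heads, d_model)

x1 = x0 + attn_out  # attention writes into the stream by addition

# --- MLP ---
W_in = rng.normal(size=(d_model, d_mlp))
b_in = rng.normal(size=d_mlp)
W_out = rng.normal(size=(d_mlp, d_model))
b_out = rng.normal(size=d_model)

acts = np.maximum(x1 @ W_in + b_in, 0.0)  # ReLU activations, one per neuron
mlp_out = acts @ W_out + b_out

# Each neuron contributes its activation times its row of W_out.
neuron_contribs = acts[:, None] * W_out  # (d_mlp, d_model)

x2 = x1 + mlp_out  # the MLP also writes by addition

# The final stream is the input plus every component's footprint.
reconstructed = (
    x0
    + head_contribs.sum(axis=0) + b_O
    + neuron_contribs.sum(axis=0) + b_out
)
assert np.allclose(x2, reconstructed)
```

Each row of `head_contribs` and `neuron_contribs` lives in the same `d_model`-dimensional space as the stream itself, which is what lets you attribute part of the stream to a single head or neuron.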
Full derivation in the post below.
adityaiyer7.github.io/blogs/…
#MechanisticInterpretability #AIInterpretability #AIAlignment #TransformerCircuits #Transformers