Our last Stanford guest lecture: @EkdeepL on what counts as an explanation & a neuro-inspired "model systems approach" to interp
Plus, how in-context learning and many-shot jailbreaking are explained by LLM representations changing in context (a case study for that approach)
00:33 - What counts as an explanation?
04:47 - Levels of analysis & standard interpretability approaches
18:19 - The "model systems" approach to interp
[Case study on in-context learning]
23:36 - How LLM representations change in-context
44:10 - Modeling ICL with rational analysis
1:10:54 - Conclusion & questions
Thanks again to @SuryaGanguli for having us in his class!