At Upper Level Room 5AB of the conference venue!

Joined July 2025
We’re excited to announce the first workshop on CogInterp: Interpreting Cognition in Deep Learning Models @ NeurIPS 2025! 📣 How can we interpret the algorithms and representations underlying complex behavior in deep learning models? 🌐 coginterp.github.io/neurips2… 1/

CogInterp Workshop @ NeurIPS 2025 retweeted
Can LLMs evolve human-like semantic categories? CDS-affiliated @NogaZaslavsky and PhD student Nathaniel Imel show that, via simulated cultural transmission, LLMs reorganize color categories toward efficient compression. 🔗arxiv.org/abs/2509.08093
CogInterp Workshop @ NeurIPS 2025 retweeted
this slide is solid gold
11 Dec 2025
Our last Stanford guest lecture: @EkdeepL on what counts as an explanation and a neuro-inspired "model systems approach" to interp. Plus, how in-context learning and many-shot jailbreaking are explained by LLM representations changing in-context (as a case study for that approach).
00:33 - What counts as an explanation?
04:47 - Levels of analysis & standard interpretability approaches
18:19 - The "model systems" approach to interp
[Case study on in-context learning]
23:36 - How LLM representations change in-context
44:10 - Modeling ICL with rational analysis
1:10:54 - Conclusion & questions
Thanks again to @SuryaGanguli for having us in his class!
CogInterp Workshop @ NeurIPS 2025 retweeted
Safety-oriented interpretability researchers should be focused on AI systems, not individual model artifacts. A snippet from the NeurIPS CogInterp workshop panel on Sunday:
CogInterp Workshop @ NeurIPS 2025 retweeted
Honored and thrilled that our work received the @CogInterp best paper award! 💫 📄 Extended paper: arxiv.org/pdf/2509.08093 🧵 Highlights: x.com/NogaZaslavsky/status/1… @NeurIPSConf #NeurIPS2025

Our Best Paper Award goes to Nathaniel Imel and Noga Zaslavsky @NogaZaslavsky for their excellent paper “Culturally transmitted color categories in LLMs reflect a learning bias toward efficient compression”!
CogInterp Workshop @ NeurIPS 2025 retweeted
this was so awesome. Jay still killin' it five decades later
Jay McClelland opens with a question: "Do LMs have thoughts?" Are LMs stochastic parrots, or is there some understanding?
CogInterp Workshop @ NeurIPS 2025 retweeted
At the @CogInterp workshop at NeurIPS. coginterp.github.io/neurips2… This slide explains MechInterp vs CogInterp:
We are about to start our panel discussion, join us for some hot takes about what cognitive interpretability should be about.
Our final speaker @sydneymlevine makes a radical proposal: building computational models of human moral judgements to use as an AI system for making moral judgements.
Jay McClelland opens with a question: "Do LMs have thoughts?" Are LMs stochastic parrots, or is there some understanding?
Visualizing how LLMs handle object-property binding, he argues that even with scale, transformers might not be forming the kind of 'integrated representations' that human cognition relies on.
Jay proposes shifting from representing context as a sequence of tokens to a sequence of thoughts. The model learns a latent 'thought gestalt' from previous sentences to guide downstream prediction.
A big crowd for Jay McClelland’s talk!
Swing by a super happening poster session where ML and CogSci meet!
In our fourth spotlight talk, neural network legend Paul Smolensky uses symbolic programs such as production systems to understand how neural networks process symbols
For our third spotlight talk, Sonia Murthy @soniakmurthy uses probabilistic cognitive models to understand value trade-offs in LLMs that enable pragmatic reasoning about politeness in speech acts
Erin Grant @ermgrant discusses dissociations between function and representation, and asks whether representational alignment is enough for understanding deep neural networks
CogInterp Workshop @ NeurIPS 2025 retweeted
Excited to be presenting our work on using cognitive models to interpret pluralistic values in LLMs once again as a spotlight talk 🌟 at the NeurIPS CogInterp workshop! Come by upper level room 5AB today and check out the paper here: arxiv.org/abs/2506.20666
Replying to @CogInterp
The spotlight talks will cover all aspects of interpreting cognition in deep learning models: from behavior to algorithms to representations! Also check out the list of poster presentations at coginterp.github.io/neurips2… (3/3)