CogInterp Workshop @ NeurIPS 2025

Pinned Tweet

CogInterp Workshop @ NeurIPS 2025 @CogInterp

11 Jul 2025

We’re excited to announce the first workshop on CogInterp: Interpreting Cognition in Deep Learning Models @ NeurIPS 2025! 📣 How can we interpret the algorithms and representations underlying complex behavior in deep learning models? 🌐 coginterp.github.io/neurips2… 1/

17,396

NYU Center for Data Science

CogInterp Workshop @ NeurIPS 2025 retweeted

NYU Center for Data Science

@NYUDataScience

Jan 27

Can LLMs evolve human-like semantic categories? CDS-affiliated @NogaZaslavsky and PhD student Nathaniel Imel show that, via simulated cultural transmission, LLMs reorganize color categories toward efficient compression. 🔗arxiv.org/abs/2509.08093

9,136

Ari Holtzman

CogInterp Workshop @ NeurIPS 2025 retweeted

Ari Holtzman

@universeinanegg

21 Dec 2025

this slide is solid gold

Goodfire

@GoodfireAI

11 Dec 2025

Our last Stanford guest lecture - @EkdeepL on what counts as an explanation & a neuro-inspired "model systems approach" to interp

Plus, how in-context learning and many-shot jailbreaking are explained by LLM representations changing in-context (as a case study for that approach)

00:33 - What counts as an explanation?
04:47 - Levels of analysis & standard interpretability approaches
18:19 - The "model systems" approach to interp
[Case study on in-context learning]
23:36 - How LLM representations change in-context
44:10 - Modeling ICL with rational analysis
1:10:54 - Conclusion & questions

Thanks again to @SuryaGanguli for having us in his class!

1:18:27

6,371

Christopher Potts

CogInterp Workshop @ NeurIPS 2025 retweeted

Christopher Potts

@ChrisGPotts

10 Dec 2025

Safety-oriented interpretability researchers should be focused on AI systems, not individual model artifacts. A snippet from the NeurIPS CogInterp workshop panel on Sunday:

0:37

166

16,355

Noga Zaslavsky

CogInterp Workshop @ NeurIPS 2025 retweeted

Noga Zaslavsky @NogaZaslavsky

8 Dec 2025

Honored and thrilled that our work received the @CogInterp best paper award! 💫 📄 Extended paper: arxiv.org/pdf/2509.08093 🧵 Highlights: x.com/NogaZaslavsky/status/1… @NeurIPSConf #NeurIPS2025

CogInterp Workshop @ NeurIPS 2025 @CogInterp

8 Dec 2025

Our Best Paper Award goes to Nathaniel Imel and Noga Zaslavsky @NogaZaslavsky for their excellent paper “Culturally transmitted color categories in LLMs reflect a learning bias toward efficient compression”!

4,234

Ari Holtzman

CogInterp Workshop @ NeurIPS 2025 retweeted

Ari Holtzman

@universeinanegg

8 Dec 2025

this was so awesome. Jay still killin' it five decades later

CogInterp Workshop @ NeurIPS 2025 @CogInterp

7 Dec 2025

Jay McClelland opens with a question: "Do LMs have thoughts?" Are LMs stochastic parrots, or is there some understanding?

7,274

Justin Angel

CogInterp Workshop @ NeurIPS 2025 retweeted

Justin Angel

@JustinAngel

7 Dec 2025

At the @CogInterp workshop at NeurIPS. coginterp.github.io/neurips2… This slide explains MechInterp vs. CogInterp:

649

CogInterp Workshop @ NeurIPS 2025

CogInterp Workshop @ NeurIPS 2025 @CogInterp

8 Dec 2025

We are about to start our panel discussion, join us for some hot takes about what cognitive interpretability should be about.

343

CogInterp Workshop @ NeurIPS 2025

CogInterp Workshop @ NeurIPS 2025 @CogInterp

7 Dec 2025

Our final speaker @sydneymlevine makes a radical proposal: building computational models of human moral judgements to use as an AI system for making moral judgements.

208

CogInterp Workshop @ NeurIPS 2025

CogInterp Workshop @ NeurIPS 2025 @CogInterp

7 Dec 2025

Jay McClelland opens with a question: "Do LMs have thoughts?" Are LMs stochastic parrots, or is there some understanding?

7,954

CogInterp Workshop @ NeurIPS 2025

CogInterp Workshop @ NeurIPS 2025 @CogInterp

7 Dec 2025

Visualizing how LLMs handle object-property binding, he argues that even with scale, transformers might not be forming the kind of 'integrated representations' that human cognition relies on.

274

CogInterp Workshop @ NeurIPS 2025

CogInterp Workshop @ NeurIPS 2025 @CogInterp

7 Dec 2025

Jay proposes shifting from representing context as a sequence of tokens to a sequence of thoughts. The model learns a latent 'thought gestalt' from previous sentences to guide downstream prediction.

258

CogInterp Workshop @ NeurIPS 2025

CogInterp Workshop @ NeurIPS 2025 @CogInterp

7 Dec 2025

A big crowd for Jay McClelland’s talk!

195

CogInterp Workshop @ NeurIPS 2025

CogInterp Workshop @ NeurIPS 2025 @CogInterp

7 Dec 2025

Swing by a super happening poster session where ML and CogSci meet!

2,960

CogInterp Workshop @ NeurIPS 2025

CogInterp Workshop @ NeurIPS 2025 @CogInterp

7 Dec 2025

In our fourth spotlight talk, neural network legend Paul Smolensky uses symbolic programs such as production systems to understand how neural networks process symbols

2,806

CogInterp Workshop @ NeurIPS 2025

CogInterp Workshop @ NeurIPS 2025 @CogInterp

7 Dec 2025

For our third spotlight talk, Sonia Murthy @soniakmurthy uses probabilistic cognitive models to understand value trade-offs in LLMs that enable pragmatic reasoning about politeness in speech acts

166

CogInterp Workshop @ NeurIPS 2025

CogInterp Workshop @ NeurIPS 2025 @CogInterp

7 Dec 2025

Erin Grant @ermgrant discusses dissociations between function and representation, and asks whether representational alignment is enough for understanding deep neural networks

455

Sonia Murthy

CogInterp Workshop @ NeurIPS 2025 retweeted

Sonia Murthy @soniakmurthy

7 Dec 2025

Excited to be presenting our work on using cognitive models to interpret pluralistic values in LLMs once again as a spotlight talk 🌟 at the NeurIPS CogInterp workshop! Come by upper level room 5AB today and check out the paper here: arxiv.org/abs/2506.20666

Cognitive models can reveal interpretable value trade-offs in...

Value trade-offs are an integral part of human decision-making and language use, however, current tools for interpreting such dynamic and multi-faceted notions of values in language models are...

arxiv.org

CogInterp Workshop @ NeurIPS 2025 @CogInterp

7 Dec 2025

Replying to @CogInterp

The spotlight talks will cover all aspects of interpreting cognition in deep learning models: from behavior to algorithms to representations! Also check out the list of poster presentations at coginterp.github.io/neurips2… (3/3)

1,003