🚨 New preprint available! 🚨
We test how much of an LLM's internal semantic geometry can be recovered from behavior alone. Across 8 LLMs and 17.5M trials, forced-choice tasks align with hidden-state structure much better than free association.
Preprint: arxiv.org/pdf/2602.00628
The agentic researcher: A practical guide to AI-assisted research in mathematics and machine learning. ~ Max Zimmer, Nico Pelleriti, Christophe Roux, Sebastian Pokutta. arxiv.org/abs/2603.15914v1 #AI4Math
Super excited that we got this done together. Psychology x LLMs is one of the next frontiers - excited to see what else we unlock as we move forward.
#ml #ai #icml2026
Free association has dominated semantic memory research since Galton in 1879. Our results suggest it may not be the highest-fidelity instrument we have.
Thanks to co-authors @maxzimmerberlin, @chrisrx13, @spokutta, and Fritz Günther 🙏
New paper, accepted at ICML 2026 🎉
A 145-year-old psychology paradigm may not be the best way to probe semantic memory. We show this by running classic behavioral experiments on 8 LLMs and comparing results to their hidden states over 17.5M trials.
📄 arxiv.org/pdf/2602.00628
Three papers accepted at #ICML26!
- When Does Sparsity Mitigate the Curse of Depth in LLMs
- From Associations to Activations: Comparing Behavioral and Hidden-State Semantic Geometry in LLMs
- Lower Bounds for Frank-Wolfe on Strongly Convex Sets
arXiv below, see you in Korea 🌞
Implication: Forced choice concentrates evidence (shared candidate sets), so its behavior-derived similarity better predicts unseen hidden-state similarities even without logit access. This makes forced choice a practical probe for representation analysis.
Glad to be part of this initiative. PsychLing-101 will make it easier to connect and use psycholinguistic datasets across projects.
If you’re working with language processing data — join us!
🚨 Inviting collaborators! 🚨
We’re launching PsychLing-101 — an open, community-driven initiative to gather psycholinguistic datasets for cross-dataset analyses and the development of psycholinguistic foundation models.
👉 To contribute, go to github.com/Data-X01/PsychLin…
Excited to present our #ESCOP2025 poster on Friday 12:30 in Sheffield:
"Modeling Lexical Competition in Language Production: A Computational Approach to the Swinging Lexical Network".
Come chat about spreading activation, picture–word interference & computational modeling!
Our study "Does Scientific Productivity Increase the Publication of Positive Results?" is now published in Collabra: Psychology! Link: doi.org/10.1525/collabra.137… (1/4)
New exploratory analyses (thanks to reviewer suggestions):
Even when analyzing ~2,000 abstracts across all scientific productivity (SP) quartiles, we still find no evidence that SP explains differences in the prevalence of positive results. (3/4)