Understandable Machine Intelligence Lab: We bring #explainable #AI to the next level. Part of @LeibnizATB, Ex @TUBerlin, funded by @BMBF_Bund #XAI

Joined November 2020
🔊 Not to miss: last month @anna_hedstroem defended her PhD “Evaluation-centric advances in neural model interpretability” at TU Berlin, with distinction! ✨🧠💻☕️ Here’s a thread with a selection of Anna’s evaluation-centric interpretability work and what comes next. 🧵
📍 Now @anna_hedstroem is a Postdoctoral Fellow at the @ETH_AI_Center, working with the @ivia_lab and Learning & Adaptive Systems (LAS) group. Anna's focus ahead: evaluation-centric interpretability, LLM steering, and AI safety. ✨🧠💻☕️ More info: annahedstroem.com/.

Understandable Machine Intelligence Lab retweeted
19 Sep 2025
Happy to share that our PRISM paper has been accepted at #NeurIPS2025 🎉 In this work, we introduce a multi-concept feature description framework that can identify and score polysemantic features. 📄 Paper: arxiv.org/abs/2506.15538 #NeurIPS #MechInterp #XAI
🎉 Huge congratulations to @kirill_bykov, the very first PhD student of our lab, who successfully defended his thesis “Explaining Representations in Deep Neural Networks” this Monday with summa cum laude! 👏 🧵 In the next tweets, we’ll highlight some of his key works:
📚 During his PhD, Kirill co-authored 11 papers spanning interpretability, neuron analysis & robust explanations. You can find all of them on his Google Scholar: 👉 scholar.google.com/citations… Once again, congrats @kirill_bykov on an outstanding PhD journey! 🎓✨
Our latest paper is out! 🚀
19 Jun 2025
🔍 When do neurons encode multiple concepts? We introduce PRISM, a framework for extracting multi-concept feature descriptions to better understand polysemanticity. 📄 Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework arxiv.org/abs/2506.15538 🧵
Understandable Machine Intelligence Lab retweeted
If you're at #AAAI2025, don't miss our poster today (alignment track)! Paper 📘: arxiv.org/pdf/2502.15403 Code 👩‍💻: github.com/annahedstroem/eva… Teamwork with @eirasf and @Marina_MCV

27 Feb 2025
At 12:30 I'll be happy to take questions about our poster presentation at #AAAI2025. Is your explanation for a model's prediction better than the alternatives? "Evaluate with the Inverse: Efficient Approximation of Latent Explanation Quality Distribution" introduces QGE... 1/4