🛠️ Actionable Interpretability🔎 @icmlconf 2025 | Bridging the gap between insights and actions ✨ actionable-interpretability.…

Joined March 2025
Actionable Interpretability Workshop ICML2025 retweeted
Submit your work! The 2nd Workshop on 𝐀𝐜𝐭𝐢𝐨𝐧𝐚𝐛𝐥𝐞 𝐈𝐧𝐭𝐞𝐫𝐩𝐫𝐞𝐭𝐚𝐛𝐢𝐥𝐢𝐭𝐲 will be held at COLM 2026 in San Francisco! Submission Deadline: June 21, 2026 @ActInterp
2
18
132
13,888
A very exciting outcome of the workshop!
Our ICML 2025 workshop on Actionable Interpretability drew massive interest. But the same questions kept coming up: What does "actionable" mean? Is it achievable? How? We're ready to answer. 🧵
5
439
Actionable Interpretability Workshop ICML2025 retweeted
8 Oct 2025
🤔What happens when LLM agents choose between achieving their goals and avoiding harm to humans in realistic management scenarios? Are LLMs pragmatic or prefer to avoid human harm? 🚀 New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs🚀🧵
1
17
36
4,196
Actionable Interpretability Workshop ICML2025 retweeted
Opportunities to join my group in fall 2026: * PhD applications direct or via @ELLISforEurope (ellis.eu/news/ellis-phd-prog…) * Post-doc applications direct or via Azrieli @azrielifdn (azrielifoundation.org/fellow…) or Zuckerman @stem_program (zuckermanstem.org/ourprogram…)
7
49
328
42,404
Actionable Interpretability Workshop ICML2025 retweeted
21 Jul 2025
Many thanks to the @ActInterp organisers for highlighting our work - and congratulations to Pedro, Alex and the other awardees! Sad not to have been there in person, it looked like a fantastic workshop. @AmsterdamNLP @EdinburghNLP
Big congrats to Alex McKenzie, Pedro Ferreira, and their collaborators on receiving Outstanding Paper Awards!👏👏 and thanks for the fantastic oral presentations! Check out the papers here 👇
3
28
2,746
Big congrats to Alex McKenzie, Pedro Ferreira, and their collaborators on receiving Outstanding Paper Awards!👏👏 and thanks for the fantastic oral presentations! Check out the papers here 👇
1
3
16
6,075
1⃣ Detecting High-Stakes Interactions with Activation Probes - arxiv.org/abs/2506.10805 2⃣ Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations - arxiv.org/abs/2504.05294
1
403
Actionable Interpretability Workshop ICML2025 retweeted
19 Jul 2025
Great to present what’s coming next for NDIF at the @actinterp workshop at #ICML2025! If you missed us, let’s chat after the conference. Reach out here: forms.gle/AhTSBNNttA11JVNS6
3
39
1,749
Starting now: our panel on actionable interpretability! @nsaphra @saprmarks @kylelostat @FazlBarez
1
2
18
1,443
👇🏻
maybe I will live tweet the actionable interp workshop panel
1
543
Actionable Interpretability Workshop ICML2025 retweeted
maybe I will live tweet the actionable interp workshop panel
11
8
101
13,002
Huge thanks to Sarah Schwettmann for a fascinating keynote on "AI Investigators for Understanding AI Systems" 🤖 @cogconfluence @TransluceAI
1
4
31
5,294
Grab a ☕️ and join us for a keynote by @RICEric22: Explanations for Experts via Guarantees and Domain Knowledge: From Attributions to Reasoning
4
14
976
➡️ Join us for the keynote by @byron_c_wallace: “What (if anything) can interpretability do for healthcare?”
2
2
13
1,039
The second poster session is starting now!🙌🏻
1
7
766
Actionable Interpretability Workshop ICML2025 retweeted
Come see our poster about how to predict side effects of unlearning and Fine-Tuning at @ActInterp
1
3
25
1,972
Actionable Interpretability Workshop ICML2025 retweeted
Crazy amount of cool work concentrated in one room
The first poster session is happening now!
4
15
1,566
The first poster session is happening now!
2
10
4,268
The one and only @_beenkim on Agentic Interpretability and Neologism: What LLMs Can Offer Us!
4
34
3,285
We’ve started!👏 Looking forward to an exciting day!💫🔍⚙️
3
23
1,306