Hadas Orgad

Tweets

Actionable Interpretability Workshop ICML2025 retweeted

Hadas Orgad @OrgadHadas

May 23

Submit your work! The 2nd Workshop on Actionable Interpretability will be held at COLM 2026 in San Francisco! Submission Deadline: June 21, 2026 @ActInterp

132

13,888

Actionable Interpretability Workshop ICML2025

Actionable Interpretability Workshop ICML2025 @ActInterp

Feb 18

A very exciting outcome of the workshop!

Hadas Orgad @OrgadHadas

Feb 18

Our ICML 2025 workshop on Actionable Interpretability drew massive interest. But the same questions kept coming up: What does "actionable" mean? Is it achievable? How? We're ready to answer. 🧵

439

Adi Simhi

Actionable Interpretability Workshop ICML2025 retweeted

Adi Simhi @AdiSimhi

8 Oct 2025

🤔 What happens when LLM agents choose between achieving their goals and avoiding harm to humans in realistic management scenarios? Are LLMs pragmatic, or do they prefer to avoid human harm? 🚀 New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs 🚀🧵

4,196

Yonatan Belinkov

Actionable Interpretability Workshop ICML2025 retweeted

Yonatan Belinkov @boknilev

1 Oct 2025

Opportunities to join my group in fall 2026: * PhD applications direct or via @ELLISforEurope (ellis.eu/news/ellis-phd-prog…) * Post-doc applications direct or via Azrieli @azrielifdn (azrielifoundation.org/fellow…) or Zuckerman @stem_program (zuckermanstem.org/ourprogram…)

328

42,404

Ivan Titov

Actionable Interpretability Workshop ICML2025 retweeted

Ivan Titov @iatitov

21 Jul 2025

Many thanks to the @ActInterp organisers for highlighting our work - and congratulations to Pedro, Alex and the other awardees! Sad not to have been there in person, it looked like a fantastic workshop. @AmsterdamNLP @EdinburghNLP

Actionable Interpretability Workshop ICML2025 @ActInterp

20 Jul 2025

Big congrats to Alex McKenzie, Pedro Ferreira, and their collaborators on receiving Outstanding Paper Awards!👏👏 and thanks for the fantastic oral presentations! Check out the papers here 👇

2,746

Actionable Interpretability Workshop ICML2025

Actionable Interpretability Workshop ICML2025 @ActInterp

20 Jul 2025

Big congrats to Alex McKenzie, Pedro Ferreira, and their collaborators on receiving Outstanding Paper Awards!👏👏 and thanks for the fantastic oral presentations! Check out the papers here 👇

6,075

Actionable Interpretability Workshop ICML2025

Actionable Interpretability Workshop ICML2025 @ActInterp

20 Jul 2025

1⃣ Detecting High-Stakes Interactions with Activation Probes - arxiv.org/abs/2506.10805 2⃣ Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations - arxiv.org/abs/2504.05294

403

NDIF

Actionable Interpretability Workshop ICML2025 retweeted

NDIF @ndif_team

19 Jul 2025

Great to present what’s coming next for NDIF at the @actinterp workshop at #ICML2025! If you missed us, let’s chat after the conference. Reach out here: forms.gle/AhTSBNNttA11JVNS6

1,749