Joined August 2015
Pinned Tweet
🚨Paper Alert 🚨 Excited to release the Triplet adversarial defense at #EMNLP2025, a new approach extending circuit breaking (2024, by @andyzou_jiaming, @hendrycks et al.) which achieves < 5% ASR on embedding-level attacks! #AISafety 🔗 Link: arxiv.org/abs/2506.11938 🧵(1/6)
Samuel Simko retweeted
If you are a PhD student or early-career researcher, apply before the deadline: June 21, 2026. Aug 31 – Sep 11 · Preliminary speaker list and applications: lnkd.in/ekrPt4m9 Amazingly, this is the 50th MLSS, and it coincides with the 25th anniversary of our lab at Max Planck (2/3).
Samuel Simko retweeted
Open-weight LLMs ship with safety training that can be stripped in a few hundred fine-tuning steps. Can current defenses stop this? We built and open-sourced TamperBench, the first unified framework for evaluating tamper resistance, and the answer is mostly no. 1/7
I’m in the Bay Area for the rest of the week 👋 Would love to connect with AI safety researchers while I’m here. Also curious about any events happening this week.
I’ll be speaking at the AIxBio event in Zurich on May 13th at 18:00! Join for a discussion of what current AI systems can (or cannot) do and how their risks can be reduced. Registration link: luma.com/a3nyvkkp
Samuel Simko retweeted
Excited for our #ICML2026 papers at @JinesisLab @MPI_IS @UofTCompSci @TorontoSRI @VectorInst! We present papers that advance the research frontiers of (1) Causal LLMs, (2) AI for Science (physics), (3) Multi-Agent LLMs via mechanism design, and (4) Adversarial Defense by honeypot. Congrats to all our student authors and collaborators, esp. @TerryJCZhang @SimkoSamuel @EmanuelTewolde @ivakshi_s @andrewkihyun @PepijnCobben @yahang_qi @FurkanDanismann @bschoelkopf and many others!🎉
Stage I for MARS V closes Sunday, 3 May at 23:59 AoE!
[Call for applicants] My supervisor @ZhijingJin (UofT, CIFAR AI Chair) and I will be mentoring a project for MARS V, a part-time research programme for AI safety research. MARS provides a one-week in-person kick-off in the UK, compute, and research management support! 🚀 The projects are: 🛡️ Adversarial defenses for LLMs using causal methods 🌐 Evaluating risks from AI-assisted authoritarianism 👉 Apply by May 3rd. Applications are reviewed on a rolling basis: caish.org/mars @CambridgeAISafetyHub
Paper accepted ✅ See you in Seoul! 👋🇰🇷 #ICML
Samuel Simko retweeted
We'll be organizing the Machine Learning Summer School in Tübingen to be held Aug 31st-Sept 11th, featuring top speakers across academia and industry. If you are a student or ML researcher, save those dates and stay tuned for updates! 🚀
Samuel Simko retweeted
Excited for our "Trustworthy AI for Good" (AI4GOOD) Workshop at #ICML2026! As AI agents increasingly affect our lives, it is key to bridge #ResponsibleAI, social good, and governance. Let’s build solutions together! ⏰ Submission deadline: April 30, 2026 (AoE) 🎙️Confirmed speakers: @Yoshua_Bengio, Joel Z. Leibo (@jzl86), Maksym Andriushchenko (@maksym_andr), @OanaIgnatRo [More to come!] 📍July 10-11, 2026 · Seoul🇰🇷 🔗 trustworthy-ai-for-good.gith… 📝 Submit: openreview.net/group?id=ICML… 📣 Be a reviewer: forms.gle/7cXvUJCW1FdEghi6A
Samuel Simko retweeted
What if the most dangerous AI isn’t rogue, but works as intended? A new ELLIS-affiliated paper shows aligned, policy-compliant AI can still undermine democracy at scale. Bottom line: alignment ≠ safety. Democratic resilience must keep pace. 📄 Paper: bit.ly/4snKdLN
Samuel Simko retweeted
📢We will present 5 papers at #ICLR2026, #CLeaR2026, and #ACL2026:
- SocialHarmBench by @psyonp et al.
- Causal LLMs on Instrumental Variable Method by @ivakshi_s et al.
- LLM Data Contamination study by @TerryJCZhang et al.
- Mech Interp for VLM by @francescortu et al.
- DPO data selection method by Xuan & @rongwu_xu
Thanks to all our collaborators and institutional support from @MPI_IS @ELLISforEurope @UofTCompSci @VectorInst @TorontoSRI @CIFAR_News @JinesisLab @EuroSafeAI @ELLISInst_Tue @ETH_en @ETH_AI_Center @michigan_AI @UMichiganAI @UMichCSE! Feel free to access the papers at arxiv.org/abs/2510.04891 arxiv.org/abs/2602.07943 arxiv.org/abs/2509.00072 arxiv.org/abs/2507.13868 arxiv.org/abs/2508.04149 🎉
Samuel Simko retweeted
Our work on multilingual benchmark translation got accepted to @aclmeeting 2026 Findings! 🎉 The glorious EEU promo is coming to San Diego this summer🥺🇺🇦🇧🇬 #ACL #ACL2026 #ACL2026NLP #NLProc
❓Can we actually trust the quality of the existing multilingual benchmarks translated from English? Turns out many of them have some simple bugs, which hurts the evaluations - we try to fix that! Introducing Recovered in Translation 🌍 ritranslation.insait.ai 🧵below
🚨 New paper: AI Poses Risks to Democratic and Social Systems. We discuss 7 failure modes showing how AI can degrade democracy and society through power concentration, narrowing how we think, or flooding institutions faster than they can keep up. We also propose 7 research & governance recommendations, from simulation-based stress-testing to deliberative governance infrastructure. Honored to work with Yoshua Bengio, Stuart Russell, Roger Grosse, Bernhard Schölkopf, Rada Mihalcea, Ashton Anderson, Audrey Tang and many others. Full whitepaper here: zhijing-jin.com/d/2026-ai-ri…

AI is threatening our democratic society—by concentrating power, narrowing how we think, and flooding institutions faster than they can keep up. These risks emerge at the system level, and technical work alone won't fix them. 👉Check out our whitepaper with 25 researchers: zhijing-jin.com/d/2026-ai-ri… 💡We introduce 7 threat models and ways forward. ✍️Led by @davidguzman1120 with @DaveRBanerjee, @blin_kevin, @PepijnCobben, @gcorsi_, @x_angelohuang, @ChanglingXavier, Suvajit Majumder, @psyonp, @SimkoSamuel, @strauss_irene, and @TerryJCZhang Advised by senior co-authors: @ashton1anderson, @Yoshua_Bengio, @MatthiasBethge, @RogerGrosse, Karoline Helbig, @david_lie, Richard Mallah, @radamihalcea, Susan Nesbitt, Susan Perry, @presnick, Stuart Russell, @mrinmayasachan, @bschoelkopf @audreyt and @ZhijingJin Thank you to all the institutional support from @JinesisLab @EuroSafeAI @MPI_IS @CIFAR_News @iapsAI @CARMA_411 @Cambridge_Uni @UofTCompSci @VectorInst @TorontoSRI @Mila_Quebec @LawZero_ @uni_tue @michigan_AI @UMichCSE @AUParis @UNESCO @UCBerkeley @ETH_en @ETH_AI_Center @ELLISInst_Tue @ELLISforEurope @EthicsInAI #CivicAI #AISafety #AIGovernance #Democracy #ResponsibleAI
Samuel Simko retweeted
Mech interp or representation interp? We need to decode the causal computational graph of #LLMs, not just catalogue representations (steering vectors etc). Analogy: we can’t understand biology from blood composition alone. We need to understand how the body works. Same for LLMs.
🚀 We're launching EuroSafeAI, a nonprofit focused on multi-agent AI safety research, here in Zurich. Our launch event is on 6 March at 6:30 PM at the ETH Student Project House. Expect lightning talks and drinks! Info & Sign-up: luma.com/hwo46ach See you there! 🔥
🗓️ 6 March 2026 · 6:30 PM 📍 ETH Student Project House E floor, Clausiusstrasse 16 eurosafe.ai.toronto.edu/
Speakers: @ZhijingJin (CIFAR AI Chair; Professor at UofT, Chief Scientist), @x_angelohuang (Co-founder and Director), @pepijncobben (Co-founder and Director), @SimkoSamuel (Technical Staff), @davidguzman1120 (Technical Staff).