Samuel Simko

Pinned Tweet

Samuel Simko

@SimkoSamuel

28 Sep 2025

🚨 Paper Alert 🚨 Excited to release the Triplet adversarial defense at #EMNLP2025, a new approach that extends circuit breaking (2024, by @andyzou_jiaming, @hendrycks et al.) and achieves < 5% attack success rate (ASR) on embedding-level attacks! #AISafety 🔗 Link: arxiv.org/abs/2506.11938 🧵(1/6)

Samuel Simko retweeted

Bernhard Schölkopf @bschoelkopf

Jun 12

If you are a PhD student or early-career researcher, apply before the deadline: June 21, 2026. Aug 31 – Sep 11 · Preliminary speaker list and applications: lnkd.in/ekrPt4m9
Amazingly, this is the 50th MLSS, and it coincides with the 25th anniversary of our lab at Max Planck (2/3).

Samuel Simko retweeted

FAR.AI

@farairesearch

Jun 4

Open-weight LLMs ship with safety training that can be stripped in a few hundred fine-tuning steps. Can current defenses stop this? We built and open-sourced TamperBench, the first unified framework for evaluating tamper resistance, and the answer is mostly no. 1/7

Samuel Simko

@SimkoSamuel

Jun 2

I’m in the Bay Area for the rest of the week 👋 Would love to connect with AI safety researchers while I’m here. Also curious about any events happening this week.

Samuel Simko

@SimkoSamuel

May 10

I’ll be speaking at the AIxBio event in Zurich on May 13th at 18:00! Join us for a discussion of what current AI systems can (or cannot) do and how their risks can be reduced. Registration link: luma.com/a3nyvkkp

Discussion session: AIxBio · Zoom · Luma

Advances in AI and synthetic biology are rapidly changing the landscape of biosecurity risks, including the potential for engineered pandemics. In this…

Samuel Simko retweeted

Zhijing Jin

@ZhijingJin

May 3

Excited for our #ICML2026 papers at @JinesisLab @MPI_IS @UofTCompSci @TorontoSRI @VectorInst! We present papers that advance the research frontiers of (1) Causal LLMs, (2) AI for Science (physics), (3) Multi-Agent LLMs via mechanism design, and (4) Adversarial Defense by honeypot. Congrats to all our student authors and collaborators, esp. @TerryJCZhang @SimkoSamuel @EmanuelTewolde @ivakshi_s @andrewkihyun @PepijnCobben @yahang_qi @FurkanDanismann @bschoelkopf and many others!🎉

Samuel Simko

@SimkoSamuel

May 2

Stage I for MARS V closes Sunday, 3 May at 23:59 AoE!

Samuel Simko

@SimkoSamuel

Apr 27

[Call for applicants] My supervisor @ZhijingJin (UofT, CIFAR AI Chair) and I will be mentoring projects for MARS V, a part-time AI safety research programme. MARS provides a one-week in-person kick-off in the UK, compute, and research management support! 🚀
The projects are:
🛡️ Adversarial defenses for LLMs using causal methods
🌐 Evaluating risks from AI-assisted authoritarianism
👉 Apply by May 3rd. Applications are reviewed on a rolling basis: caish.org/mars @CambridgeAISafetyHub

Samuel Simko

@SimkoSamuel

May 1

Paper accepted ✅ See you in Seoul! 👋🇰🇷 #ICML

Samuel Simko

@SimkoSamuel

Apr 27

Samuel Simko retweeted

Lancelot Da Costa @lancelotdacosta

Apr 20

We'll be organizing the Machine Learning Summer School in Tübingen to be held Aug 31st-Sept 11th, featuring top speakers across academia and industry. If you are a student or ML researcher, save those dates and stay tuned for updates! 🚀

Samuel Simko retweeted

Zhijing Jin

@ZhijingJin

Apr 9

Excited for our "Trustworthy AI for Good" (AI4GOOD) Workshop at #ICML2026! As AI agents increasingly affect our lives, it is key to bridge #ResponsibleAI, social good, and governance. Let’s build solutions together!
⏰ Submission deadline: April 30, 2026 (AoE)
🎙️ Confirmed speakers: @Yoshua_Bengio, Joel Z. Leibo (@jzl86), Maksym Andriushchenko (@maksym_andr), @OanaIgnatRo [More to come!]
📍 July 10-11, 2026 · Seoul 🇰🇷
🔗 trustworthy-ai-for-good.gith…
📝 Submit: openreview.net/group?id=ICML…
📣 Be a reviewer: forms.gle/7cXvUJCW1FdEghi6A

Samuel Simko retweeted

ELLIS @ELLISforEurope

Apr 8

What if the most dangerous AI isn’t rogue but works as intended? A new ELLIS-affiliated paper shows aligned, policy-compliant AI can still undermine democracy at scale. Bottom line: alignment ≠ safety. Democratic resilience must keep pace. 📄 Paper: bit.ly/4snKdLN

A recent paper, co-authored by several ELLIS-affiliated researchers, shows that perfectly aligned, policy-compliant AI can undermine democratic institutions. Not by malice, but through sheer scale.

Samuel Simko retweeted

Zhijing Jin

@ZhijingJin

Apr 6

📢 We will present 5 papers at #ICLR2026, #CLeaR2026, and #ACL2026:
- SocialHarmBench by @psyonp et al.
- Causal LLMs on Instrumental Variable Method by @ivakshi_s et al.
- LLM Data Contamination study by @TerryJCZhang et al.
- Mech Interp for VLM by @francescortu et al.
- DPO data selection method by Xuan & @rongwu_xu
Thanks to all our collaborators and for the institutional support from @MPI_IS @ELLISforEurope @UofTCompSci @VectorInst @TorontoSRI @CIFAR_News @JinesisLab @EuroSafeAI @ELLISInst_Tue @ETH_en @ETH_AI_Center @michigan_AI @UMichiganAI @UMichCSE! Feel free to access the papers at arxiv.org/abs/2510.04891 arxiv.org/abs/2602.07943 arxiv.org/abs/2509.00072 arxiv.org/abs/2507.13868 arxiv.org/abs/2508.04149 🎉

Samuel Simko retweeted

Hanna Yukhymenko

@a_yukh

Apr 6

Our work on multilingual benchmark translation got accepted to @aclmeeting 2026 Findings! 🎉 The glorious EEU promo is coming to San Diego this summer🥺🇺🇦🇧🇬 #ACL #ACL2026 #ACL2026NLP #NLProc

Hanna Yukhymenko

@a_yukh

Feb 26

❓ Can we actually trust the quality of existing multilingual benchmarks translated from English? Turns out many of them contain simple bugs that hurt evaluations, and we try to fix that! Introducing Recovered in Translation 🌍 ritranslation.insait.ai 🧵 below

Samuel Simko

@SimkoSamuel

Mar 25

🚨 New paper: AI Poses Risks to Democratic and Social Systems. We discuss 7 failure modes showing how AI can degrade democracy and society by concentrating power, narrowing how we think, or flooding institutions faster than they can keep up. We also propose 7 research & governance recommendations, from simulation-based stress-testing to deliberative governance infrastructure. Honored to work with Yoshua Bengio, Stuart Russell, Roger Grosse, Bernhard Schölkopf, Rada Mihalcea, Ashton Anderson, Audrey Tang and many others. Full whitepaper here: zhijing-jin.com/d/2026-ai-ri…

Zhijing Jin

@ZhijingJin

Mar 25

AI is threatening our democratic society by concentrating power, narrowing how we think, and flooding institutions faster than they can keep up. These risks emerge at the system level, and technical work alone won't fix them.
👉 Check out our whitepaper with 25 researchers: zhijing-jin.com/d/2026-ai-ri…
💡 We introduce 7 threat models and ways forward.
✍️ Led by @davidguzman1120 with @DaveRBanerjee, @blin_kevin, @PepijnCobben, @gcorsi_, @x_angelohuang, @ChanglingXavier, Suvajit Majumder, @psyonp, @SimkoSamuel, @strauss_irene, and @TerryJCZhang
Advised by senior co-authors: @ashton1anderson, @Yoshua_Bengio, @MatthiasBethge, @RogerGrosse, Karoline Helbig, @david_lie, Richard Mallah, @radamihalcea, Susan Nesbitt, Susan Perry, @presnick, Stuart Russell, @mrinmayasachan, @bschoelkopf, @audreyt, and @ZhijingJin
Thank you for the institutional support from @JinesisLab @EuroSafeAI @MPI_IS @CIFAR_News @iapsAI @CARMA_411 @Cambridge_Uni @UofTCompSci @VectorInst @TorontoSRI @Mila_Quebec @LawZero_ @uni_tue @michigan_AI @UMichCSE @AUParis @UNESCO @UCBerkeley @ETH_en @ETH_AI_Center @ELLISInst_Tue @ELLISforEurope @EthicsInAI
#CivicAI #AISafety #AIGovernance #Democracy #ResponsibleAI

Samuel Simko retweeted

Zhijing Jin

@ZhijingJin

Mar 22

Mech interp or representation interp? We need to decode the causal computational graph of #LLMs, not just catalogue representations (steering vectors, etc.). Analogy: we can’t understand biology just by studying blood composition. We need to understand how the body works. Same for LLMs.

Samuel Simko

@SimkoSamuel

Mar 3

🚀 We're launching EuroSafeAI, a nonprofit focused on multi-agent AI safety research, here in Zurich. Our launch event is on 6 March at 6:30 PM at the ETH Student Project House. Expect lightning talks and drinks! Info & Sign-up: luma.com/hwo46ach See you there! 🔥

EuroSafeAI Launch Event · Luma

AI is shaping our world. How can we make it safe? Join the launch event of EuroSafeAI, a research org exploring the frontiers of AI safety. Hear lightning…

Samuel Simko

@SimkoSamuel

Mar 3

🗓️ 6 March 2026 · 6:30 PM
📍 ETH Student Project House, E floor, Clausiusstrasse 16
eurosafe.ai.toronto.edu/

Samuel Simko

@SimkoSamuel

Mar 3

Speakers: @ZhijingJin (CIFAR AI Chair; Professor at UoT; Chief Scientist), @x_angelohuang (Co-founder and Director), @pepijncobben (Co-founder and Director), @SimkoSamuel (Technical Staff), @davidguzman1120 (Technical Staff).

Samuel Simko retweeted

Jinesis Lab (UToronto) @JinesisLab

Mar 2

Check out CircuitLab 🚀 A scalable Python library for training Cross-Layer Transcoders (CLTs). Visual interface & auto-interp incoming, so mark our repo: github.com/circuits-research… Collaborative effort @MPI_IS @ELLISforEurope @UofTCompSci @VectorInst @TorontoSRI @CIFAR_News @JinesisLab

GitHub - LLM-Interp/CLT-Forge: A Mechanistic Interpretability Toolkit for Cross-Layer Transcoder...

A Mechanistic Interpretability Toolkit for Cross-Layer Transcoder Training and Attribution-Graph Visualization - LLM-Interp/CLT-Forge
