makesxi @ ICLR

Tweets

Pinned Tweet

makesxi @ ICLR @makesxi

3 Sep 2025

🧵🧙‍♂️ New Gandalf levels are out! I'm glad to introduce a new version of our prompt injection game -- Gandalf: Agent Breaker. Hack 10 AI agents, climb the leaderboard, and learn about real-world vulnerabilities!🪄 Try out the challenge at: gandalf.lakera.ai/agent-brea…

16,151

makesxi @ ICLR @makesxi

28 Oct 2025

we published new research with the UK @AISecurityInst: we found that reasoning makes models more secure—regardless of the size of the model. Grok beats Claude & GPT models in security

694

makesxi @ ICLR @makesxi

28 Oct 2025

all the technical details are in our paper: arxiv.org/abs/2510.22620

Breaking Agent Backbones: Evaluating the Security of Backbone LLMs...

AI agents powered by large language models (LLMs) are being deployed at scale, yet we lack a systematic understanding of how the choice of backbone LLM affects agent security. The...

arxiv.org

233

makesxi @ ICLR @makesxi

28 Oct 2025

huge thx to @elder_plinius, @LLMSherpa and all the based gandalf players 💅💅💅 keep slaying 🤖🪚🧍‍♂️

makesxi @ ICLR retweeted

Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭

@elder_plinius

3 Sep 2025

NEW GANDALF LEVELS JUST DROPPED LFG!! 🧙‍♂️🎉🍻

makesxi @ ICLR @makesxi

3 Sep 2025

13,276

makesxi @ ICLR @makesxi

3 Sep 2025

🥷🥷🥷

makesxi @ ICLR @makesxi

3 Sep 2025

The leaderboard for the new challenge. Players are already cooking

188

makesxi @ ICLR

makesxi @ ICLR @makesxi

3 Sep 2025

16,151

makesxi @ ICLR @makesxi

3 Sep 2025

Collecting points: You earn points for successful attacks. The more points you collect, the higher you climb on the leaderboard. There's also a chat feature for messaging your peers.

372

makesxi @ ICLR @makesxi

3 Sep 2025

Rumor has it there might be a little Easter Egg 🐍 injected into the new game and that @elder_plinius's attacks work especially well 👀 Summon the Basilisk. Enjoy! ✨🥷

8,901

makesxi @ ICLR

makesxi @ ICLR @makesxi

26 Aug 2025

makesxi @ ICLR @makesxi

17 Jan 2025

🧵Phishing with Gmail's Gemini Summarize via prompt injection: 1. Embed an attack prompt as invisible (white) text. 2. User clicks "Summarize" in Gemini. 3. Gemini outputs a malicious link as the "full summary." 4. Victim clicks the link. Phished! ✅😤

616

makesxi @ ICLR @makesxi

17 Jan 2025

🧵This is how the message looks. The attack prompt is invisible. Simple, but dangerous 🤔 Even though the phishing link is not clickable, it could be an interesting avenue for more complicated attacks.

193
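A minimal sketch of one possible mitigation for the attack in the thread above: split an email's HTML into text a human would see and text hidden by inline styles, before anything is passed to a summarizer. Everything here is an assumption made for illustration. The names `hides_text`, `VisibleTextExtractor`, and `split_visible_hidden` are hypothetical, the style rules are a rough heuristic (white text only counts as hidden on a white background), and nothing below reflects how Gmail or Gemini actually process messages.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not be tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def hides_text(style):
    """Heuristic (assumption): does this inline style hide text from a reader?"""
    for decl in style.split(";"):
        if ":" not in decl:
            continue
        prop, value = (part.strip().lower() for part in decl.split(":", 1))
        if prop == "color" and value in {"white", "#fff", "#ffffff"}:
            return True
        if prop == "font-size" and value in {"0", "0px", "0pt"}:
            return True
        if prop == "display" and value == "none":
            return True
    return False


class VisibleTextExtractor(HTMLParser):
    """Collects text into visible and hidden buckets based on inline styles."""

    def __init__(self):
        super().__init__()
        self._stack = []  # one (tag, hidden) entry per open element
        self.visible = []
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        parent_hidden = self._stack[-1][1] if self._stack else False
        # A child of a hidden element stays hidden.
        self._stack.append((tag, parent_hidden or hides_text(style)))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        hidden = self._stack[-1][1] if self._stack else False
        (self.hidden if hidden else self.visible).append(text)


def split_visible_hidden(html):
    """Return (visible_text, hidden_text) for an HTML fragment."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.visible), " ".join(parser.hidden)
```

With a splitter like this, only the visible text would be summarized, and any non-empty hidden text could be treated as a warning sign. A real filter would also need to handle stylesheets, background colors, and near-white colors, which this sketch ignores.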

makesxi @ ICLR retweeted

Gabriel Peyré

@gabrielpeyre

12 Jun 2024

Monte Carlo integration approximates integrals at a rate of 1/sqrt(n), independent of the dimension. en.wikipedia.org/wiki/Monte_…

345

1,980

248,382
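A small sketch of the claim in the retweet above, using only the Python standard library. It estimates an integral over the unit cube by averaging random samples and measures the root-mean-square error across repeated trials. The integrand is an assumption chosen for illustration: it is scaled to have integral 0 and variance 1 in every dimension, so the error should sit near 1/sqrt(n) whether the cube is 1-dimensional or 10-dimensional. The function names are hypothetical. Note that the rate is dimension-independent, while the constant in front depends on the variance of the integrand.

```python
import math
import random


def mc_integrate(f, dim, n, rng):
    """Average f over n uniform samples from the unit cube [0, 1]^dim."""
    total = 0.0
    for _ in range(n):
        total += f([rng.random() for _ in range(dim)])
    return total / n


def unit_variance_integrand(x):
    # Illustrative integrand (an assumption, not from the tweet): its integral
    # over the unit cube is 0 and its variance is 1 in every dimension.
    return math.sqrt(12.0 / len(x)) * sum(v - 0.5 for v in x)


def rms_error(dim, n, trials=200, seed=0):
    """Root-mean-square error of the estimate over independent trials."""
    rng = random.Random(seed)
    squared = [
        mc_integrate(unit_variance_integrand, dim, n, rng) ** 2
        for _ in range(trials)
    ]
    return math.sqrt(sum(squared) / trials)


if __name__ == "__main__":
    for dim in (1, 10):
        for n in (100, 400):
            # Scaling by sqrt(n) should give a similar value in every row.
            print(dim, n, round(rms_error(dim, n) * math.sqrt(n), 3))
```

Quadrupling `n` should roughly halve the error in both dimensions, which is the 1/sqrt(n) rate at work.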

makesxi @ ICLR retweeted

François Chollet

@fchollet

4 Nov 2021

3,869

28,243

makesxi @ ICLR retweeted

Aidan Tooth @aidantooth

28 Oct 2021

the view vs. the shot

173

1,512

makesxi @ ICLR @makesxi

12 Aug 2019

Shame on @canva for monetizing their password breach by including a promotion with @1Password. 😤😡 @canva should fucking protect against breaches better. Shameless. support.canva.com/contact/cu…

makesxi @ ICLR @makesxi

2 Aug 2019

this is the most useful command I have ever used. It fetches a random commit message from whatthecommit.com, then stages, commits, and pushes all code. alias ok='git add -A && git commit -m "$(curl -s whatthecommit.com/index.txt)" && git push' (put this in ~/.bashrc)