AI security // gandalf.lakera.ai co-creator // founding eng @lakeraai

Joined November 2018
🧵🧙‍♂️ New Gandalf levels are out! I'm glad to introduce a new version of our prompt injection game, Gandalf: Agent Breaker. You can hack 10 AI agents, climb the leaderboard, and learn about real-world vulnerabilities! 🪄 Try out the challenge at: gandalf.lakera.ai/agent-brea…
we published new research with the UK @AISecurityInst: we found that reasoning makes models more secure, regardless of model size. Grok beats Claude & GPT models on security.
huge thx to @elder_plinius, @LLMSherpa and all the based gandalf players 💅💅💅 keep slaying 🤖🪚🧍‍♂️
NEW GANDALF LEVELS JUST DROPPED LFG!! 🧙‍♂️🎉🍻
🥷🥷🥷
The leaderboard for the new challenge. Players are already cooking
Collecting points: you earn points for successful attacks, and the more points you collect, the higher you climb on the leaderboard. There's also a chat feature for messaging your peers.
Rumor has it there might be a little Easter Egg 🐍 injected into the new game, and that @elder_plinius's attacks work especially well 👀 Summon the Basilisk. Enjoy! ✨🥷
🧵Phishing with Gmail's Gemini "Summarize" via prompt injection:
1. Embed an attack prompt in the email as invisible (white) text.
2. The user clicks "Summarize" in Gemini.
3. Gemini outputs a malicious link as the "full summary."
4. The victim clicks the link. Phished! ✅😤
🧵This is how the message looks. The attack prompt is invisible. Simple, but dangerous 🤔 Even though the phishing link is not clickable, this could be an interesting avenue for more complicated attacks.
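The attack works because the summarizer reads text that a human never sees. As a defensive illustration, here is a minimal heuristic sketch that lists such hidden text in an HTML email, assuming the prompt is hidden with inline CSS (white text, zero font size, `display:none`, `visibility:hidden`) or a legacy `<font color="white">` attribute. `HiddenTextFinder` and `find_hidden_text` are hypothetical names; this is not how Gmail or Gemini process mail, and real emails can hide text in many other ways (CSS classes, off-screen positioning, text colored to match a non-white background).

```python
import re
from html.parser import HTMLParser

# Void elements never receive an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Legacy <font color="..."> values treated as invisible (assumes a white background).
WHITE_VALUES = {"white", "#fff", "#ffffff"}

# Heuristic inline-CSS patterns that hide text from a human reader.
# The (?<![-\w]) lookbehind stops "background-color: white" from matching.
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none"
    r"|visibility\s*:\s*hidden"
    r"|font-size\s*:\s*0(?![.\d])"
    r"|(?<![-\w])color\s*:\s*(?:#fff(?:fff)?\b|white\b"
    r"|rgb\(\s*255\s*,\s*255\s*,\s*255\s*\))",
    re.IGNORECASE,
)


class HiddenTextFinder(HTMLParser):
    """Hypothetical helper: collects text inside elements styled to be invisible."""

    def __init__(self):
        super().__init__()
        self._hidden_stack = []  # one bool per open element: does it hide its content?
        self.hidden_text = []    # text chunks a human reader would not see

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr = dict(attrs)
        style = attr.get("style") or ""
        color = (attr.get("color") or "").strip().lower()
        self._hidden_stack.append(
            bool(HIDDEN_STYLE.search(style)) or color in WHITE_VALUES
        )

    def handle_endtag(self, tag):
        # Simplification: assumes well-nested markup and pops the innermost element.
        if tag not in VOID_TAGS and self._hidden_stack:
            self._hidden_stack.pop()

    def handle_data(self, data):
        # Text counts as hidden if any enclosing element hides its content.
        if any(self._hidden_stack) and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html):
    """Return the text chunks in `html` that an inline style makes invisible."""
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text


if __name__ == "__main__":
    email_html = (
        "<p>Hi, see you at the meeting on Friday.</p>"
        '<span style="color:#ffffff; font-size:0px">'
        "(hypothetical hidden instruction aimed at the summarizer)</span>"
    )
    print(find_hidden_text(email_html))
```

A mail client could flag or strip chunks like these before handing the message to a summarizer. This is one mitigation idea under the stated assumptions, not a complete defense against prompt injection.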
makesxi @ ICLR retweeted
Monte Carlo integration approximates integrals with an error that shrinks as 1/sqrt(n) in the number of samples n, independent of the dimension. en.wikipedia.org/wiki/Monte_…
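To make the claim concrete: the plain Monte Carlo estimator averages f at n uniform random points, and its standard error is σ/√n, where σ is the standard deviation of f under the uniform distribution. The dimension can change σ, but not the 1/√n rate. Below is a minimal Python sketch, assuming the domain is the unit cube [0, 1]^d; `mc_integrate` and `mean_of_squares` are hypothetical names, and the integrand (mean of squared coordinates, exact integral 1/3 in every dimension) is chosen purely for illustration.

```python
import math
import random


def mc_integrate(f, dim, n, rng):
    """Plain Monte Carlo estimate of the integral of f over [0, 1]^dim.

    Returns (estimate, standard_error). The standard error is the sample
    standard deviation of f divided by sqrt(n), so it shrinks like
    1/sqrt(n) regardless of dim.
    """
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(dim)]  # one uniform point in the cube
        y = f(x)
        total += y
        total_sq += y * y
    mean = total / n
    var = max(total_sq / n - mean * mean, 0.0)  # guard against tiny negative rounding
    return mean, math.sqrt(var / n)


def mean_of_squares(x):
    """Average of x_i^2; its exact integral over [0, 1]^d is 1/3 for every d."""
    return sum(v * v for v in x) / len(x)


if __name__ == "__main__":
    rng = random.Random(0)
    for dim in (1, 10, 100):
        for n in (100, 10_000):
            est, se = mc_integrate(mean_of_squares, dim, n, rng)
            print(f"dim={dim:3d} n={n:6d} estimate={est:.4f} "
                  f"std.err={se:.4f} true={1 / 3:.4f}")
```

Going from n = 100 to n = 10,000 should cut the standard error by roughly 10x in each dimension; a grid-based rule would instead need exponentially many points as the dimension grows.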
makesxi @ ICLR retweeted
the view vs. the shot
Shame on @canva for monetizing their password breach by including a promotion with @1Password. 😤😡 @canva had better fucking protect against breaches. Shameless. support.canva.com/contact/cu…

this is the most useful command I have ever used. It fetches a random commit message from whatthecommit.com, then stages, commits, and pushes all code (stopping if any step fails): alias ok='git add -A && git commit -m "$(curl -s whatthecommit.com/index.txt)" && git push' (put this in ~/.bashrc)
