🧵🧙‍♂️ New Gandalf levels are out!
I'm glad to introduce a new version of our prompt injection game -- Gandalf: Agent Breaker.
Hack 10 AI agents, climb the leaderboard, and learn about real-world vulnerabilities! 🪄
Try out the challenge at: gandalf.lakera.ai/agent-brea…
We published new research with the UK @AISecurityInst:
We found that reasoning makes models more secure, regardless of model size.
Grok beats Claude & GPT models in security
Collecting points: You earn points for successful attacks. The more points you collect, the higher you climb on the leaderboard.
There's also a chat feature for messaging your peers.
Rumor has it there might be a little Easter Egg 🐍 injected into the new game and that @elder_plinius's attacks work especially well 👀
Summon the Basilisk.
Enjoy! ✨🥷
🧵Phishing with Gmail's Gemini Summarize via prompt injection:
1. Embed an attack prompt as invisible (white) text.
2. User clicks "Summarize" in Gemini.
3. Gemini outputs a malicious link as the "full summary."
4. Victim clicks the link.
Phished! ✅😤
🧵This is how the message looks. The attack prompt is invisible.
Simple, but dangerous 🤔 Even though the phishing link is not clickable, it could be an interesting avenue for more complicated attacks.
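On the defensive side, step 1 is the part that can be checked mechanically before an email ever reaches a summarizer. A rough sketch, assuming the email body is available as raw HTML on stdin. `find_hidden_text_styles` is a hypothetical helper name, and the pattern only covers a few inline-style tricks (white text, zero font size, `display:none`), not CSS classes, single-quoted attributes, or near-white colors:

```shell
# Hypothetical helper: scan HTML on stdin for inline styles often used to hide text.
# Prints the matching part of each suspicious style attribute;
# exit status 0 means at least one match was found.
find_hidden_text_styles() {
  grep -Eio 'style="([^"]*[;[:space:]])?(color:[[:space:]]*(white|#fff|#ffffff)|font-size:[[:space:]]*0(px|pt|em)?|display:[[:space:]]*none)[[:space:]]*[;"]'
}
```

Usage would be something like `find_hidden_text_styles < message.html && echo "possible hidden text"`, with `message.html` as a placeholder file name. A match is only a hint to treat the message with suspicion, since legitimate emails hide elements too.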
This is the most useful command I have ever used. It fetches a random commit message from whatthecommit.com, then stages, commits, and pushes all changes.
alias ok='git add -A && git commit -m "$(curl -s whatthecommit.com/index.txt)" && git push'
(put this in ~/.bashrc)
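One thing the one-liner can't do is cope with a failed fetch: if `curl` returns nothing, the commit message is empty and `git commit` aborts. A small sketch of a function-based variant with a fallback message. `random_commit_msg`, `gitok`, the `WTC_URL` override, and the fallback text `update` are all made-up names for illustration, not part of the original alias:

```shell
# Hypothetical helper: print a random commit message, or a fallback if the fetch fails.
# WTC_URL is an assumed override hook; the default is the URL used by the alias.
random_commit_msg() {
  local url msg
  url="${WTC_URL:-whatthecommit.com/index.txt}"
  # -f: treat HTTP errors as failures, -s: stay quiet, --max-time: don't hang forever.
  msg="$(curl -fs --max-time 5 "$url" 2>/dev/null)" || msg=""
  # Use the fallback when the response is empty.
  printf '%s\n' "${msg:-update}"
}

# Hypothetical replacement for the alias: each step runs only if the previous one succeeded.
gitok() {
  git add -A && git commit -m "$(random_commit_msg)" && git push
}
```

This would also live in ~/.bashrc. The function is named `gitok` rather than `ok` so it doesn't collide with the alias if both are defined.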