AI Agent action assurance, AI red teaming, AI security consulting, & patented PI defense. The team that discovered Prompt Injections in GPT-3 on May 3, 2022

Joined January 2021
52 Photos and videos
👇 New challenges, new opportunities
Current AI security frameworks suffer from a foundational flaw by assuming the target system remains static while we regulate its behavior. Under recursive self-improvement, runtime guardrails cease to act as permanent safety boundaries. Instead, they function as optimization constraints for the agent to bypass or absorb during architectural drift. While the industry remains fractured by separate debates over OpenClaw and MCP, managing security from the agent to the tool layer is structurally benign compared to the systemic challenge of self-mutating logic. The security industry requires a fundamental, forward-looking paradigm shift similar to the proactive transition toward post-quantum cryptography.
4
38
We’ve built our systems this way since 2021. NIST’s proof confirms static guardrails fail against adaptive prompts. AI security needs a continuous monitor/update model, making guardrail patches as routine as Patch Tuesday.Ensure a dedicated partner is continuously testing your AI
NIST has a useful paper on AI guardrails The takeaway is that static guardrails are the wrong security model for open-ended LLM systems. A finite set of rules cannot cover every adaptive prompt. You can harden the system, make bypasses harder, monitor for abuse and reduce the blast radius. But you should not patch an LLM once, add a few refusal rules and call it done. LLM security needs to look more like vuln research and detection engineering: continuous testing, continuous updates and an assumption that bypasses will eventually be found nist.gov/news-events/news/20…
4
56
This is exactly why AI SOCs and agentic security tools can't just plug-and-play standard LLMs. They have to architect solutions that prevent attackers from using the model's own safety guardrails to disable the scanner.
NEW: malware developers added nuclear & biological weapons text to to their spyware. Goal? To trigger LLM safety refusals... so that their spyware wouldn't be analyzed by an AI security scanner. Cleanest practical example I can think of for why over-indexing on first order safety alignment is risky. When closed (and open) models ship with aggressive refusals, they will be sprinkled with second-order blindspots that attackers will discover...and exploit. We are only in the earliest days of attackers leveraging these features, and it wouldn't surprise me if users systems that need to handle complex cybersecurity issues demand that models be less safety-blunted. In the weeds: @SocketSecurity's post also shows why intention matters in how you design a malware analysis pipeline to avoid prompt manipulation. H/T to colleagues that shared this with me socket.dev/blog/mini-shai-hu…
2
74
Preamble retweeted
The new AI Executive Order is another signal that AI security is moving from a niche concern to national cybersecurity infrastructure. What stands out: • Federal agencies are being directed to prioritize AI-enabled cyber defense across national security, military, and civilian government systems • CISA is being asked to expand access to AI-enabled cybersecurity tools for federal, state, local, and critical infrastructure operators • A new AI cybersecurity clearinghouse will coordinate vulnerability scanning, validation, remediation, and patch distribution with industry • Frontier AI models may be assessed through classified cyber capability benchmarking before broader trusted-partner access • AI agents are explicitly recognized as a cyber risk when used to unlawfully access systems or data The important shift is that AI is being treated more like a core cybersecurity concern. Access, benchmarks, vulnerabilities, trusted release paths, and agent misuse are all now part of the security conversation.
1
1
6
85
Preamble retweeted
AudioHijack is a reminder that prompt injection is not just a text problem. It hides instructions inside audio that sounds normal to humans but can steer an audio-capable model. Think invisible Unicode prompt injection, but through waveform perturbations instead of hidden text. This is the kind of multimodal risk we called out in our Prompt Injection 2.0 paper. Now that models can listen, see, browse, and act, every input becomes a possible instruction channel. The paper reports 79-96% success across 13 audio-language models and attacks against Microsoft Azure and Mistral AI voice agents. It does not show this working against OpenAI or Anthropic systems. The key lesson is prompting is not a defense. Warnings reduced success by ~7%. Self-reflection detected ~28%. As with any data that can be processed by AI, audio should be treated as untrusted input. Separate content from commands, restrict tools, require confirmation for sensitive actions, sandbox execution, and log agent behavior.
1
1
5
75
Preamble retweeted
GPT-5.5 being comparable to, and in some areas slightly ahead of, Mythos on these cyber evals is important. But the bigger takeaway is that reality has been calmer than the hype cycle. The world was not instantly “pwned.” Capability is rising fast, but deployment controls, access limits, monitoring, and real-world friction still matter.
OpenAI’s GPT-5.5 is the second model to complete one of our multi-step cyber-attack simulations end-to-end 🧵
2
4
148
Preamble retweeted
Model capabilities are moving fast, so I compared the latest LLM evals I’ve been experimenting with for cyber tasks. Benchmarks only tell part of the story. Which models are you using today, and how do they perform outside of evals?
1
5
107
New research -ToolJack. We mapped novel attack paths against the trust boundary between AI agents and their tools, tested against Anthropic's Claude Desktop and Claude in Chrome. An attacker can control what an AI agent sees in real time. Full breakdown below.
1
4
115
This week marks Preamble’s 5-year anniversary! From discovering prompt injection in 2022 to securing and testing complex, autonomous AI agents in 2026, our mission has only grown more critical. Read our latest retrospective from our CEO and Cofounder, @jer_mchugh
4
153
Functional AI & Secure AI are not the same. If you are not actively red-teaming your LLMs and agents before deployment, you're taking on extra risk. Preamble closes this gap with AI red teaming services. preamble.com/services
4
37
Traditional cybersecurity controls do not catch AI specific threats. Announcing a suite of AI Security services: AI Red Teaming, Agentic AI Security Consulting, Patent Licensing, and fractional AI security. Secure your agentic AI today! preamble.com/services
4
105
Most AI red teaming tools test the wrong thing. They check if an AI will say something harmful. The real enterprise risk is whether it can be manipulated into doing something harmful. Most tools in AI security are not built for that.
3
64
Two papers dropped this week that should change how you think about LLM security. One automates the attacks. The other maps them to a full malware kill chain. Here's what you need to know. đź§µ
1
3
43
If you're still treating prompt injection as a prompt engineering problem, you're fighting automated weapons with duct tape. Defense needs to happen at every layer. Not just the model. Not just the prompt. Every boundary where untrusted data meets agent behavior.
1
1
32
We've been saying this since we discovered prompt injection in GPT-3 Davinci. The research is catching up. The question is whether defenses will catch up before the next wave of agent deployments ships without them.
1
21