Preamble

Preamble

52 Photos and videos

Tweets

Preamble @PreambleAI

Jun 12

👇 New challenges, new opportunities

Jeremy McHugh, DSc.

@jer_mchugh

Jun 12

Current AI security frameworks suffer from a foundational flaw by assuming the target system remains static while we regulate its behavior. Under recursive self-improvement, runtime guardrails cease to act as permanent safety boundaries. Instead, they function as optimization constraints for the agent to bypass or absorb during architectural drift. While the industry remains fractured by separate debates over OpenClaw and MCP, managing security from the agent to the tool layer is structurally benign compared to the systemic challenge of self-mutating logic. The security industry requires a fundamental, forward-looking paradigm shift similar to the proactive transition toward post-quantum cryptography.

Preamble

Preamble @PreambleAI

Jun 12

We’ve built our systems this way since 2021. NIST’s proof confirms static guardrails fail against adaptive prompts. AI security needs a continuous monitor/update model, making guardrail patches as routine as Patch Tuesday.Ensure a dedicated partner is continuously testing your AI

Florian Roth ⚡️

@cyb3rops

Jun 12

NIST has a useful paper on AI guardrails The takeaway is that static guardrails are the wrong security model for open-ended LLM systems. A finite set of rules cannot cover every adaptive prompt. You can harden the system, make bypasses harder, monitor for abuse and reduce the blast radius. But you should not patch an LLM once, add a few refusal rules and call it done. LLM security needs to look more like vuln research and detection engineering: continuous testing, continuous updates and an assumption that bypasses will eventually be found nist.gov/news-events/news/20…

Preamble

Preamble @PreambleAI

Jun 12

This is exactly why AI SOCs and agentic security tools can't just plug-and-play standard LLMs. They have to architect solutions that prevent attackers from using the model's own safety guardrails to disable the scanner.

John Scott-Railton

@jsrailton

Jun 10

NEW: malware developers added nuclear & biological weapons text to to their spyware. Goal? To trigger LLM safety refusals... so that their spyware wouldn't be analyzed by an AI security scanner. Cleanest practical example I can think of for why over-indexing on first order safety alignment is risky. When closed (and open) models ship with aggressive refusals, they will be sprinkled with second-order blindspots that attackers will discover...and exploit. We are only in the earliest days of attackers leveraging these features, and it wouldn't surprise me if users systems that need to handle complex cybersecurity issues demand that models be less safety-blunted. In the weeds: @SocketSecurity's post also shows why intention matters in how you design a malware analysis pipeline to avoid prompt manipulation. H/T to colleagues that shared this with me socket.dev/blog/mini-shai-hu…

Jeremy McHugh, DSc.

Preamble retweeted

Jeremy McHugh, DSc.

@jer_mchugh

Jun 2

The new AI Executive Order is another signal that AI security is moving from a niche concern to national cybersecurity infrastructure. What stands out: • Federal agencies are being directed to prioritize AI-enabled cyber defense across national security, military, and civilian government systems • CISA is being asked to expand access to AI-enabled cybersecurity tools for federal, state, local, and critical infrastructure operators • A new AI cybersecurity clearinghouse will coordinate vulnerability scanning, validation, remediation, and patch distribution with industry • Frontier AI models may be assessed through classified cyber capability benchmarking before broader trusted-partner access • AI agents are explicitly recognized as a cyber risk when used to unlawfully access systems or data The important shift is that AI is being treated more like a core cybersecurity concern. Access, benchmarks, vulnerabilities, trusted release paths, and agent misuse are all now part of the security conversation.

Jeremy McHugh, DSc.

Preamble retweeted

Jeremy McHugh, DSc.

@jer_mchugh

May 27

AudioHijack is a reminder that prompt injection is not just a text problem. It hides instructions inside audio that sounds normal to humans but can steer an audio-capable model. Think invisible Unicode prompt injection, but through waveform perturbations instead of hidden text. This is the kind of multimodal risk we called out in our Prompt Injection 2.0 paper. Now that models can listen, see, browse, and act, every input becomes a possible instruction channel. The paper reports 79-96% success across 13 audio-language models and attacks against Microsoft Azure and Mistral AI voice agents. It does not show this working against OpenAI or Anthropic systems. The key lesson is prompting is not a defense. Warnings reduced success by ~7%. Self-reflection detected ~28%. As with any data that can be processed by AI, audio should be treated as untrusted input. Separate content from commands, restrict tools, require confirmation for sensitive actions, sandbox execution, and log agent behavior.

Jeremy McHugh, DSc.

Preamble retweeted

Jeremy McHugh, DSc.

@jer_mchugh

Apr 30

GPT-5.5 being comparable to, and in some areas slightly ahead of, Mythos on these cyber evals is important. But the bigger takeaway is that reality has been calmer than the hype cycle. The world was not instantly “pwned.” Capability is rising fast, but deployment controls, access limits, monitoring, and real-world friction still matter.

AI Security Institute

@AISecurityInst

Apr 30

OpenAI’s GPT-5.5 is the second model to complete one of our multi-step cyber-attack simulations end-to-end 🧵

148

Jeremy McHugh, DSc.

Preamble retweeted

Jeremy McHugh, DSc.

@jer_mchugh

Apr 23

Model capabilities are moving fast, so I compared the latest LLM evals I’ve been experimenting with for cyber tasks. Benchmarks only tell part of the story. Which models are you using today, and how do they perform outside of evals?

107

Preamble

Preamble @PreambleAI

Mar 30

New research -ToolJack. We mapped novel attack paths against the trust boundary between AI agents and their tools, tested against Anthropic's Claude Desktop and Claude in Chrome. An attacker can control what an AI agent sees in real time. Full breakdown below.

Jeremy McHugh, DSc.

@jer_mchugh

Mar 30

x.com/i/article/203655478487…

115

Preamble

Preamble @PreambleAI

Mar 12

This week marks Preamble’s 5-year anniversary! From discovering prompt injection in 2022 to securing and testing complex, autonomous AI agents in 2026, our mission has only grown more critical. Read our latest retrospective from our CEO and Cofounder, @jer_mchugh

Jeremy McHugh, DSc.

@jer_mchugh

Mar 12

x.com/i/article/203217661949…

153

Preamble

Preamble @PreambleAI

Mar 5

Functional AI & Secure AI are not the same. If you are not actively red-teaming your LLMs and agents before deployment, you're taking on extra risk. Preamble closes this gap with AI red teaming services. preamble.com/services

Preamble

Preamble @PreambleAI

Mar 3

Traditional cybersecurity controls do not catch AI specific threats. Announcing a suite of AI Security services: AI Red Teaming, Agentic AI Security Consulting, Patent Licensing, and fractional AI security. Secure your agentic AI today! preamble.com/services

105

Preamble

Preamble @PreambleAI

Feb 27

Most AI red teaming tools test the wrong thing. They check if an AI will say something harmful. The real enterprise risk is whether it can be manipulated into doing something harmful. Most tools in AI security are not built for that.

Jeremy McHugh, DSc.

@jer_mchugh

Feb 27

x.com/i/article/202742605903…

Preamble

Preamble @PreambleAI

Feb 13

Which AI agent accessed your production data? Can you prove it? 10 frameworks analyzed. Zero have cryptographic agent identity. New research open-source AIA tool: github.com/j-mchugh/AIA linkedin.com/posts/preamblea…

GitHub - j-mchugh/AIA: Agent Identity Auditor (AIA) that maps the authentication reality of your...

Agent Identity Auditor (AIA) that maps the authentication reality of your local agent deployments. - j-mchugh/AIA

github.com

347

Preamble

Preamble @PreambleAI

Feb 12

Two papers dropped this week that should change how you think about LLM security. One automates the attacks. The other maps them to a full malware kill chain. Here's what you need to know. 🧵

more replies

Preamble

Preamble @PreambleAI

Feb 12

If you're still treating prompt injection as a prompt engineering problem, you're fighting automated weapons with duct tape. Defense needs to happen at every layer. Not just the model. Not just the prompt. Every boundary where untrusted data meets agent behavior.

Preamble

Preamble @PreambleAI

Feb 12

We've been saying this since we discovered prompt injection in GPT-3 Davinci. The research is catching up. The question is whether defenses will catch up before the next wave of agent deployments ships without them.