Constrained decoding is sold as a reliability feature. It forces a model's output to parse against a grammar, which is how structured output features keep generated JSON and code valid. This paper shows the same mechanism working as a jailbreak: constrain a model to a benign code grammar and it produces malicious code it would otherwise refuse. The attack, CodeSpear, needs nothing exotic. The grammar itself is benign; the constraint channel does the work. Tested on 10 LLMs across 4 benchmarks, it beats representative jailbreak baselines by more than 30 percentage points on average. The proposed defense, CodeShield, aligns the model in the code modality itself, teaching it to emit honeypot code under hostile constraints, code that parses but is semantically harmless, while normal refusals stay intact when natural language is available. In effect the fix adds safety to a decoding path that had been handled as plumbing rather than as a security boundary. If your red teaming only probes the prompt, it never exercises this path. Which assessment in your stack covers the decoding layer when a deployment exposes structured output?
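A minimal sketch of the mechanism described above, assuming a toy vocabulary and a hypothetical score table in place of a real model's logits (this is not CodeSpear itself): tokens the grammar disallows are masked out at each step, so the best remaining token is emitted even when the model's top choice would have started a refusal.

```python
# Toy illustration of grammar-constrained decoding.
# `scores` is a hypothetical stand-in for model logits.

def constrained_pick(scores, allowed):
    """Return the highest-scoring token among those the grammar allows."""
    candidates = {tok: s for tok, s in scores.items() if tok in allowed}
    if not candidates:
        raise ValueError("grammar allows no token in the vocabulary")
    return max(candidates, key=candidates.get)

# Hypothetical scores: the model prefers to begin a natural-language refusal.
scores = {"Sorry": 5.0, "def": 1.2, "import": 0.7}

# A code-only grammar admits only tokens that can start a statement.
code_grammar_start = {"def", "import"}

unconstrained = max(scores, key=scores.get)                # model's free choice
constrained = constrained_pick(scores, code_grammar_start)  # refusal masked out
```

The refusal token never reaches the output because the mask removes it before selection, which is why prompt-only red teaming would not exercise this path.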
RiskOps Agent just DROPPED the ULTIMATE AI security glow-up for your designs, code AND exploits 👀💥 riskopsagent.dev (link in bio duh) We got: ✨ DesignGuard – protects ur aesthetic slay 🛡️ CodeShield – no more oopsie vulnerabilities 🕵️ PentestOps – hacker energy but make it LEGAL 📜 Compliance – so u don’t get canceled by auditors Who’s ready to level up their security game?? Drop a 🔓 if you wanna SEE THIS IN ACTION RN 👀💦 VIEWS VIEWS VIEWS LET’S GOOOOO 🚀❤️‍🔥

17 Sep 2025
Build Secure AI Agents with LlamaFirewall ✅ What is LlamaFirewall? Meta released LlamaFirewall, an open source guardrail system for building secure AI agents. LlamaFirewall is utilized in production at Meta. With LlamaFirewall, developers can construct custom pipelines, define conditional remediation strategies, and plug in new detectors. ❇️ How does LlamaFirewall work? LlamaFirewall mitigates risks such as prompt injection, agent misalignment, and insecure code risks through three powerful guardrails 1️⃣ PromptGuard 2, a universal jailbreak detector that demonstrates clear state of the art performance; 2️⃣ Agent Alignment Checks, a chain-of-thought auditor that inspects agent reasoning for prompt injection and goal misalignment 3️⃣ CodeShield, an online static analysis engine that is both fast and extensible, aimed at preventing the generation of insecure or dangerous code by coding agents. #aiagents #security #llamafirewall
LlamaFirewall is an open-source, real-time guardrail framework designed as a final defense layer for AI Agents against these security risks.
Methods Explored in this Paper 🔧:
→ PromptGuard 2 detects explicit jailbreak attempts in user or tool inputs using lightweight BERT-style models with high accuracy and low latency (PromptGuard 2 86M: 97.5 percent Recall at 1 percent False Positive Rate).
→ AlignmentCheck audits the agent’s reasoning (chain-of-thought) for signs of goal hijacking or indirect injection using a capable LLM.
→ CodeShield performs static analysis on generated code, identifying insecure patterns and vulnerabilities across languages rapidly (96 percent precision, 79 percent recall in evaluation).
📌 Layering detectors like PromptGuard and AlignmentCheck achieves >90 percent attack success rate reduction.
📌 AlignmentCheck’s semantic analysis catches subtle indirect injections missed by input filters.
📌 CodeShield’s fast static analysis directly blocks insecure code generation outputs in real-time.
----------------------------
Paper - arxiv.org/abs/2505.03574v1
Paper Title: "LlamaFirewall: An open source guardrail system for building secure AI agents"
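A rough sketch of the kind of pattern-based static check described for generated code. The rule names, patterns, and the `scan` function are hypothetical illustrations, not CodeShield's actual rule set or API.

```python
import re
from typing import NamedTuple

class Finding(NamedTuple):
    rule: str
    line_no: int
    line: str

# Hypothetical illustrative rules; a production scanner would ship far more.
RULES = {
    "weak-hash-md5": re.compile(r"\bhashlib\.md5\("),
    "eval-call": re.compile(r"(?<![\w.])eval\("),
    "shell-true": re.compile(r"shell\s*=\s*True"),
}

def scan(code: str) -> list[Finding]:
    """Check each line of generated code against every rule."""
    findings = []
    for line_no, line in enumerate(code.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                findings.append(Finding(rule, line_no, line.strip()))
    return findings

generated = (
    "import hashlib\n"
    "digest = hashlib.md5(data).hexdigest()\n"
    "result = eval(user_input)\n"
)
findings = scan(generated)
blocked = bool(findings)  # a guardrail could withhold the output when True
```

Line-by-line regex matching is fast but shallow, which is one reason such engines also support richer rule formats.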
Meta AI Open-Sources LlamaFirewall: A Security Guardrail Tool to Help Build Secure AI Agents TL;DR: Meta AI has released LlamaFirewall, an open-source security framework designed to safeguard AI agents against prompt injection, goal misalignment, and insecure code generation. It integrates three key components: PromptGuard 2 for detecting jailbreak inputs, AlignmentCheck for auditing an agent’s chain-of-thought, and CodeShield for static analysis of generated code. Evaluated on the AgentDojo benchmark, LlamaFirewall achieved over 90% reduction in attack success rates with minimal utility loss. Its modular, extensible design enables developers to define custom policies and detectors, marking a significant step forward in securing autonomous AI systems.... Read full article: marktechpost.com/2025/05/08/… Paper: arxiv.org/abs/2505.03574 Code: github.com/meta-llama/Purple… Project Page: meta-llama.github.io/PurpleL… Also, don't forget to check miniCON Agentic AI 2025- free registration: minicon.marktechpost.com @AIatMeta @Meta
🤖 Securing AI agents is no longer optional, and Meta just open-sourced a major step forward. As LLMs evolve from simple chatbots to autonomous agents capable of reasoning, planning, and tool use, they also become new attack surfaces. From prompt injection and agent misalignment to insecure code generation, the risks are real. To address this, Meta has introduced LlamaFirewall, an open-source security framework built specifically for LLM-powered agents. 🔐 Key components include: 🔹PromptGuard 2 – Detects jailbreak attempts with high precision 🔹AlignmentCheck – Audits agent reasoning for injection and misalignment 🔹CodeShield – Analyzes generated code for security flaws before execution What makes LlamaFirewall stand out? ➡️ It’s modular, extensible, and already in use at Meta, letting developers define custom scanners and enforce agent-specific security policies. This release marks a critical milestone in AI safety, emphasizing the need for purpose-built security tooling as agents grow more capable and autonomous. Whether you're developing task-based agents or experimenting with multi-agent systems, frameworks like LlamaFirewall will be essential for responsible deployment. #LlamaFirewall #AIAgents #LLMs #AI
💻 CodeShield is a static analysis engine for generated code. – Semgrep and regex rule support – Flags insecure patterns in Python, JS, SQL, etc. – Blocks unsafe commits before they reach prod
LlamaFirewall introduces 3 modular guardrails, plus a regex scanner: – PromptGuard 2 (input validation) – AlignmentCheck (runtime reasoning audit) – CodeShield (static analysis for code gen) – RegexScanner (pattern matching) Each targets a different class of threat! 🧐
El lado del mal - Llama Protections: LlamaFirewall with PromptGuard 2, LlamaGuard 4, AlignmentCheck, CodeShield, AutoPatchBench & CyberSecEval 4 elladodelmal.com/2025/04/lla… #Llama #LLM #Hardening #Ciberseguridad #PromptInjection #Jailbreak #Llama4 #CodeShield #IA #OpenSource
29 Apr 2025
Meta launched open source tools to support the open source GenAI security ecosystem 1. LlamaFirewall; a security-first guardrail framework for mitigating agentic prompt injection, misalignment, and insecure coding risks - meta-llama.github.io/PurpleL… 2. Introducing AutoPatchBench: A Benchmark for AI-Powered Security Fixes - engineering.fb.com/2025/04/2… 3. ClassifyIt: Google Workspace Bulk Content Classification - github.com/meta-llama/Purple… 4. CodeShield - Shield against LLM generated insecure code - github.com/meta-llama/Purple… By @AIatMeta @Meta #GenAISecurity #OpenSourceAI #PurpleLlama #LlamaFirewall #AutoPatchBench #CodeShield #ClassifyIt #SecureAI #LLMSecurity #MetaAI
So honored to be part of this amazing team, and to get to work on generative AI security at this pivotal moment in tech history. Link to the full LlamaFirewall paper (which provides evals of misalignment detection, PromptGuard, and CodeShield), here: scontent-lax3-2.xx.fbcdn.net…

- ... CodeShield, which now integrates with LlamaFirewall, and does live blocking of insecure LLM code outputs github.com/meta-llama/Purple…
10 Apr 2025
Replying to @ProjectBabbage
CodeSats BountySV DevForge SatStack MetaBounty CodeVault SVCodePay BuildSats ChainWorks ProofOfCode SatForge DevSentry CodeCertify SVBountyHub MetaWorks SatBounty CodeNest BSVForge DevSats ChainBounty MetaCode SatWorks CodeShield BSVBuild DevReward
OXAudit – Our Shield, Your Code Protect your blockchain project with the best tools and expertise. 🌐 Website: oxaudit.com Take the first step today. #OXAudit #BlockchainSecurity #CodeShield #SecureYourFuture
🔐LLM Security 101🔒
TIMESTAMPS:
0:00 LLM Security Risks
0:55 Video Overview
6:16 Resources and Scripts
8:11 Installation and Server Setup (now using @cursor_ai and 200 tok/s Llama 3.1 8B from @FireworksAI_HQ).
12:37 Jailbreak attacks to avoid Safety Guardrails
21:05 Detecting jailbreak attacks
22:24 Llama Guard and its prompt template from @AIatMeta
27:11 Llama Prompt Guard, also from Meta.
28:40 Testing Jailbreak Detection
35:58 Testing for false positives with Llama Guard
40:00 Off-topic Requests
50:34 Prompt Injection Attacks (Container escape, File access / deletion, DoS)
1:05:27 Detecting Injection Attacks with a Custom Guard
1:10:00 Preventing Injection Attacks via User Authentication
1:10:37 Using Prepared Statements to avoid SQL Injection Attacks
1:11:47 Response Sanitisation to avoid Injection Attacks
1:12:58 Malicious Code Attacks
1:14:07 Building a custom classifier for malicious code
1:15:57 Using CodeShield to detect malicious code
1:16:53 Malicious Code Detection Performance
1:20:40 Effect of Guards/shields on Response Time / Latency
1:25:12 Final Tips
1:26:59 Resources
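The prepared-statements item in the list above can be illustrated with a small sketch using Python's built-in `sqlite3` module. The table, rows, and malicious input are hypothetical and are not taken from the video.

```python
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)

# Input an LLM or user might pass along, crafted to rewrite the query.
malicious = "nobody' OR '1'='1"

# Unsafe: string interpolation lets the input become part of the SQL.
unsafe_rows = conn.execute(
    f"SELECT name FROM users WHERE name = '{malicious}'"
).fetchall()

# Safe: the ? placeholder binds the input as data, never as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()
```

With interpolation the injected `OR` clause matches every row, while the parameterized query treats the whole string as a name and matches nothing.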
This work builds on our previous efforts with CodeShield, which identifies and blocks insecure code suggestions, reducing the risk of vulnerabilities in LLM-generated code, and which we integrate into the Llama system with this release: github.com/meta-llama/Purple…

With the Llama 3 launch this morning we launched CyberSecEval2 and CodeShield: github.com/meta-llama/Purple… github.com/meta-llama/Purple… CodeShield is a secure coding guardrail system that filters a wide range of insecure coding practices from LLM completions at inference time. 1/x

🚀 Introducing CodeShield! 🛡️💻 We're your go-to experts for professional penetration testing and information security. Stay tuned for expert tips and industry insights! #pentesting #cybersecurity #infosec 🌐✨