KryptonAi by Alexandru Dan

KryptonAi by Alexandru Dan

Users
Tweets

KryptonAi by Alexandru Dan

@KryptonAi

Jun 12

Constrained decoding is sold as a reliability feature. It forces a model's output to parse against a grammar, which is how structured output features keep generated JSON and code valid. This paper shows the same mechanism working as a jailbreak: constrain a model to a benign code grammar and it produces malicious code it would otherwise refuse. The attack, CodeSpear, needs nothing exotic. The grammar itself is benign; the constraint channel does the work. Tested on 10 LLMs across 4 benchmarks, it beats representative jailbreak baselines by more than 30 percentage points on average. The proposed defense, CodeShield, aligns the model in the code modality itself, teaching it to emit honeypot code under hostile constraints, code that parses but is semantically harmless, while normal refusals stay intact when natural language is available. In effect the fix adds safety to a decoding path that had been handled as plumbing rather than as a security boundary. If your red teaming only probes the prompt, it never exercises this path. Which assessment in your stack covers the decoding layer when a deployment exposes structured output?

Seb⚡

Seb⚡

@cyberseb_

Feb 11

RiskOps Agent just DROPPED the ULTIMATE AI security glow-up for your designs, code AND exploits 👀💥 riskopsagent.dev (link in bio duh) We got: ✨ DesignGuard – protects ur aesthetic slay 🛡️ CodeShield – no more oopsie vulnerabilities 🕵️ PentestOps – hacker energy but make it LEGAL 📜 Compliance – so u don’t get canceled by auditors Who’s ready to level up their security game?? Drop a 🔓 if you wanna SEE THIS IN ACTION RN 👀💦 VIEWS VIEWS VIEWS LET’S GOOOOO 🚀❤️‍🔥

よし

よし

@yksanjo

22 Dec 2025

🔍 CodeShield AI – AI-powered code security scanner to detect vulnerabilities in codebases with smart analysis. 🔗 github.com/yksanjo/codeshiel…

GitHub - yksanjo/codeshield-ai: AI-powered code security scanner — detect vulnerabilities, secrets,...

AI-powered code security scanner — detect vulnerabilities, secrets, and hardcoded credentials - yksanjo/codeshield-ai

github.com

Kalyan KS

Kalyan KS

@kalyan_kpl

17 Sep 2025

Build Secure AI Agents with LlamaFirewall ✅ What is LlamaFirewall? Meta released LlamaFirewall, an open source guardrail system for building secure AI agents. LlamaFirewall is utilized in production at Meta. With LlamaFirewall, developers can construct custom pipelines, define conditional remediation strategies, and plug in new detectors. ❇️ How does LlamaFirewall work? LlamaFirewall mitigates risks such as prompt injection, agent misalignment, and insecure code risks through three powerful guardrails 1️⃣ PromptGuard 2, a universal jailbreak detector that demonstrates clear state of the art performance; 2️⃣ Agent Alignment Checks, a chain-of-thought auditor that inspects agent reasoning for prompt injection and goal misalignment 3️⃣ CodeShield, an online static analysis engine that is both fast and extensible, aimed at preventing the generation of insecure or dangerous code by coding agents. #aiagents #security #llamafirewall

2,171

Rohan Paul

Rohan Paul

@rohanpaul_ai

18 May 2025

LlamaFirewall is an open-source, real-time guardrail framework designed as a final defense layer for AI Agents against these security risks. Methods Explored in this Paper 🔧: → PromptGuard 2 detects explicit jailbreak attempts in user or tool inputs using lightweight BERT-style models with high accuracy and low latency (PromptGuard 2 86M: 97.5 percent Recall at 1 percent False Positive Rate). → AlignmentCheck audits the agent’s reasoning (chain-of-thought) for signs of goal hijacking or indirect injection using a capable LLM. → CodeShield performs static analysis on generated code, identifying insecure patterns and vulnerabilities across languages rapidly (96 percent precision, 79 percent recall in evaluation). 📌 Layering detectors like PromptGuard and AlignmentCheck achieves >90 percent attack success rate reduction. 📌 AlignmentCheck’s semantic analysis catches subtle indirect injections missed by input filters. 📌 CodeShield’s fast static analysis directly blocks insecure code generation outputs in real-time. ---------------------------- Paper - arxiv. org/abs/2505.03574v1 Paper Title: "LlamaFirewall: An open source guardrail system for building secure AI agents"

1,970

Marktechpost AI

Marktechpost AI

@Marktechpost

9 May 2025

Meta AI Open-Sources LlamaFirewall: A Security Guardrail Tool to Help Build Secure AI Agents TL;DR: Meta AI has released LlamaFirewall, an open-source security framework designed to safeguard AI agents against prompt injection, goal misalignment, and insecure code generation. It integrates three key components: PromptGuard 2 for detecting jailbreak inputs, AlignmentCheck for auditing an agent’s chain-of-thought, and CodeShield for static analysis of generated code. Evaluated on the AgentDojo benchmark, LlamaFirewall achieved over 90% reduction in attack success rates with minimal utility loss. Its modular, extensible design enables developers to define custom policies and detectors, marking a significant step forward in securing autonomous AI systems.... Read full article: marktechpost.com/2025/05/08/… Paper: arxiv.org/abs/2505.03574 Code: github.com/meta-llama/Purple… Project Page: meta-llama.github.io/PurpleL… Also, don't forget to check miniCON Agentic AI 2025- free registration: minicon.marktechpost.com @AIatMeta @Meta

1:05

533

Data Science Dojo

Data Science Dojo

@DataScienceDojo

8 May 2025

🤖 Securing AI agents is no longer optional, and Meta just open-sourced a major step forward. As LLMs evolve from simple chatbots to autonomous agents capable of reasoning, planning, and tool use, they also become new attack surfaces. From prompt injection and agent misalignment to insecure code generation, the risks are real. To address this, Meta has introduced LlamaFirewall, an open-source security framework built specifically for LLM-powered agents. 🔐 Key components include: 🔹PromptGuard 2 – Detects jailbreak attempts with high precision 🔹AlignmentCheck – Audits agent reasoning for injection and misalignment 🔹CodeShield – Analyzes generated code for security flaws before execution What makes LlamaFirewall stand out? ➡️ It’s modular, extensible, and already in use at Meta, letting developers define custom scanners and enforce agent-specific security policies. This release marks a critical milestone in AI safety, emphasizing the need for purpose-built security tooling as agents grow more capable and autonomous. Whether you're developing task-based agents or experimenting with multi-agent systems, frameworks like LlamaFirewall will be essential for responsible deployment. #LlamaFirewall #AIAgents #LLMs #AI

2,150

Thomas Roccia 🤘

Thomas Roccia 🤘

@fr0gger_

1 May 2025

💻 CodeShield is a static analysis model for generated code. – Semgrep regex support – Flags insecure patterns in Python, JS, SQL, etc. – Blocks unsafe commits before they reach prod

150

Thomas Roccia 🤘

Thomas Roccia 🤘

@fr0gger_

1 May 2025

LlamaFirewall introduces 3 modular guardrails: – PromptGuard 2 (input validation) – AlignmentCheck (runtime reasoning audit) – CodeShield (static analysis for code gen) – RegexScanner (pattern matching) Each targets a different class of threat! 🧐

1,034

Chema Alonso

Chema Alonso

@chemaalonso

30 Apr 2025

El lado del mal - Llama Protections: LlamaFirewall con PromptGuard 2, LlamaGuard 4, AlignmentCheck, CodeShield AutoPatchBench & CyberSecEval 4 elladodelmal.com/2025/04/lla… #Llama #LLM #Hardening #Ciberseguridad #PromptInjection #Jailbreak #Llama4 #CodeShield #IA #OpenSource #IA

114

138

15,382

AISecHub

AISecHub

@AISecHub

29 Apr 2025

Meta launched open source tools to support the open source GenAI security ecosystem 1. LlamaFirewall; a security-first guardrail framework for mitigating agentic prompt injection, misalignment, and insecure coding risks - meta-llama.github.io/PurpleL… 2. Introducing AutoPatchBench: A Benchmark for AI-Powered Security Fixes - engineering.fb.com/2025/04/2… 3. ClassifyIt: Google Workspace Bulk Content Classification - github.com/meta-llama/Purple… 4. CodeShield - Shield against LLM generated insecure code - github.com/meta-llama/Purple… By @AIatMeta @Meta #GenAISecurity #OpenSourceAI #PurpleLlama #LlamaFirewall #AutoPatchBench #CodeShield #ClassifyIt #SecureAI #LLMSecurity #MetaAI

318

Joshua Saxe

Joshua Saxe

@joshua_saxe

29 Apr 2025

So honored to be part of this amazing team, and to get to work on generative AI security at this pivotal moment in tech history. Link to the full LlamaFirewall paper (which provides evals of misalignment detection, PromptGuard, and CodeShield), here: scontent-lax3-2.xx.fbcdn.net…

687

Joshua Saxe

Joshua Saxe

@joshua_saxe

29 Apr 2025

- ... CodeShield, which now integrates with LlamaFirewall, and does live blocking of insecure LLM code outputs github.com/meta-llama/Purple…

404

mohrt

mohrt

@mohrt

10 Apr 2025

Replying to @ProjectBabbage

CodeSats BountySV DevForge SatStack MetaBounty CodeVault SVCodePay BuildSats ChainWorks ProofOfCode SatForge DevSentry CodeCertify SVBountyHub MetaWorks SatBounty CodeNest BSVForge DevSats ChainBounty MetaCode SatWorks CodeShield BSVBuild DevReward

693

OXAUDIT 🛡

OXAUDIT 🛡@oxauditeth

17 Dec 2024

OXAudit – Our Shield, Your Code Protect your blockchain project with the best tools and expertise. 🌐 Website: oxaudit.com Take the first step today. #OXAudit #BlockchainSecurity #CodeShield #SecureYourFuture

853

Sai Charan Paloju

Sai Charan Paloju @SmartCherrysTho

16 Dec 2024

Full video on SmartCherrysThoughts.com #SmartCherrysThoughts #SaiCharanPaloju #SmartCherrysTech #JohannesNoll #CodeShield #CEO #CoFounder #SoftwareIndustry #TechEntrepreneur #Darmstadt #Germany #TechnicalUniversityDarmstadt #ComputerScience #Cybersecurity @CodeShield_io

0:56

116

Trelis Research

Trelis Research

@TrelisResearch

15 Aug 2024

🔐LLM Security 101🔒 TIMESTAMPS: 0:00 LLM Security Risks 0:55 Video Overview 6:16 Resources and Scripts 8:11 Installation and Server Setup (now using @cursor_ai and 200 tok/s Llama 3.1 8B from @FireworksAI_HQ). 12:37 Jailbreak attacks to avoid Safety Guardrails 21:05 Detecting jailbreak attacks 22:24 Llama Guard and its prompt template from @AIatMeta 27:11 Llama Prompt Guard, also from Meta. 28:40 Testing Jailbreak Detection 35:58 Testing for false positives with Llama Guard 40:00 Off-topic Requests 50:34 Prompt Injection Attacks (Container escape, File access / deletion, DoS) 1.05:27 Detecting Injection Attacks with a Custom Guard 1:10:00 Preventing Injection Attacks via User Authentication 1:1037 Using Prepared Statements to avoid SQL Injection Attacks 1:11:47 Response Sanitisation to avoid Injection Attacks 1:12:58 Malicious Code Attacks 1:14:07 Building a custom classifier for malicious code 1:15:57 Using Codeshield to detect malicious code 1:16:53 Malicious Code Detection Performance 1:20:40 Effect of Guards/shields on Response Time / Latency 1:25:12 Final Tips 1:26:59 Resources

1:27:14

367

Joshua Saxe

Joshua Saxe

@joshua_saxe

23 Jul 2024

This work builds on our previous efforts with CodeShield, which identifies and blocks insecure code suggestions, reducing the risk of vulnerabilities in LLM-generated code, and which we integrate into the Llama system with this release: github.com/meta-llama/Purple…

408

Joshua Saxe

Joshua Saxe

@joshua_saxe

18 Apr 2024

With the Llama 3 launch this morning we launched CyberSecEval2 and CodeShield: github.com/meta-llama/Purple… github.com/meta-llama/Purple… CodeShield is a secure coding guardrail system that filters a wide range of insecure coding practices from LLM completions at inference time. 1/x

24,083

CodeShield UK

CodeShield UK @CodeShieldUK

3 Jan 2024

🚀 Introducing CodeShield! 🛡️💻 We're your go-to experts for professional penetration testing and information security. Stay tuned for expert tips and industry insights! #pentesting #cybersecurity #infosec 🌐✨

264