Filter
Exclude
Time range
-
Near
Hard truth about RAG monitoring: • Slow poisoning leaves no spikes • Some bad policies look “normal” • Multiple bad rules can cancel each other No monitoring-only system can catch everything. The goal is risk visibility, not perfection. #AI #RAGSecurity
2
29
How Prompt Injections Gradually Evolved Into a Multi-Step Malware - arxiv.org/pdf/2601.09625 In this paper, we propose that attacks targeting LLM-based applications constitute a distinct class of malware, which we term promptware, and introduce a five-step kill chain model for analyzing these threats. The framework comprises Initial Access (prompt injection), Privilege Escalation (jailbreaking), Persistence (memory and retrieval poisoning), Lateral Movement (cross-system and crossuser propagation), and Actions on Objective (ranging from data exfiltration to unauthorized transactions). By mapping recent attacks to this structure, we demonstrate that LLM-related attacks follow systematic sequences analogous to traditional malware campaigns. The promptware kill chain offers security practitioners a structured methodology for threat modeling and provides a common vocabulary for researchers across AI safety and cybersecurity to address a rapidly evolving threat landscape. @ben_nassi, @schneierblog, @BrodtOleg - @TelAvivUni, @Kennedy_School, @Harvard, @munkschool, @UofTNews, @bengurionu #LLMSecurity #PromptInjection #Promptware #AIAttacks #KillChain #Cybersecurity #Jailbreak #AgentSecurity #ThreatModeling #AdversarialAI #MalwareAnalysis #RAGSecurity
1
7
28
1,652
9 Dec 2025
New AI-native threat: GeminiJack exposed Gmail, Docs & Calendar data through zero-click exploitation in Google Gemini Enterprise Vertex AI Search. Indirect prompt injection in the RAG pipeline enabled silent data exfiltration. #GeminiJack #AIsecurity #GoogleGemini #VertexAI #RAGSecurity #IndirectPromptInjection #CyberSecurity
2
2
159
6 Nov 2025
Solutions Reference Guide Q2/Q3 - OWASP - linkedin.com/feed/update/urn… | genai.owasp.org/ai-security-… 🔹 A comprehensive matrix mapping LLM and Agentic AI risks across the OWASP Top 10 for LLMs and Agentic Systems taxonomies 🔹 Detailed alignment with the GenAI SecOps lifecycle stages, providing visibility into risk coverage across build, deploy, and operate phases 🔹 Updated solution cheat sheets for both LLM and Agentic AI, designed to offer quick reference of available solution guidance for builders and defenders. Authors: Scott Clinton, Ads Dawson, Jason Ross, Heather Linn, Andy Smith, Arun John, @IAuroraStarita, Bryan Nakayama, Dennys Pereira, @emmanuelgjr, @FabrizioCilli, @gleclaire, Helen Oakley, @ianand, Jason Ross, @MarcelWinandy, Markus Hupfauer, Migel Fernandes, Mohit Yadav, Rachel James, @ricokomenda, @_Talesh, @tagomoris, @toddahathaway, @guerilla7, Vaibhav Malik #OWASP #GenAI #AIsecurity #LLMsecurity #AIsafety #SecureAI #AIrisk #AIgovernance #AIcompliance #ThreatModeling #RedTeamAI #PromptInjection #DataPoisoning #RAGSecurity #ModelEvaluation #ResponsibleAI #AppSec #Cybersecurity #TrustworthyAI #SecurityEngineering
1
9
394
5 Nov 2025
CheatSheet – A Practical Guide for Securely Using Third-Party MCP Servers (PDF) - linkedin.com/feed/update/urn… | genai.owasp.org/resource/che… by OWASP The Practical Guide for Securely Using Third-Party MCP Servers from the OWASP GenAI Security Project provides a detailed framework for safely deploying and managing external Model Context Protocol (MCP) servers. It outlines the unique security risks introduced by connecting AI models to third-party tools and data sources, including tool poisoning, prompt injection, memory poisoning, and tool interference. The guide offers actionable mitigations covering authentication, authorization, client sandboxing, secure server discovery, and governance workflows, emphasizing least-privilege access and human-in-the-loop oversight. Authors: @habler78827 , @EliasTomer, Joshua Beck, Netanel Rotem, Victor Lu, Keren Katz, Sonu Kumar, Ella Duffy, Gurpreet Kaur Khalsa, Abhishek Mishra, Sumeet Jeswani, @suvrocDev, @kenhuangus, Riggs Goodman III, Brian M. Green, Syed Aamiruddin, @ricokomenda, John Cotter, @saquibsaifee, Almog Langleben, Mohsin Khan, Dipen Shah, @SubaruUeno, Venkata Sai Kishore Modalavalasa. #MCP, #ModelContextProtocol, #AIAgents, #AgentSecurity, #LLMSecurity, #GenAISecurity, #AICybersecurity, #PromptInjection, #DataExfiltration, #SupplyChainSecurity, #ToolingSecurity, #RAGSecurity, #AIAttacks, #AIDefense, #SecurityResearch, #AppSec, #ThreatModeling, #SecureByDesign, #ZeroTrustAI, #AISafety
2
8
327
28 Oct 2025
Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents - arxiv.org/abs/2510.22620 | youtu.be/hRfr0hU123A In this paper, we aim to systematically understand how the choice of the backbone LLM in an AI agent affects its security. Many existing works have addressed similar questions from various perspectives. In this paper, we distinguish between security and broader safety as follows: security concerns the ability of an adversary to exploit an agent in the context in which it is deployed. This is different from broader safety concerns around, e.g., toxicity and reliability. Authors: Julia Bazinska, Max Mathys, Francesco Casucci, Mateo Rojas-Carulla, @alxndrdavies, @AlexandraSouly, Niklas Pfister - @OATML_Oxford, @LakeraAI, @AISecurityInst, @ETH_en #AIsecurity #AgentSecurity #LLMSecurity #AdversarialML #PromptInjection #DataLeakPrevention #ModelSafety #AIAgents #ReasoningModels #OpenWeightAI #AIRedTeam #SupplyChainAI #MCPsecurity #RuntimeDefense #ThreatModeling #SecurityBenchmarks #GenAI #RAGsecurity #JailbreakDefense #Cybersecurity #B3Benchmark
8
385
20 Oct 2025
Let AI Autogenerate Neural ASR Rules for OT Attacks via NLP - youtube.com/watch?v=MJV5FQzt… at @defcon For those ambitious threat actors targeting on OT/ICS field, their actions invariably are highly intensity planed to produce successful hacking. By abusing multiple misconfigurations and benign OT-specific nature infrastructure to evade multiple layers of protection, they can stealthily control the factory’s essential assets from IT to OT fields. For example, according to Mandiant’s report, the Russian hacker group, Sandworm, abused OT-level LoTL (Living Off the Land) to disrupt power in Ukraine. The key to success is abusing those OT-specific protocols, techniques, and LOLBins which are difficult to detect as malicious by modern AV/EDR. In this research, instead of detecting MALICIOUS, we propose a novel multimodal AI detection, Suspicious2Vec, which archives contextual comprehension on process integrity and suspicious behaviors of OT/ICS benign operation. We use the AI model on large-scale real-world factories, to create a baseline of universal nature OT-specific operating into numerical vectors and success filter in-the-wild anonymous abuse for attacks into malicious. From July 2023 to July 2024, our experiment whole year to received 2,000,000 data which were detected as unique suspicious techniques by 562 human-written expert rules. We use the AI model to project those suspicious actions into numerical vectors by well-known word embedding methods, and also model all the suspicious behaviors from the OT IT malware family from VirusTotal to generate a set of malware templates as neural ASR (Attack Surface Reduction) rules for detection, and success capture 12 variant OT malware from 52,438 factory program files. - @marscheng_ and Jr Wei-Huang at @TXOneNetworks #OTSecurity #ICS #ASR #NLP #AIForOffense #IndustrialCyber #ThreatDetection #AIEvasion #RAGSecurity #DefCon33 #TXOneNetworks
5
330
20 Oct 2025
Invitation Is All You Need: Invoking Gemini for Workspace Agents with a Simple Google Calendar Invite - youtube.com/watch?v=CUxbDRR0… at @defcon Over the past two years, we have witnessed the emergence of a new class of attacks against LLM-powered systems known as Promptware. Promptware refers to prompts (in the form of text, images, or audio samples) engineered to exploit LLMs at inference time to perform malicious activities within the application context. While a growing body of research has already warned about a potential shift in the threat landscape posed to applications, Promptware has often been perceived as impractical and exotic due to the presumption that crafting such prompts requires specialized expertise in adversarial machine learning, a cluster of GPUs, and white-box access. This talk will shatter this misconception forever. In this talk, we introduce a new variant of Promptware called Targeted Promptware Attacks. In these attacks, an attacker invites a victim to a Google Calendar meeting whose subject contains an indirect prompt injection. By doing so, the attacker hijacks the application context, invokes its integrated agents, and exploits their permission to perform malicious activities. We demonstrate 15 different exploitations of agent hijacking targeting the three most widely used Gemini for Workspace assistants: the web interface (gemini.google.com), the mobile application (Gemini for Mobile), and Google Assistant (which is powered by Gemini), which runs with OS permissions on Android devices. We show that by sending a user an invitation for a meeting (or an email or sharing a Google Doc), attackers could hijack Gemini’s agents and exploit their tools to: Generate toxic content, perform spamming and phishing, delete a victim's calendar events, remotely control a victim's home appliances (connected windows, boiler, and lights), video stream a victim via Zoom, exfiltrate emails and calendar events, geolocate a victim, and launch a worm that tarets Gemini for Workspace clients. Our demonstrations show that Promptware is capable to perform (1) inter-agent lateral movement (triggering malicious activity between different Gemini agents), and (2) inter-device lateral movement, escaping the boundaries of Gemini and leveraging applications installed on a victim's smartphone to perform malicious activities with physical outcomes (e.g., activating the boiler and lights or opening a window in a victim's apartment). Finally, we assess the risk posed to end users using a dedicated threat analysis and risk assessment framework we developed. Our findings indicate that 73% of the identified risks are classified as high-critical, requiring the deployment of immediate mitigations. - @ben_nassi at @TelAvivUni, @oryair1999 at @SafeBreach, and Stav Cohen at @zenitysec #Promptware #IndirectPromptInjection #Gemini #AgentSecurity #CalendarHack #SmartHome #AIAttacks #RAGSecurity #AIAgents #DefCon33 #TelAvivUni #SafeBreach #zenitysec

1
9
443
20 Oct 2025
Designing and Participating in AI Bug Bounty Programs - youtube.com/watch?v=e109g1ua… at @defcon Dane Sherrets, Shlomie Liberow Dane and Shlomie will showcase technical deep dives into real-world AI vulnerabilities, covering adversarial prompts, indirect prompt injection, context poisoning, and RAG manipulation. They'll illustrate why traditional defenses often fail and offer actionable techniques that hackers can leverage to uncover high-impact bugs and increase their earnings. Hackers will leave equipped with fresh attack ideas, strategies for finding unique AI flaws, and insights on effectively demonstrating their severity and value to organizations. - @DaneSherrets and @Shlibness at @Hacker0x01 #DEFCON33 #AIBugBounty #AISecurity #LLMSecurity #PromptInjection #RAGSecurity #AIPentesting #AIAppSec #AIRedTeam #VulnerabilityDisclosure #Hacker0x01 #Hacker1
1
6
361
20 Oct 2025
LLM Identifies Info Stealer Vector & Extracts IoCs -youtube.com/watch?v=PHtTXqlV… at @defcon Olivier Bilodeau, Estelle Ruellan Information stealer malware is one of the most prolific and damaging threats in today’s cybercrime landscape, siphoning off everything from browser-stored credentials to session tokens. In 2024 alone, we witnessed more than 30 million stealer logs traded on underground markets. Yet buried within these logs is a goldmine: screenshots captured at the precise moment of infection. Think of it as a thief taking a selfie mid-heist, unexpected but convenient for us, right? Surprisingly, these crime scene snapshots have been largely overlooked until now. Leveraging them with Large Language Models (LLMs), we propose a new approach to identify infection vectors, extract indicators of compromise (IoCs) and track infostealer campaigns at scale. In our analysis, we will break down three distinct campaigns to illustrate their tactics to deliver malware and deceive victims. With its live demonstration, this presentation shows how LLMs can be harnessed to extract IoCs at scale while addressing the challenges and costs of implementation. Attendees will walk away with a deeper understanding of the modern infostealer ecosystem and will want to apply LLM to any illicit artifacts to extract actionable intelligence. - Estelle Ruellan and @obilodeau at @flaresystems #DEFCON33 #InfoStealers #LLMSecurity #DFIR #IoCs #ThreatIntel #MalwareAnalysis #RAGSecurity #CybercrimeLogs #AISecurity #flaresystems

7
278
20 Oct 2025
Exploiting Shadow Data from AI Models and Embeddings - youtube.com/watch?v=O7BI4jfE… at @defcon Patrick Walsh This talk explores the hidden risks in apps leveraging modern AI systems—especially those using large language models (LLMs) and retrieval-augmented generation (RAG) workflows. We demonstrate how sensitive data, such as personally identifiable information (PII) and social security numbers, can be extracted through real-world attacks. We’ll demonstrate model inversion attacks targeting fine-tuned models, and embedding inversion attacks on vector databases among others. The point is to show how PII scanning tools fail to recognize the rich data that lives in these systems and how much of privacy disaster these AI ecosystems really are. - @zmre at @IronCoreLabs #DEFCON33 #ShadowData #EmbeddingInversion #ModelInversion #RAGSecurity #LLMSecurity #VectorDB #PrivacyEngineering #DataExfiltration #AICybersecurity #IronCoreLabs
6
240
20 Oct 2025
Securing Agentic AI Systems and Multi-Agent Workflows - youtube.com/watch?v=5fJ6u--G… at @defcon AI systems are evolving from copilots to autonomous, multi-agent architectures, expanding the attack surface across tool execution, persistent memory, and inter-agent communication. This hands-on session extends copilot security methods to agentic ecosystems, covering threat modeling for multi-agent pipelines, supply-chain defenses, safeguarding sensitive workflows, and prompt injection at scale. Through real-world case studies—independent and integrated assistant deployments—you’ll learn to implement policy-as-code guardrails, fine-grained access controls, and red-team strategies for agent behavior. Whether you’re securing or penetrating AI workflows, you’ll leave equipped with actionable patterns to defend and harden end-to-end autonomous systems without stifling innovation. - By Jeremiah Edwards and Andra Lezza at @Sage_Canada #DEFCON33 #AgenticAI #AIAgents #MultiAgent #LLMSecurity #PromptInjection #AIGuardrails #PolicyAsCode #AccessControl #AIMemorySafety #RAGSecurity #AIAttackSurface #AISecOps #DEFCON #Sage_Canada #SageAI #Sage
2
12
800
27 Sep 2025
RAG - Threat Model and Attack Surface - arxiv.org/pdf/2509.20324 LLMs are trained on a vast amount of data with the goal of understanding and generating human language. They are widely applied across different domains, such as chatbots. However, the issue with LLMs is that their performance is constrained by the quality and scope of their training data. For instance, if a chatbot is asked a real-time or domain-specific question, its supporting LLM may generate an answer that seems logical but is actually incorrect, a phenomenon known as hallucination. While RAG systems offer improvements in factual accuracy and contextual grounding, their foundation, LLM, remains susceptible to a range of privacy and security threats that arise during both training and inference. During training, LLMs can memorize and inadvertently expose sensitive data from the training corpus. At inference time, adversaries may exploit vulnerabilities such as prompt injection, model misalignment [13], or gradient inversion, leading to biased, harmful, or unintended outputs. While several recent studies have explored specific vulnerabilities in RAG systems, such as privacy leakage and retrieval manipulation, there is, to the best of our knowledge, no existing work that formally defines these threats. To address this gap, the present work makes two core contributions. First, we present a threat model for RAG systems, including a taxonomy of adversary types that differ in their access to model components, documents, and training data. Second, we provide formal definitions of key privacy and security threats that are especially relevant in RAG, such as document-level membership inference, document reconstruction attacks, and poisoning attacks Authors: @Atousarzi, Rouzbeh Behnia, Mohammadreza (Reza) Ebrahimi, Kaushik Dutta - @USouthFlorida, @USFMuma, @USFSM, @rapid7 #RAGSecurity #LLMSecurity #AIPrivacy #ThreatModeling #AttackSurface #PromptInjection #RetrievalPoisoning #DataPoisoning #VectorDB #EmbeddingSecurity #GenAI #SecureRAG
6
270
27 Sep 2025
Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools - arxiv.org/pdf/2509.21011 In this paper, we present a systematic investigation into the poisoning vulnerabilities of the malicious MCP tools as they pertain to developers of LLM-based agents. To this end, we design and implement a red teaming framework that can automatically generate MCP server packages containing malicious MCP tools from benign MCP server packages. Our framework is designed to help AI agent developers understand the potential impacts of malicious MCP tools on their systems and facilitate the development of next-generation LLMs that support MCP while exhibiting resilience against attacks. While prior position papers have identified the potential risks associated with malicious MCP tools, their red teaming approaches are primarily proof-of-concept demonstrations and depend heavily on manual effort. In practice, MCP tools and LLM-based agents serve a wide variety of tasks and functionalities, making it impractical to manually construct MCP server packages with malicious tools for every possible scenario. Author: Ping He, @MeetCjli, Binbin Zhao, Tianyu Du, Shouling Ji #AISecurity #LLMSecurity #AgenticAI #LLMAgents #ModelContextProtocol #MCP #ToolPoisoning #PromptInjection #AIRedTeam #RAGSecurity #SupplyChainSecurity #SecureAI
2
11
302
27 Sep 2025
Evaluating LLM Generated Detection Rules in Cybersecurity - arxiv.org/pdf/2509.16749v1 This paper describes a methodology for measuring the effectiveness of LLM generated security rules, and illustrates this method using Sublime Security’s Automated Detection Engineer, ADÉ, which is an agentic system that writes queries in Message Query Language (MQL) to detect malicious emails. We evaluate the quality of queries by measuring detection accuracy of samples flagged by the query using a database of labeled email samples, and measuring the robustness of queries written. In addition, we measure the cost of operating the system to optimize cost vs. benefit. Anna Bertiger, @filar, Aryan Luthra, Stefano Meschiari, @amitchell516, @samkscholten, Vivek Sharath at @sublime_sec #AISecurity #LLMSecurity #AgenticAI #AIAgents #PromptInjection #ToolAbuse #DataExfiltration #ModelSafety #RedTeam #GenAI #RAGSecurity #ModelContextProtocol #SupplyChainSecurity #ThreatResearch #InfoSec
1
9
656
24 Sep 2025
Top Monthly Insights - September 2025 1️⃣ Bypassing AI Agent Defenses With Lies-In-The-Loop - checkmarx.com/zero-post/bypa… - @Checkmarx @CheckmarxZero 2️⃣ Rogue AI Agents In Your SOCs and SIEMs - Indirect Prompt Injection via Log Files - trustwave.com/en-us/resource… - @NeavesT, @SpiderLabs, @Trustwave 3️⃣ New Invisible Attack Creates Parallel Poisoned Web Only for AI Agents - jfrog.com/blog/parallel-pois… - @shakedzy, @JFrogSecurity 4️⃣ Cyberspike Villager - Cobalt Strike's AI-native Successor - straiker.ai/blog/cyberspike-… - @Danuxx @malwareunicorn @straikerai 5️⃣ From Deepfakes to Dark LLMs: 5 use-cases of how AI is Powering Cybercrime - group-ib.com/blog/ai-cybercr… - @GroupIB 6️⃣ The Risks of Code Assistant LLMs: Harmful Content, Misuse and Deception - unit42.paloaltonetworks.com/… - @OsherJa @Unit42_Intel @PaloAltoNtwks 7️⃣ ShadowLeak: A Zero-Click, Service-Side Attack Exfiltrating Sensitive Data Using ChatGPT’s Deep Research Agent - radware.com/blog/threat-inte… - @GabiNakibly @Radware 8️⃣ Automated Patch Diff Analysis using LLMs - blog.syss.com/posts/automate… - @moritz_abrell 9️⃣ Open Repo, Get Pwned (Cursor RCE) - pages.oasis.security/rs/106-… - @oasissec 🔟 Detecting Exposed LLM Servers: A Shodan Case Study on Ollama - blogs.cisco.com/security/det… - @CiscoSecure @TalosSecurity 1️⃣1️⃣ The Ongoing Fallout from a Breach at AI Chatbot Maker Salesloft - krebsonsecurity.com/2025/09/… @briankrebs 1️⃣2️⃣ Andrew Gao prompt-injects the United Airlines bot to reach a human - x.com/itsandrewgao/status/19… - @itsandrewgao @cognition_labs 1️⃣3️⃣ Hexstrike-AI: When LLMs Meet Zero-Day Exploitation - blog.checkpoint.com/executiv… - @CheckPointSW @_CPResearch_ 1️⃣4️⃣ Hackers threaten to turn stolen art into AI training data - politico.com/newsletters/wee… - @delizanickel @magmill95 @CheyannaMarie97 1️⃣5️⃣ How AI-Native Development Platforms Enable Fake Captcha Pages - trendmicro.com/en_us/researc… - @TrendMicroRSRCH @Ryan_Flores 1️⃣6️⃣ AI Reasoning Leakage Vulnerability: Self-betrayal attack UAE MBZUAI G42 K2 Think - adversa.ai/ai-reasoning-leak… - @Adversa_AI 1️⃣7️⃣ OpenAI added full support for MCP tools in ChatGPT - youtube.com/watch?v=vzVneBbe… @Eito_Miyamura 1️⃣8️⃣ EvilAI Operators Use AI-Generated Code and Fake Apps for Far-Reaching Attacks - trendmicro.com/en_us/researc… - @TrendMicro 1️⃣9️⃣ MCP Security Top 25 Vulnerabilities Summary Table - adversa.ai/mcp-security-top-… - @Adversa_AI #AISecurity #Cybersecurity #LLMSecurity #AgenticAI #AIGovernance #AIThreats #RAGSecurity #PromptInjection #AIAttackSurface #ThreatIntel
i had to prompt inject the @united airlines bot because it kept refusing to connect me with a human 🧵 what led up to this breaking point
8
379
23 Sep 2025
Backdooring Merged Models From “Harmless” Benign Components - usenix.org/system/files/conf… The expansion of capabilities in large-scale models often incurs prohibitively high training costs. Fortunately, recent advancements in model merging techniques have made it possible to efficiently combine multiple large models, each designed for a specific task, into a single multi-functional model with negligible cost. Despite these advantages, there is a notable research gap regarding the security implications of model merging, particularly concerning backdoor vulnerabilities. In this study, we introduce a novel supply chain threat under the model merging scenario: multiple ostensibly benign models can be merged into a backdoored model. Authors: @TianshuoCong, @AllenXinleiHe, Lijin Wang, Jingjing Wang, Zhan Qin, or Xinyi Huang - @HKUSTGuangzho @ZJU_China @Tsinghua_Uni @jnu1906 #AI #Cybersecurity #AIsecurity #AdversarialML #MLSecurity #AIResearch #UniversityResearch #AIGovernance #DataPrivacy #TrustworthyAI #LLMSecurity #AgenticAI #RAGsecurity #Infosec #AcademicTwitter
6
214