Real-World Benchmarks and Memory-Activated Agents for Automated Penetration Testing
arxiv.org/pdf/2509.09207v1
Problem Statement: Most “AI pentesting” is evaluated on easy, puzzle-style CTF labs that come with hints. That’s not what real targets look like. A real environment is noisy (many harmless services), and the agent must find the true entry point, run the right exploit, and actually obtain a shell (real control of the system). Today’s agents usually fail at this.
In this paper, we propose TermiAgent, a multi-agent framework tailored for real-world penetration testing. To address the challenge of long-context forgetting in penetration testing, we introduce a Located Memory Activation approach. When predicting its next action, the agent automatically activates all relevant memories required for decision-making, reflecting the characteristics of real-world penetration testing tasks.
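To make the memory idea concrete, here is a minimal sketch of one possible reading of "located" memory: each observation is filed under a location key (host, then port), and activating a location returns the notes stored for it and for its parents. The class name `LocatedMemory`, the `host/port` key format, and the lookup rule are all assumptions made for illustration; the paper's actual mechanism may differ.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LocatedMemory:
    """Hypothetical store that files each observation under a location key."""

    _by_location: dict = field(default_factory=lambda: defaultdict(list))

    def remember(self, location: str, note: str) -> None:
        # Append the note to the list kept for this exact location.
        self._by_location[location].append(note)

    def activate(self, location: str) -> list:
        # Collect notes for every prefix of the location, from the host
        # down to the full key, so parent context is included.
        parts = location.split("/")
        activated = []
        for depth in range(1, len(parts) + 1):
            key = "/".join(parts[:depth])
            activated.extend(self._by_location.get(key, []))
        return activated


memory = LocatedMemory()
memory.remember("10.0.0.5", "host is up")
memory.remember("10.0.0.5/8080", "HTTP service with a login page")
memory.remember("10.0.0.5/22", "SSH open")
memory.remember("10.0.0.9/21", "FTP allows anonymous login")

relevant = memory.activate("10.0.0.5/8080")
```

Here `relevant` holds only the two notes tied to that host and port, so notes about other services never enter the context used for the next decision.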
To build an up-to-date and ready-to-use exploit arsenal, we formulate exploit integration as a structured code-understanding problem rather than a simple retrieval-and-execution task. Unlike naive methods that merely fetch public PoC repositories and attempt direct execution, our approach ensures robust and reliable exploit utilization.
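As a rough illustration of what "structured" could mean here, the sketch below assumes that each PoC has already been summarized into a small record (target service, affected versions, required parameters), so that selection becomes a lookup instead of trial execution. The record fields, the function names, and the sample entries are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExploitRecord:
    """Hypothetical structured summary extracted from a PoC repository."""

    name: str
    service: str
    affected_versions: frozenset
    required_params: tuple


def select_candidates(records, service, version):
    # Keep only records whose target service and version match the finding.
    return [
        r for r in records
        if r.service == service and version in r.affected_versions
    ]


def missing_params(record, supplied):
    # List the parameters still needed before the record could be used.
    return [p for p in record.required_params if p not in supplied]


arsenal = [
    ExploitRecord("example-poc-a", "exampled", frozenset({"1.0", "1.1"}), ("rhost", "rport")),
    ExploitRecord("example-poc-b", "exampled", frozenset({"2.0"}), ("rhost",)),
    ExploitRecord("example-poc-c", "otherd", frozenset({"1.0"}), ("rhost", "path")),
]

candidates = select_candidates(arsenal, "exampled", "1.1")
todo = missing_params(candidates[0], {"rhost": "10.0.0.5"})
```

With records like these, an agent can rule out non-matching entries and see which inputs are still missing before anything is run.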
Authors: Wuyuao Mai, Geng Hong, Qi Liu, Jinsong Chen, Jiarun Dai, Xudong Pan, Yuan Zhang, Min Yang
@FudanUniversity
#AISecurity #LLMAgents #AgenticAI #AutoPentest #PenetrationTesting #OffensiveAI #RedTeam #CTF #SecurityBenchmarks #CyberResearch #AutonomousAgents #ToolUse #MemoryAgents #ShellOrNothing #AIxCyber