these models are really good at pattern matching, and therefore at variant analysis.
“iterate through security fix commits and find similar vulnerabilities / unpatched areas of the same variants. write runnable pocs for valid ones.” is enough to uncover surface-level or even P0 vulnerabilities in a codebase with the latest models.
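a rough sketch of the first step of that prompt, picking out the security fix commits to use as seeds for variant hunting. the keyword list, the `pick_security_fixes` name and the sample commits are all made up for illustration; a real run would pull messages from `git log` and let the model do the actual variant search.

```python
import re

# hypothetical hint list; tune it per codebase
SECURITY_HINTS = re.compile(
    r"CVE-\d{4}-\d+|overflow|use[- ]after[- ]free|out[- ]of[- ]bounds"
    r"|sanitiz|injection|traversal",
    re.IGNORECASE,
)

def pick_security_fixes(commits):
    """return the (sha, message) pairs whose message looks like a security fix."""
    return [(sha, msg) for sha, msg in commits if SECURITY_HINTS.search(msg)]

# made-up commit log, only here to show the shape of the data
commits = [
    ("a1b2c3", "fix heap overflow in png chunk parser"),
    ("d4e5f6", "bump version to 2.3.1"),
    ("0a9b8c", "sanitize path before open()"),
    ("7f6e5d", "refactor logging"),
]

fixes = pick_security_fixes(commits)
for sha, msg in fixes:
    # each hit becomes a seed: "find unpatched siblings of this bug"
    print(sha, msg)
```

each seed commit's diff then goes to the model as the pattern to look for elsewhere in the tree.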
i believe that, as proof of cyber capabilities, these models should discover novel attack vectors that require understanding runtime behavior, with and without tooling. like emulating the runtime in CoT/reasoning?
for instance, the react2shell vulnerability was ingenious. without stuffing the context and nudging/handholding (ahem, like some experiments going on), can these models find a similar attack vector with a prompt like “loop until you find a p0/critical security vulnerability”?
that’s what i’d like to see. these models can claim P0 findings from contract mismatches in, say, cryptographic implementations, but the real impact could just be some DoS that a parent thread already prevents.
this is where i see the moat with good harnesses. trust-boundary and threat model understanding, a sandbox environment with the right “win function” for pocs to run on (if xyz happens, it’s a valid vuln), etc. do make the models spit out impressive vulns. this is sort of what we do @winfunction.
cus most vulnerabilities are easily traceable given a comprehensible source-to-sink flow which these models have been good at for a long while now.
and with respect to exploit dev, i strongly believe it’s mostly a tooling problem. with the right tool calls and model-digestible outputs from, say, tracing tools, memory layout, syscalls, threads/processes, and a debugger interface, i think the frontier models can pull off complex multi-chain exploits. (we have run some experiments here and the models are not too bad at this)
security vulnerabilities have a definitive “win function”, like a flag in a ctf, like popping a calc, like ASAN crashes, like `id` says root, like 1000 in milliseconds. this makes the problem very RL verifiable.
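a minimal sketch of what those win functions could look like as a binary reward. `PocResult` and every function name here are hypothetical, not any real harness api, and reading “1000 in milliseconds” as a timing oracle (the poc makes the target stall for at least 1000 ms) is my assumption.

```python
import re
from dataclasses import dataclass

@dataclass
class PocResult:
    """what a hypothetical sandbox hands back after running a poc."""
    stdout: str
    stderr: str
    elapsed_ms: float

def win_flag(r, flag_prefix="FLAG{"):
    # ctf style: the flag shows up in the output
    return flag_prefix in r.stdout

def win_asan(r):
    # AddressSanitizer reports start with this marker on stderr
    return "ERROR: AddressSanitizer" in r.stderr

def win_root(r):
    # `id` printed uid 0
    return re.search(r"\buid=0\(root\)", r.stdout) is not None

def win_delay(r, threshold_ms=1000.0):
    # assumed timing oracle: the target stalled for at least threshold_ms
    return r.elapsed_ms >= threshold_ms

def reward(r, checks):
    """1.0 if any win function fires, else 0.0."""
    return 1.0 if any(check(r) for check in checks) else 0.0

CHECKS = [win_flag, win_asan, win_root, win_delay]
popped = PocResult("uid=0(root) gid=0(root) groups=0(root)\n", "", 12.0)
dud = PocResult("hello\n", "", 5.0)
print(reward(popped, CHECKS), reward(dud, CHECKS))
```

the point is that each check is a cheap, deterministic predicate over sandbox output, which is the property that makes it usable as an RL signal.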
so i only expect harnesses to get leaner.
remember when function calls were part of the response content? we called it “prompt-based tool calling” and now there’s typed/schema-based tool calling as an inherent capability of these models.
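to make the contrast concrete, a small sketch assuming a made-up `<tool_call>` tag for the old style and a toy schema for the new one. none of this is a real provider api; `parse_prompt_based`, `validate_structured` and `READ_FILE_SCHEMA` are hypothetical names.

```python
import json
import re

# old style: the model is told to emit <tool_call>{...}</tool_call> inside its text
TOOL_CALL = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def parse_prompt_based(text):
    """fish the call out of free text; return None if it is missing or malformed."""
    m = TOOL_CALL.search(text)
    if m is None:
        return None
    try:
        return json.loads(m.group(1))
    except json.JSONDecodeError:
        return None

# new style: the call arrives as a structured object and is checked against a schema
READ_FILE_SCHEMA = {"name": "read_file", "required": {"path": str}}

def validate_structured(call, schema):
    """check the tool name and that every required argument has the right type."""
    if call.get("name") != schema["name"]:
        return False
    args = call.get("arguments", {})
    return all(isinstance(args.get(k), t) for k, t in schema["required"].items())

text = 'let me look. <tool_call>{"name": "read_file", "arguments": {"path": "src/parse.c"}}</tool_call>'
old_style = parse_prompt_based(text)
new_style = {"name": "read_file", "arguments": {"path": "src/parse.c"}}
print(old_style == new_style, validate_structured(new_style, READ_FILE_SCHEMA))
```

same call either way; the difference is that the regex and json cleanup used to live in the harness, and now that work sits behind the model's interface.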
most of what we call a harness or an agent is giving the right prompts and the right tools (which are also just prompts).
so whoever can weave the right sequence of tokens for these behemoths of language models can hoard zero-days or spit them out.
so git gud at feeding the right tokens at the right time ig.
ok i read the cyber part of the mythos model card. some thoughts. 250 "trials" across 50 crash categories, but almost every full exploit is a permutation of the same 2 bugs, rediscovered from different starting points, not 250 independent attempts. when you take those 2 bugs out (fig B), mythos's full-exploit rate drops to 4.4%. so across both setups mythos actually leverages 4 distinct bugs total, not 50 as fig A might suggest. 1/n