New paper out with @Zaddyzaddy
tldr: Security patches are also attack maps. Patch2Vuln asks whether an offline LLM agent can look only at the old and new Linux binary packages (no source patch, no advisory text) and infer what vulnerability was fixed.
It builds a local pipeline around ELF extraction, Ghidra/Ghidriff binary diffing, changed-function ranking, dossier generation, and agentic audit/validation.
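To make the changed-function ranking step concrete, here is a minimal sketch, not the paper's actual method. It assumes each binary has already been disassembled into a map from function name to instruction lines. The function names, the hint tokens, and the 0.25 bonus are all hypothetical, and `difflib` stands in for a real binary differ such as Ghidriff.

```python
from difflib import SequenceMatcher

# Hypothetical hint tokens: an added comparison or memory call may signal a new
# bounds check. The real ranking features are not described in this post.
RISKY_HINTS = ("cmp", "memcpy", "strlen", "malloc", "free")


def change_score(old_body, new_body):
    """Return 1 - similarity between two instruction listings (0.0 = identical)."""
    return 1.0 - SequenceMatcher(None, old_body, new_body).ratio()


def rank_changed_functions(old_funcs, new_funcs):
    """Rank functions present in both builds by how much they changed.

    old_funcs / new_funcs map function name -> list of disassembly lines.
    A fixed bonus (an assumption) is added when a newly added line contains
    one of the hint tokens.
    """
    ranked = []
    for name in old_funcs.keys() & new_funcs.keys():
        old_body, new_body = old_funcs[name], new_funcs[name]
        score = change_score(old_body, new_body)
        if score == 0.0:
            continue  # unchanged functions are not candidates
        added = [line for line in new_body if line not in old_body]
        hinted = any(hint in line for line in added for hint in RISKY_HINTS)
        ranked.append((name, round(score + (0.25 if hinted else 0.0), 3)))
    # Highest score first; ties broken by name for a stable order.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Toy input: parse_header gains a bounds check, log_msg gains a nop, init is unchanged.
old = {
    "parse_header": ["mov eax, [rdi]", "call memcpy", "ret"],
    "log_msg": ["call puts", "ret"],
    "init": ["xor eax, eax", "ret"],
}
new = {
    "parse_header": ["mov eax, [rdi]", "cmp eax, 0x100", "ja fail", "call memcpy", "ret"],
    "log_msg": ["call puts", "nop", "ret"],
    "init": ["xor eax, eax", "ret"],
}
ranking = rank_changed_functions(old, new)
```

In this toy case the function that gained a comparison before `memcpy` ranks first, and the unchanged function is dropped. The ranked list would then be what gets written up as dossiers for the agent to audit.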
On 25 Ubuntu .deb package pairs, it found the correct security-relevant patched function in 10/20 real security updates and the accepted root-cause class in 11/20, while correctly treating all 5 negative controls as unknown.
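As a quick sanity check on those numbers, the counts can be turned into rates. The counts come from the post; the dictionary keys are just illustrative labels.

```python
# Counts as reported in the post: (hits, total) per evaluation question.
results = {
    "patched_function_found": (10, 20),
    "root_cause_class_accepted": (11, 20),
    "negative_controls_left_unknown": (5, 5),
}

# 20 real security updates + 5 negative controls should cover all 25 pairs.
total_pairs = results["patched_function_found"][1] + results["negative_controls_left_unknown"][1]

# Convert each (hits, total) pair into a percentage.
rates = {label: 100 * hits / total for label, (hits, total) in results.items()}

for label, pct in rates.items():
    print(f"{label}: {pct:.0f}%")
```

So the function localization rate is 50%, the root-cause rate is 55%, and no negative control was misclassified.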
The fascinating bit: this is basically post-patch vulnerability archaeology. It shows that once a binary security update ships, an agent can sometimes reconstruct the hidden bug from the patch artifact alone.
But the main bottleneck is not yet “LLM reasoning”; it is whether the binary diff/ranking stage surfaces the right function and whether local validation can turn the hypothesis into behavioral evidence.