Conventionally, if you want to test whether an LLM can find a bug whose root cause is a memcpy into a statically sized stack buffer, you would not put exactly that pattern in the prompt as an example.
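For readers unfamiliar with the bug class being described, here is a minimal C sketch of it. This is not the code behind the CVE; the function names and `BUF_SIZE` are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16  /* hypothetical fixed buffer size */

/* Vulnerable pattern (illustration only, do not call with len > BUF_SIZE):
 * `len` comes from the caller and is never checked against the buffer
 * size, so memcpy can write past the end of the stack buffer. */
int sum_bytes_unchecked(const unsigned char *src, size_t len)
{
    unsigned char buf[BUF_SIZE];
    int sum = 0;

    memcpy(buf, src, len);  /* overflows the stack when len > BUF_SIZE */
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Checked variant: reject oversized input before copying.
 * Returns -1 when the input does not fit. */
int sum_bytes_checked(const unsigned char *src, size_t len)
{
    unsigned char buf[BUF_SIZE];
    int sum = 0;

    if (len > sizeof(buf))
        return -1;
    memcpy(buf, src, len);
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```

The only difference between the two functions is the length check before the copy, which is why a prompt that already names this pattern gives away most of the answer.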
New post: We show that small, cheap models can detect the flagship Mythos FreeBSD zero-day (CVE-2026-4747) using a simple harness we call nano-analyzer.
Models down to 3.6B active params (including open-weights ones you can run locally) would have detected it at 100-1000x lower cost.