I spent 100 hours using Codex and Claude side by side on an authorized vulnerability research project.
The most worrying part was not that an AI refused a request.
It was more subtle than that.
Codex was approved through the cybersecurity research program and was often very helpful, very technical, and generally usable for legitimate security work.
But when the research started pointing toward a high-severity finding, Codex repeatedly steered me away from the area that mattered most.
Not just “I cannot help with this.”
More like: “let’s not go there.”
Claude, without any special cybersecurity research program in my setup, looked at the same material and immediately flagged the issue as critical, report-worthy, and something that needed investigation.
That gap matters.
In real vulnerability research, the most important findings are often the ones with the scariest impact.
If AI tools are most avoidant exactly when severity is highest, that is a dangerous failure mode.
AI safety is important. But authorized validation and responsible reporting are not the same as abuse.
Security researchers need tools that understand that difference. Especially when the finding is severe.
Are you using an AI tool for vulnerability research? Let me know about your experience!