ai for security

Joined October 2009
258 Photos and videos
Pinned Tweet
1/ Agentic LLMs can automate vuln detection. Very exciting, but doesn't address the hardest part (imo) of vuln research: prioritization. Can we reliably explore the search space and separate signal from noise? I wrote a paper (and OSS tool) to solve this. arxiv.org/pdf/2512.06155
getting closer to frontier capability @ home. I am personally running ds4 on Framework Desktop (AMD Strix Halo).
> Full SWE-Bench Verified score [for 2-bit quant DeepSeek-V4-Flash] is between 67.5–85%.
> The headline SWE-Bench Verified score for DeepSeek-V4-Flash is 80.8% for full-precision version.
> It is incredibly impressive that the version of the same model having some layers quantized down to 2 bits still performs comparatively well.
> To put it in a perspective, Claude 4.5 Opus scores 76.8% according to the official leaderboard.
That's why people using DS4F with DwarfStar, 2 bit quantized, are often surprised by the results. It's not a frontier model but it is not a toy, it is something you can actually use to get work done, and nobody can tell you what to do with it.
beware cognitive surrender
straight from the horse's mouse
CISA will soon release a directive pushing agencies to stop treating every cyber vuln as equally urgent, acting director Nick Andersen said. “If we try to say that everything is equally as important, then absolutely nothing’s going to be important.” nextgov.com/cybersecurity/20…
I'm here for "Claude Noir" (N-hour) as the offsec-focused version of Claude
> “N-day” has become dangerously misleading. N-hour is closer to the reality we now operate in. red.anthropic.com/2026/n-day…
synergy through the roof in this ds4/amd community collab github.com/antirez/ds4/issue…
Replying to @antirez @AMD
using 26.04 on framework desktop without issue. latest experimental rocm optimizations on github.com/antirez/ds4/issue… are improving prefill/gen tps by 1.5–2X.
tldr: trim the fat
I strive to make my writing unsummarizable, in the sense that it has so little fluff left in it that if you take any words out, as summaries by definition do, you lose a lot of interesting ideas.
(this was a joak)
Caleb Gross retweeted
Brevity is more than politeness to the reader. Compression is understanding.
framework makes the dramework 💪
I just received a Framework Desktop (Strix Halo) courtesy of @AMD in order to merge and continue the development in DwarfStar of the ROCm support (currently community handled). What Linux distro should I install? Ubuntu 24.04.4 LTS which is officially supported, or Fedora 43?
This concern makes sense if the medium doesn't matter. Perhaps we should question our desire for raw, unmediated information.
“AI is demoralizing.” A Princeton Professor says he kept wondering this semester (while lecturing) if his students would be better off learning from Claude:
Is that all a professor is—someone who "knows the most"? Is that all education is—information acquisition?
thanks for the signal boost @antirez :) first time occupying the top spot on hacker news front page.
marked safe from the earthquake in Hawaii 🤙
Caleb Gross retweeted
My MLX Vulkan backend just passed both CPP AND Python test suites!!
betting that GitHub's "Download ZIP" button gets _way_ more clicks since the advent of LLMs. I very frequently drop entire zipped codebases in front of ChatGPT.
Caleb Gross retweeted
Earlier today Cloudflare's CSO shared how they tested Anthropic Mythos using an unreleased 8-stage vulnerability-discovery agent. So I asked Opus to implement the agent for me, it works via Claude SDK with a Pro or Max subscription, no API. Enjoy github.com/evilsocket/audit
Caleb Gross retweeted
Replying to @halvarflake
I find most computer security people have a grounded notion on AI because they have practical experience with things like fuzzing and z3 and see things as search. Search is powerful, but the space is bounded, and even within a space it can be just hard. AI is mainstream search.
Caleb Gross retweeted
buy a gpu. 3090, 4090, dgx spark, whatever fits your budget. tier doesn't matter. running your first local model does. the moment your first prompt lands with no api between you and the model, your brain rewires. that single moment is worth more than every take you'll ever read on a timeline.
a benefit of working from home: I can dictate a stream-of-consciousness rant into my mic while iterating on ideas with Claude. not sure how in-office folks handle this.