mufeed vh

Tweets

mufeed vh

@mufeedvh

Jun 5

another one! all of these are discovered with open models btw. the blog will be published after all the findings are properly disclosed so we can talk about them in detail. there's a specific pattern to the kind of vulns these open models find, it's interesting! chromereleases.googleblog.co…

mufeed vh

@mufeedvh

May 29

one of them has been disclosed with a CVE. more to come. besides, what’s google cooking? mythos? their own gemini variant? interesting times. chromereleases.googleblog.co…

mufeed vh

@mufeedvh

May 29

one of them has been disclosed with a CVE. more to come. besides, what’s google cooking? mythos? their own gemini variant? interesting times. chromereleases.googleblog.co…

mufeed vh

@mufeedvh

May 16

We're doing an experiment with open models @winfunction to see how far we can push them to find vulns in hardened targets. So far:
- $4.5K in bounties from Chrome VRP with a few more pending, with the scans costing less than $100.
- 2 CVEs in NGINX (CVE-2026-28755 & CVE-2026-42926). And watch out for the next release!
- And 60ca500faea0fc70816bb9c53af3815e2af3e6c962b4b4ea63c33c62ebb4240d 👀
We're writing a blog on this soon.

mufeed vh

@mufeedvh

May 22

We discovered the same vulnerability too. :) And @winfunction discovered 4 more RCE primitives in NGINX, soon to be publicly disclosed. Anywho, we're hiring security researchers with a knack for taming LLMs. If you're interested in novel vulnerability research and autonomous exploitation with language models, DM me and I'll send you a fun CTF to solve. :)

Nebula Security

@nebusecurity

May 20

Introducing nginx-poolslip, a fresh RCE for the latest nginx release 1.31.0. nginx-rift has been patched, but our security agent Vega has found a new 0-day. We will release the full technical writeup with ASLR bypass 30 days after the patch on nebusec.ai.

mufeed vh

@mufeedvh

May 22

just gonna put this here. and watch out for the next release. nginx.org/en/CHANGES

mufeed vh

@mufeedvh

May 16

a long-overdue life update: i moved to bangalore!

mufeed vh retweeted

mufeed vh

@mufeedvh

May 16

mufeed vh

@mufeedvh

May 14

oh that's me! the first on the list is our 4th finding in NGINX. maybe we should do more of this @winfunction.

wavefnx

@wavefnx

May 14

The software that runs in the veins of modern society is fragile. Every proper engineer knows that; C just makes it worse. This affects 0.6.27 ... 1.30.0, so pretty much everything until yesterday. I know some of you are still using affected versions, so update to 1.31.0.

mufeed vh

@mufeedvh

Apr 21

Love the Claudia reference as the first thing here. We loved working on Claudia but couldn't balance working on security research projects and Claudia at once. Fun fact, we invented "SKILLS" before it was even a thing. There was a feature in Claudia called "AGENTS" where users could share and install system prompts for specific tasks via their GitHub repos, just like the skill marketplaces concept in Claude now. See here: github.com/winfunc/opcode/tr… And Anthropic did talk to us after the launch of Claudia. Unfortunately I can't reveal more about it, but damn, was it a tough decision.

Jon Lai

@Tocelot

Apr 21

a16z @speedrun request for startups: GUIs for Agents we’re still in the MS-DOS era of agents today - CLI, terminal sessions, file directories deleted by openclaw etc. while a small slice of silicon valley are power users, we're SO early for the rest of the world at Speedrun, we’re looking for bold founders excited to bring the power of agents to normies everywhere. there's a whole slew of products to be built here - from agent builders to marketplaces to managed infrastructure one broad idea we’re excited about are visual abstraction layers for agents. if you don't know exactly what you want, a command line / chat interface is paralyzing - you need to see options 1 example - think of a GUI or visual command center inspired by strategy games (ex. Factorio) where agents and workflows are represented graphically. skills, tools, MCP connections, background processes, etc could all be configured and shown visually in a workspace on UX, strategy games have long perfected agent management. zoom to get a birds-eye view of your agents, batch and queue orders via shortcuts, assign agents in multiplayer etc. a well-designed agent command center would make multi-agent orchestration for normies feel easy & intuitive most folks today still haven't moved beyond ChatGPT. the potential is enormous - just as Windows unlocked mass-market use of personal computers, the right visual abstraction layer could unlock agentic work for everyone - from individuals to enterprise teams if you share our vision, we'd love to chat!

mufeed vh retweeted

mufeed vh

@mufeedvh

Apr 17

During our YC (@ycombinator S24) batch, we had the awesome opportunity to meet @paulg and talk about what we're building: An autonomous AI hacker. To showcase a fun demo, I remember opening my laptop in the Uber to his home and challenging our agents to find vulnerabilities in the old HackerNews codebase written in Arc. For those unfamiliar, Arc is a programming language designed by PG and Robert Morris. And the old HN codebase is written in Arc. We only got to talk about it with him but we just redid the experiment with our improved harness for fun! And we wrote a blog about it: winfunc.com/research/hacking…

Hacking the old HackerNews codebase

Auditing the old HackerNews codebase for security vulnerabilities with LLMs on a specialized harness.

winfunc.com

mufeed vh retweeted

Dwayne

@CtrlAltDwayne

Apr 14

Everyone is talking about Mythos, but GPT-5.4 is actually shaping up to be a more capable model than people realize. This N-Day bench has GPT-5.4 at the top, followed by GLM-5.1 and interestingly beating Opus 4.6 so far. Crazy to think Spud is a bigger leap than this.

mufeed vh retweeted

winfunc

@winfunction

Apr 13

Vulnerability benchmarks rot. Cases leak into training data, scores measure memorization. We built N-Day-Bench: tests LLMs on finding real vulnerabilities in real repos, refreshed monthly from live GitHub advisories. Blinded judging. All traces public. Very interestingly, the latest model from @Zai_org, GLM 5.1 performs really well! Link: ndaybench.winfunc.com

mufeed vh retweeted

Perry E. Metzger

@perrymetzger

Apr 9

I strongly support this take. Computer security has been in a continuous crisis since the Morris worm in 1988. Finally, we have the capacity to actually fix the problem, not only through automated audits, but with automated formal verification. People are instead treating this like it’s terrible because they are inclined to only see the “finds bugs” side and ignore the “fixes bugs” side.

Marc Andreessen 🇺🇸

@pmarca

Apr 9

The state of cybersecurity has been dismal forever. At one point a major vendor even enabled direct execution of arbitrary x86 binaries in any web page. Nobody cared. The number of hacks and breaches has been uncounted. Finally we have the catalyst and the tools to fix it all.

mufeed vh retweeted

mufeed vh

@mufeedvh

Apr 9

these models are really good at pattern matching and thereby variant analysis. “iterate through security fix commits and find similar vulnerabilities / unpatched areas of the same variants. write runnable pocs for valid ones.” is enough to uncover surface-level or even P0 vulnerabilities in a codebase with the latest models. i believe for proof of cyber capabilities, these models should exhibit discovery of novel attack vectors that requires understanding of runtime behavior with and without tooling? like emulating the runtime in CoT/reasoning? for instance, the react2shell vulnerability was ingenious. without stuffing in context and nudging/handholding (ehem like some experiments going on), can these models find a similar attack vector with a prompt like “loop until you find a p0/critical security vulnerability”? that’s what i’d like to see. these models can claim P0 findings with contractual mismatches for say cryptographic implementations but the impact could just be some DoS that’s being prevented by a parent thread. this is where i see the moat with good harnesses. trust-boundary and threat model understanding, a sandbox environment with the right “win function” for pocs to run on (if xyz happens, it’s a valid vuln), etc. does make the models spit out impressive vulns. this is sort of what we do @winfunction. cus most vulnerabilities are easily traceable given a comprehensible source-to-sink flow which these models have been good at for a long while now. and with respect to exploit dev, i strongly believe it’s mostly a tooling problem. with the right tool calls and model digestible outputs of say tracing tools, memory layout, syscalls, threads/processes, and a debugger interface, i think the frontier models can pull off complex multi-chain exploits. 
(we have run some experiments here and the models are not too bad at this) security vulnerabilities have a definitive “win function”, like a flag in a ctf, like popping a calc, like ASAN crashes, like `id` says root, like 1000 in milliseconds. this makes the problem very RL verifiable. so i only expect the harnesses to get leaner. remember when function calls were part of the response content? we called it “prompt based tool calling” and now there’s typed/schema based tool calling as an inherent capability of these models. most of what we call a harness or an agent is giving the right prompts and the right tools (which are also just prompts). so whoever can weave the right sequence of tokens to these behemoths of language models can hoard zero days or spit them out. so git gud at feeding the right tokens at the right time ig.
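
a rough sketch of what a “win function” check over a poc run could look like, in Python. this is only an illustration of the idea in the tweet above, not winfunc's actual harness: `PocRun`, `win_conditions`, and `is_valid_vuln` are hypothetical names, and reading “1000 in milliseconds” as a timing threshold is an assumption. “popping a calc” is left out because it can't be judged from captured output alone.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PocRun:
    """Captured output of one proof-of-concept run (hypothetical structure)."""
    stdout: str
    stderr: str
    elapsed_ms: float


def win_conditions(run: PocRun,
                   flag: Optional[str] = None,
                   timing_threshold_ms: Optional[float] = None) -> List[str]:
    """Return the names of every win condition the run satisfied."""
    hits = []
    # CTF-style: the secret flag shows up in the PoC's output.
    if flag is not None and flag in run.stdout:
        hits.append("flag")
    # Memory-safety: AddressSanitizer reports start with this marker on stderr.
    if "ERROR: AddressSanitizer" in run.stderr:
        hits.append("asan")
    # Privilege escalation: `id` output begins with uid=0(root) when run as root.
    if re.search(r"\buid=0\(", run.stdout):
        hits.append("root")
    # Timing (assumed reading of "1000 in milliseconds"): the run took at
    # least the configured threshold, e.g. an algorithmic-complexity DoS.
    if timing_threshold_ms is not None and run.elapsed_ms >= timing_threshold_ms:
        hits.append("timing")
    return hits


def is_valid_vuln(run: PocRun, **kwargs) -> bool:
    """A finding counts only if at least one win condition fired."""
    return bool(win_conditions(run, **kwargs))
```

because the verdict is a plain boolean over observable output, the same check could in principle serve as a reward signal, which is the sense in which the tweet calls the problem “RL verifiable”.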

gum

@gum1h0x

Apr 8

ok i read the cyber part of the mythos model card. some thoughts. 250 "trials" across 50 crash categories, but almost every full exploit is a permutation of the same 2 bugs, rediscovered from different starting points, not 250 independent attempts. when you take those 2 bugs out (fig B), mythos's full-exploit rate drops to 4.4%. so actually across both setups mythos leverages 4 distinct bugs total, not 50 as fig A might suggest. 1/n

mufeed vh retweeted

Nat McAleese

@__nmca__

Apr 7

at long last we have built and chosen not to release the zero-day machine from the classic sci-fi tale “please do not release the zero-day machine”

mufeed vh retweeted

Mario Zechner

@badlogicgames

Mar 28

mufeed vh retweeted

Mario Zechner

@badlogicgames

Mar 28

we as software engineers are becoming beholden to a handful of well funded corporations. while they are our "friends" now, that may change due to incentives. i'm very uncomfortable with that. i believe we need to band together as a community and create a public, free to use repository of real-world (coding) agent sessions/traces. I want small labs, startups, and tinkerers to have access to the same data the big folks currently gobble up from all of us. So we, as a community, can do what e.g. Cursor does below, and take back a little bit of control again. Who's with me? cursor.com/blog/real-time-rl…

Improving Composer through real-time RL · Cursor

We apply online reinforcement learning to Composer, serving model checkpoints to production and using real user interactions as reward signals to ship an improved checkpoint multiple times a day.

cursor.com
