Principal Investigator | AIxCC Lead Architect (views my own)

Joined May 2010
Pinned Tweet

Fable 5 applying forced token maxing by converting to a Python strategy over RTK. I've personally found Python generation far less reliable than Golang, Rust, or TypeScript. I think this is primarily because of the strong typing, but perhaps even more so because of the abundance of opinionated community content about the "right" way to implement something that likely went into training.
Forget "token maxing", I've derived this new agentic concept of "outcome maxing". This novel concept is where you focus on producing real-world results, embrace stakeholder empathy, and maximize value by scaling desired outcomes for your customers.
The details of AIxCC challenges are now live at archive.aicyberchallenge.com… Huge thanks to all the challenge authors who contributed, and the teams who helped review content.
If you haven’t re-evaluated delivery services like Instacart, Uber, and DoorDash lately, it’s probably a good time to do so. I'm seeing extreme price increases: 36% on the item alone, then nearly 2-3x for delivery from 1/4 mile away, and 4x if you take the max suggested tip.
generated company-profile.md
Does anyone know if there are any models trained specifically on data from before GPT-3 was widely available? I know Sam has mentioned that GPT-4 could potentially be this, but I think we need a site and a model like "before[.]ai" or similar that is specifically dedicated to continual training on ONLY datasets validated to be available before GPT-3, or some other clear line in the sand where we know models were generating a significant volume of data on the internet. If we don't do this as some sort of public good, open-source, shared offering, I think attempting to baseline understanding and separate raw human content from model-generated content is going to be impossible. This is not about generating a large, extremely capable model for free, but about human vs. AI/ML provenance. This seems like the sort of thing an academic institution or a standards agency like NIST could help maintain with the support of large model vendors @sama @DarioAmodei @elonmusk
@huggingface might be an interesting endeavor for you also?
Matt Lehman retweeted
New podcast 🎙️ How did the AI Cyber Challenge go from skepticism to success? Start with AIxCC Part 1 – From Skepticism to Success and hear how #AIxCC reshaped thinking around AI cybersecurity. Part 1 kicks off a 4-episode series: hubs.la/Q042tjVx0
If you're not checking your assumptions with each new LLM feature/model release, you're missing out. The two biggest consistent human gaps I see are outdated assumptions and an inability to effectively configure your agentic environment.
I think 2026 will be the year VCs start releasing tranches based on your tokens-to-feature ratio, making funding decisions on your avg tokens-to-launch, or moving from pre-seed, series-A, to series-B growth stage only.
I’ve been experimenting with these extended memory agentic frameworks with the knowledge of what it takes to build global-scale systems. I’m finding them amazing for experimenting and prototyping, but the debugging skills are severely lacking. They appear to always weight training data over supplied evidence.
Probably 8-10 times now I’ve had it implement a spec or configuration based on guidance from 1-2 years ago, and when provided the official latest spec it still doesn’t conform. I think there is a disconnect in valuing the latest vendor/author-supplied information over the volume of data available during training.
I’m curious if anyone has found a good way to guide a skill around this hurdle.
The real risk begins when Clawdbot aka Moltbot aka OpenClaw translates and labels the 312 episodes of Brooks Moore narrating “How It’s Made”.
🚨 CAUGHT: My utility charged me for 834 kWh over 5.4 hours. 200A @ 240V = 48 kW (48 kWh/hr max). Over 5.4 hrs = 259 kWh max possible. They charged 834 kWh = 3.2x IMPOSSIBLE! Could charge 7.7 Teslas! $102.61 bogus. gridsense.us #UtilityBilling
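For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same sanity check. It assumes the service could draw the full 200A at 240V for the entire 5.4 hours, which is a deliberately generous ceiling, and the function name `max_energy_kwh` is made up for illustration (it is not taken from gridsense.us).

```python
def max_energy_kwh(amps: float, volts: float, hours: float) -> float:
    """Upper bound on deliverable energy: P = I * V, then E = P * t."""
    kilowatts = amps * volts / 1000.0  # 200 A * 240 V = 48 kW
    return kilowatts * hours


# Figures from the tweet; the continuous full-load draw is an assumption.
ceiling = max_energy_kwh(amps=200, volts=240, hours=5.4)
billed = 834.0
ratio = billed / ceiling

print(f"ceiling: {ceiling:.1f} kWh, billed: {billed:.0f} kWh, ratio: {ratio:.1f}x")
# prints "ceiling: 259.2 kWh, billed: 834 kWh, ratio: 3.2x"
```

The product comes out near 259 kWh, so a billed figure of 834 kWh sits at roughly 3.2x the physical ceiling of the service under these assumptions.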

The funny part about this is I vibe coded all of gridsense.us while I was on hold trying to resolve this.