Ex-Google, ex-YC, ex-aerospace, mostly control systems. AI harness architect, and I do AI stuff with rocket science stuff

Joined April 2013
175 Photos and videos
Le Chaton Fat is not real? 😂 I was ready to do some system identification this week
A lot of AI/software work right now is just changing parts and hoping the metric moves. Prompt changed. Model changed. Context changed. Tool changed. Retry policy changed. Sensitivity analysis asks the better question: what is the system actually sensitive to? That’s where the leverage is. Which knob moves useful output per cost? dropping some stuff soon
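A minimal sketch of the "which knob moves useful output per cost" question, assuming the knobs can be treated as continuous numbers and the metric can be re-evaluated on demand. The metric `useful_output_per_cost`, its coefficients, and the knob names are all hypothetical, made up for illustration. Discrete changes such as swapping a model would need a plain before/after comparison instead.

```python
def useful_output_per_cost(knobs):
    """Hypothetical toy metric; the formula and coefficients are made up."""
    quality = 0.5 + 0.3 * knobs["context_size"] + 0.1 * knobs["retries"]
    cost = 1.0 + 2.0 * knobs["context_size"] + 0.5 * knobs["retries"]
    return quality / cost


def sensitivities(metric, knobs, rel_step=1e-4):
    """Central finite-difference estimate of d(metric)/d(knob), one knob at a time."""
    result = {}
    for name, value in knobs.items():
        h = rel_step * max(abs(value), 1.0)
        up = {**knobs, name: value + h}
        down = {**knobs, name: value - h}
        result[name] = (metric(up) - metric(down)) / (2.0 * h)
    return result


def rank_knobs(metric, knobs):
    """Knob names sorted from most to least influential at this operating point."""
    s = sensitivities(metric, knobs)
    return sorted(s, key=lambda name: abs(s[name]), reverse=True)


if __name__ == "__main__":
    point = {"context_size": 1.0, "retries": 2.0}
    print(sensitivities(useful_output_per_cost, point))
    print(rank_knobs(useful_output_per_cost, point))
```

The ranking is local: it only describes the operating point you evaluate it at, so it is worth re-running after any large change.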
Went on a small SciML side quest. Saw something interesting around physics neural nets and wanted to see what my composable-model-graph library could do with the simpler version. Tiny inverse problem: recover a hidden physical parameter from noisy measurements. No neural net, no autodiff. Not replacing PINNs either. Different shape of problem. But when a forward model exists, it’s pretty cool how far you can get with an inspectable graph plus finite-difference sensitivity. Example link in comment.
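A rough sketch of this kind of tiny inverse problem in plain Python. It does not use the composable-model-graph library from the post. The forward model (exponential decay), the parameter name `k`, the noise level, and the choice of a scalar Gauss-Newton update are all assumptions for illustration. The only sensitivity information comes from central finite differences on the forward model.

```python
import math
import random


def forward(k, times, y0=1.0):
    """Hypothetical forward model: exponential decay y(t) = y0 * exp(-k * t)."""
    return [y0 * math.exp(-k * t) for t in times]


def fd_sensitivity(k, times, h=1e-6):
    """Central finite-difference estimate of dy/dk at each measurement time."""
    up = forward(k + h, times)
    down = forward(k - h, times)
    return [(u - d) / (2.0 * h) for u, d in zip(up, down)]


def recover_parameter(times, measured, k_init=1.0, iters=50, tol=1e-12):
    """Scalar Gauss-Newton: k <- k + (J.r) / (J.J), with J from finite differences."""
    k = k_init
    for _ in range(iters):
        predicted = forward(k, times)
        residual = [m - p for m, p in zip(measured, predicted)]
        jac = fd_sensitivity(k, times)
        jtj = sum(j * j for j in jac)
        if jtj == 0.0:
            break  # measurements carry no information about k at this point
        step = sum(j * r for j, r in zip(jac, residual)) / jtj
        k += step
        if abs(step) < tol:
            break
    return k


def demo(true_k=0.7, noise=0.01, seed=0):
    """Simulate noisy measurements of the hidden parameter, then recover it."""
    rng = random.Random(seed)
    times = [0.1 * i for i in range(50)]
    measured = [y + rng.gauss(0.0, noise) for y in forward(true_k, times)]
    return recover_parameter(times, measured)


if __name__ == "__main__":
    print("recovered k:", demo())
```

Because the sensitivity is computed by perturbing the forward model directly, every intermediate value (prediction, residual, sensitivity) can be printed and inspected at each iteration.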
Setting up personal one-person research infra. Already spent around $2k of my own money on research this week, but the outcomes are interesting enough that I’m doubling down. Plan is to own as much of the stack as possible: personal git server, local AI, self-hosted research machines, model weights, private experiment infra. This private repo is part of it: Tower.
These graphs look like this because they have to. I can’t afford the cognitive cost of keeping systems of this size in my head, and code hides too much. So I visualize the system and work from there. Let Anthropic know they can’t stop me (jk)
I’m actually only doing work on open source models
Finite-difference sensitivity
wait i may have cooked, will dig into this a bit more next week lmao
In AI these are your moves: Right now: Φ = Q / C and the next move is: dΦ/dx
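One way to read the "next move", as a sketch: the post does not define the symbols, so this assumes Q is useful output, C is cost, and both depend differentiably on a single knob x. Under those assumptions the quotient rule gives:

```latex
\Phi(x) = \frac{Q(x)}{C(x)}
\qquad\Longrightarrow\qquad
\frac{d\Phi}{dx}
  = \frac{C\,\dfrac{dQ}{dx} - Q\,\dfrac{dC}{dx}}{C^{2}}
  = \Phi\left(\frac{1}{Q}\frac{dQ}{dx} - \frac{1}{C}\frac{dC}{dx}\right)
```

In this reading, a knob improves Φ only when it raises Q by a larger relative amount than it raises C.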
The annoying thing about export controls is that they are not completely fake. Forget the generic “China” discourse for a second. If you give Mossad access to Fable 5, they finna cook something diabolical. That is just reality, and it has to be handled. But that is a CIA / NSA / MI6 / national-security problem. It should be handled properly with targeted collaboration between governments and AI labs. It should not become a blanket policy that impacts literally everyone else on the planet. You do not need to nerf the entire model for everyone. That hurts the training data, hurts useful research, and makes the product worse for normal users trying to build normal things. The intelligence agencies can absolutely spin up new departments to track bad actors abusing LLMs. They have actual geniuses at DARPA, the CIA, NSA, MI6, etc. They do not need model companies to sandbag the product for everyone on Earth just to deal with bad actors. They can handle targeted national-security problems with targeted national-security systems. Everyone can win here. The only problem is that LLM companies also seem to want to reign supreme.
Boateng retweeted
Like imagine they just released the model and were like, it's cool it kinda programs. But no, we had to get a huge song and dance about some mega capabilities like OMG this one almost beat Pokemon Red you guys it's like the next nuclear weapon. Company needs new leadership.
not even gonna cap, Anthropic is a supply chain risk 😂
AI safety fearmongering is dangerous because AI is not one field. Export controls in aerospace limit who can work on rockets. Harsh, but bounded. You picked rocket science? Tough luck, go do other stuff. AI sits across medicine, education, finance, engineering, science, software, and operations. Apply export-control logic broadly to AI and you do not restrict one industry. You put every field on the line. That is not safety. That is progress deceleration.
A lot of the government controls AI labs seem to want already exist in aerospace, defense, nuclear, weapons systems, export-controlled technical data, etc. They are not a neat safety layer. They decide who gets access, who gets hired, what work can be shared, and which careers are possible. It is not crazy to imagine a future where advanced American models are only available to Americans. The only thing changing the physics is open source. Decent open-source models mean AI access cannot be controlled the same way aerospace or defense technical data can.
Now that I think about it, a lot of the government controls Anthropic seems to want around AI already exist. They exist because software is already part of far more sensitive industries: aerospace, defense, nuclear, weapons systems, export-controlled technical data, etc. It is not heavily publicized because there is nothing to advertise. People in those fields already know how it works. The irony is that if AI labs understood those systems better, they might want the opposite strategy. Government control is not a neat product safety layer. It is ruthless. It can decide who gets access, who gets hired, what countries are allowed, what work can be shared, and which careers are even possible. It is not crazy to imagine a future where advanced American models are only accessible to Americans. That is the kind of world these controls can create.
Most AI agent benchmark wins are real and useless at the same time. Both can be true, and usually are. AI memory is the clearest example right now. The claims are everywhere. When a company says "our memory beats the benchmarks," the result can be completely real and still tell you almost nothing about your product. Three reasons:
1. It's confounded. Memory comes bundled with their whole harness, prompts, and model. You can't tell what actually did the work.
2. It doesn't transfer. A result tuned to their benchmark rarely survives a different distribution, and your product is a different distribution.
3. It only helps if forgetting is your real bottleneck. Every agent loses most of its score to one main failure mode. Memory fixes exactly one of those modes. If yours is tool errors, weak planning, or bad verification, memory changes nothing, no matter how good the demo looked.
So "our benchmark went up" can be real and still not be evidence for you. What actually matters is whether a change attacks YOUR bottleneck and holds up on YOUR data. Almost nobody checks that. Something I've been digging into lately.
How it started vs how it's going