Curious mind. Creator of @plnkrco. Tech lead for extensibility at @auth0 (@okta). Never stop learning and experimenting.

Joined April 2011
In @torkbot/sandbox, you can now spawn a pty in the guest VM and have a fully interactive shell. The network egress now also respects host VPN setups. So your enterprise Palo Alto network interception will work even with transparent guest http policy. github.com/torkbot/sandbox/r…
I thought this would be an instant hit given the AI zeitgeist. Maybe the laser engraving showing HAL as being Fable 5 is too subtle?
I'm so sorry Dave.
If you're sitting on GPT-6 with an IPO imminent, do you hold it back until after going public if it doesn't compete with Mythos/Fable 5? OTOH, the market will demand an answer. Fascinating game theory / 3D chess calculus for Sama. Unless you have it. Then I bet we see it soon.
The sandboxes are designed for AI agents. The API supports flows where the harness (and llm) can be prompted to accept or decline connectivity. When accepting http, you can intercept outbound reqs and modify headers. Mount virt fs and even overlay the whole rootfs with CoW.
It's out and works! Give this little agentic sandboxing library a spin. Tiny little sandboxes with dynamic network policy that start in 150ms.
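To make the accept/decline flow described above concrete, here is a minimal sketch. Every name in it (`EgressRequest`, `Decision`, `Approver`, `decideEgress`) is hypothetical and made up for illustration; none of it is the actual @torkbot/sandbox API. It only assumes the idea from the post: the harness gets asked about an outbound connection, and accepted HTTP requests can have their headers rewritten.

```typescript
// Hypothetical types: none of these names come from @torkbot/sandbox.
type EgressRequest = {
  protocol: "http" | "tcp";
  host: string;
  headers: Record<string, string>;
};

type Decision =
  | { action: "decline"; reason: string }
  | { action: "accept"; headers: Record<string, string> };

// Stand-in for the harness (or the LLM behind it) answering yes/no.
type Approver = (req: EgressRequest) => boolean;

function decideEgress(
  req: EgressRequest,
  approve: Approver,
  injectedHeaders: Record<string, string>,
): Decision {
  if (!approve(req)) {
    return { action: "decline", reason: `egress to ${req.host} not approved` };
  }
  // Only HTTP requests get header rewriting; other protocols keep their headers as-is.
  const headers =
    req.protocol === "http"
      ? { ...req.headers, ...injectedHeaders }
      : { ...req.headers };
  return { action: "accept", headers };
}

// Usage: an allow-list plays the role of the prompt to the harness.
const allowList = new Set(["api.example.com"]);
const approver: Approver = (req) => allowList.has(req.host);

const allowed = decideEgress(
  { protocol: "http", host: "api.example.com", headers: { accept: "application/json" } },
  approver,
  { authorization: "Bearer <token>" },
);
const blocked = decideEgress(
  { protocol: "http", host: "evil.example.net", headers: {} },
  approver,
  {},
);
console.log(allowed.action, blocked.action);
```

In a real harness the approver would likely be asynchronous (waiting on a user or model decision); it is synchronous here only to keep the sketch short.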
I'm trying to get github.com/torkbot/sandbox mac artifacts notarized so that it's a drop-in solution for you. Reddit is telling me new Apple dev account holders like me are often waiting *weeks* for notarization! 😬 Not the feedback loop I'm used to. We'll get there though.
If you previously tried to reach me on my public email, it was unintentionally a black hole. This has now been fixed. Hoping eventual consistency will sort things out here.
What have I done to you, Codex, to make you think I deserve a Paw Patrol dog as a pet?
Much better this time
Geoff Goodman retweeted
I made a blog and posted an article. Writing is hard! Here are some thoughts about how I got into building an agent, inspired by my experience using @steipete's clawdbot (at the time). blog.goodman.dev/blog/buildi… I hope to post technical stuff about the subtle details.
Codex hallucinated that my name was Greg, presumably because my GitHub handle is ggoodman.
Chastising it is so unsatisfying when you know how these harnesses actually work. Nothing but empty apologies and promises 🙃.
Geoff Goodman retweeted
Super-powerful AI models will launch in the coming weeks. We are looking at a potential step change in model capabilities. The biggest mistake right now is to lock into one vendor. I say this not only from a cost perspective, but also from an engineering perspective. Start figuring out how to leverage combinations of these models (including open models). What that means is that you can swap models anytime and best leverage their strengths. For coding agents, open models are already just as good as the frontier ones. So, how to better prepare? Consider how you will be routing tasks/work to these models. AI model routing is high reward, and it should be part of your AI engineering efforts going forward.
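One way to picture the routing idea in the post above is a small lookup table, sketched below. The task kinds, model names, and vendors are all placeholders invented for illustration, not real products or recommendations. The only point is that the mapping lives in data, so swapping a model does not touch call sites.

```typescript
// Hypothetical task kinds and model names, for illustration only.
type TaskKind = "coding" | "summarization" | "reasoning";

interface ModelRoute {
  model: string;
  vendor: string;
}

// Fallback used when no specific route is configured for a task kind.
const defaultRoute: ModelRoute = { model: "frontier-model", vendor: "vendor-a" };

// The routing table is plain data, so a model can be swapped in one place.
const routes: Partial<Record<TaskKind, ModelRoute>> = {
  coding: { model: "open-coder", vendor: "self-hosted" },
  summarization: { model: "small-fast-model", vendor: "vendor-b" },
};

function routeTask(
  kind: TaskKind,
  table: Partial<Record<TaskKind, ModelRoute>> = routes,
): ModelRoute {
  return table[kind] ?? defaultRoute;
}

console.log(routeTask("coding").model);
console.log(routeTask("reasoning").model);
```

A production router would probably also weigh cost, latency, and measured quality per task; this sketch only shows the swap-by-configuration part.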
Forget tokenmaxxing. Optimize against the MTTCR: Mean-time to Codex reset.
I think the real value in here is less the framework and more the set of tasks it is evaluating against. The framework IMO makes unnecessary assumptions about the shape of what an AI agent is and how it is run. Headless agents will become more of a thing over the next year.
// Agents' Last Exam // Agents' Last Exam is a living benchmark of over 1,000 economically valuable tasks, built with 250 industry experts and mapped to the U.S. federal occupational taxonomy. The hardest tier sits at a 2.6% average full pass rate across mainstream harnesses and backbones. ALE behaves like a GDP-coverage instrument instead of another test that saturates in a month. Paper: arxiv.org/abs/2606.05405 Learn to build effective AI agents in our academy: academy.dair.ai/
Getting TorkBot to write up email drafts in French is such a delightfully unexpected lift. I'm bilingual and live in Montreal but I suck at finesse and nuance in French writing. The bot figures out recipients and context and actually creates the draft for me in Gmail.
Getting a turn-based model to follow through on things despite the inclinations of the model is quite an engineering challenge. On one hand, you may be tempted to special-case behavior. But the real payoff is designing a system that is self-fulfilling. Added follow-ups 😁