Geoff Goodman

Tweets

Geoff Goodman @filearts

Jun 12

You can now run docker in @torkbot/sandbox. Required some tweaks to the kernel build flags for nftables. Good to go now! Release notes: github.com/torkbot/sandbox/r…

Release v0.8.2 · torkbot/sandbox

Docker networking on the built-in kernel This release updates the bundled guest kernel used by Sandbox so Docker's default bridge networking can work with modern iptables-nft userspace. Users r...

github.com

Geoff Goodman @filearts

Jun 11

In @torkbot/sandbox, you can now spawn a pty in the guest VM and have a fully interactive shell. The network egress now also respects host VPN setups. So your enterprise Palo Alto network interception will work even with transparent guest http policy. github.com/torkbot/sandbox/r…

Release v0.8.0 · torkbot/sandbox

This release redesigns process spawning around two explicit use cases: ordinary long-lived commands with streams, and interactive terminal sessions with a PTY. This is a breaking API change because...

github.com

Geoff Goodman @filearts

Jun 10

I thought this would be an instant hit given the AI zeitgeist. Maybe the laser engraving showing HAL as being Fable 5 is too subtle?

Geoff Goodman @filearts

Jun 10

I'm so sorry Dave.

ALT Image of HAL 9000's eye with "Fable 5 by Anthropic" laser etched into the lens. Below it says, "I'm sorry Dave, you're doing CYBER."

Geoff Goodman @filearts

Jun 10

If you're sitting on GPT-6 with an IPO imminent, do you hold it back until after going public if it doesn't compete with Mythos/Fable 5? OTOH, the market will demand an answer. Fascinating game theory / 3D chess calculus for Sama. Unless you have it. Then I bet we see it soon.

Geoff Goodman @filearts

Jun 9

The sandboxes are designed for AI agents. The API supports flows where the harness (and llm) can be prompted to accept or decline connectivity. When accepting http, you can intercept outbound reqs and modify headers. Mount virt fs and even overlay the whole rootfs with CoW.

Geoff Goodman @filearts

Jun 9

It's out and works! Give this little agentic sandboxing library a spin. Tiny little sandboxes with dynamic network policy that start in 150ms.

Geoff Goodman @filearts

Jun 9

I'm trying to get github.com/torkbot/sandbox mac artifacts notarized so that it's a drop-in solution for you. Reddit is telling me new Apple dev account holders like me are often waiting *weeks* for notarization! 😬 Not the feedback loop I'm used to. We'll get there though.

Geoff Goodman @filearts

Jun 9

If you previously tried to reach me on my public email, it was unintentionally a black hole. This has now been fixed. Hoping eventual consistency will sort things out here.

Geoff Goodman @filearts

Jun 9

Sandbox is a TypeScript-first Node.js library for spawning libkrun-backed microVMs. - torkbot/sandbox

github.com

Geoff Goodman @filearts

Jun 9

What have I done to you, Codex, to make you think I deserve a Paw Patrol dog as a pet?

Geoff Goodman @filearts

Jun 9

Much better this time

Geoff Goodman retweeted

Geoff Goodman @filearts

Jun 8

I made a blog and posted an article. Writing is hard! Here are some thoughts about how I got into building an agent, inspired by my experience using @steipete's clawdbot (at the time). blog.goodman.dev/blog/buildi… I hope to post technical stuff about the subtle details.

Building a continuous, durable personal agent | Geoff Goodman

A working draft on what it takes to make a personal agent that keeps useful state, survives interruption, and earns trust over time.

blog.goodman.dev

Geoff Goodman @filearts

Jun 8

Codex hallucinated that my name was Greg presumably because my GitHub handle is ggoodman.

Geoff Goodman @filearts

Jun 8

Chastising it is so unsatisfying when you know how these harnesses actually work. Nothing but empty apologies and promises 🙃.

Geoff Goodman retweeted

elvis

@omarsar0

Jun 7

Super-powerful AI models will launch in the coming weeks. We are looking at a potential step change in model capabilities. The biggest mistake right now is to lock into one vendor. I say this not only from a cost perspective, but also from an engineering perspective. Start figuring out how to leverage combinations of these models (including open models). What that means is that you can swap models anytime and best leverage their strengths. For coding agents, open models are already just as good as the frontier ones. So, how to better prepare? Consider how you will be routing tasks/work to these models. AI model routing is high reward, and it should be part of your AI engineering efforts going forward.

Geoff Goodman @filearts

Jun 5

Forget tokenmaxxing. Optimize against the MTTCR: Mean-time to Codex reset.

Geoff Goodman @filearts

Jun 5

I think the real value in here is less the framework and more the set of tasks it is evaluating against. The framework IMO makes unnecessary assumptions about the shape of what an AI agent is and how it is run. Headless agents will become more of a thing over the next year.

DAIR.AI

@dair_ai

Jun 5

// Agents' Last Exam // Agents' Last Exam is a living benchmark of over 1,000 economically valuable tasks, built with 250 industry experts and mapped to the U.S. federal occupational taxonomy. The hardest tier sits at a 2.6% average full pass rate across mainstream harnesses and backbones. ALE behaves like a GDP-coverage instrument instead of another test that saturates in a month. Paper: arxiv.org/abs/2606.05405 Learn to build effective AI agents in our academy: academy.dair.ai/

Geoff Goodman @filearts

Jun 5

Getting TorkBot to write up email drafts in French is such a delightfully unexpected lift. I'm bilingual and live in Montreal but I suck at finesse and nuance in French writing. The bot figures out recipients and context and actually creates the draft for me in Gmail.

Geoff Goodman @filearts

Jun 5

Getting a turn-based model to follow through on things despite the inclinations of the model is quite an engineering challenge. On one hand, you may be tempted to special-case behavior. But the real payoff is designing a system that is self-fulfilling. Added follow-ups 😁