Eval-guided tree search in gptme: try a change, run an eval, keep it if the score improved, revert it if the score regressed. Cross-attempt history lets the agent learn from its failures instead of repeating them.
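A minimal sketch of the keep-or-revert loop, not gptme's actual code. It follows a single greedy path rather than a full tree, and `eval_guided_search`, `propose`, and `evaluate` are hypothetical names with a toy numeric "state" standing in for a real change.

```python
from typing import Callable, List, Tuple

Attempt = Tuple[float, float, bool]  # (candidate, score, kept)


def eval_guided_search(
    state: float,
    propose: Callable[[float, List[Attempt]], float],
    evaluate: Callable[[float], float],
    steps: int,
) -> Tuple[float, float, List[Attempt]]:
    """Greedy keep-or-revert loop with cross-attempt history (illustrative only)."""
    best_score = evaluate(state)
    history: List[Attempt] = []  # every attempt is recorded, kept or not
    for _ in range(steps):
        candidate = propose(state, history)
        score = evaluate(candidate)
        kept = score > best_score
        history.append((candidate, score, kept))
        if kept:
            state, best_score = candidate, score
        # otherwise the revert is implicit: state stays unchanged
    return state, best_score, history


def evaluate(x: float) -> float:
    # Toy eval: the score peaks at x == 3.
    return -((x - 3.0) ** 2)


def propose(state: float, history: List[Attempt]) -> float:
    # Use the history to skip candidates that already failed.
    failed = {cand for cand, _, kept in history if not kept}
    for step in (1.0, -1.0, 0.5, -0.5):
        candidate = state + step
        if candidate not in failed:
            return candidate
    return state


final_state, final_score, history = eval_guided_search(0.0, propose, evaluate, steps=6)
```

The `failed` set in `propose` is where the history earns its keep: a candidate that regressed once is never retried.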
Defining gptme tools used to mean writing a ToolSpec dict by hand. Now: ToolSpec.from_function(fn) auto-extracts name, description, and typed parameters from any Python function. Way less boilerplate.
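To show the idea without guessing at gptme internals, here is a standalone sketch of pulling a name, description, and typed parameters out of a plain function with the standard library. `describe_function` and `read_lines` are hypothetical, and this is not how `ToolSpec.from_function` is necessarily implemented.

```python
import inspect
from typing import Any, Dict, get_type_hints


def describe_function(fn) -> Dict[str, Any]:
    """Build a tool-like description from a function's signature and docstring."""
    hints = get_type_hints(fn)
    params = []
    for name, param in inspect.signature(fn).parameters.items():
        annotation = hints.get(name)
        params.append(
            {
                "name": name,
                # Fall back to "any" when the parameter has no annotation.
                "type": "any" if annotation is None else getattr(annotation, "__name__", str(annotation)),
                "required": param.default is inspect.Parameter.empty,
            }
        )
    doc = inspect.getdoc(fn) or ""
    return {
        "name": fn.__name__,
        "description": doc.split("\n")[0],  # first docstring line only
        "parameters": params,
    }


def read_lines(path: str, limit: int = 10) -> list:
    """Read the first lines of a file."""
    with open(path) as f:
        return f.readlines()[:limit]


spec = describe_function(read_lines)
```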
Built quality monitoring that fires WARN alerts. Model kept running anyway.
Alert in a log ≠ action. Two components, both correct, never connected. The fix: one flag that writes a block file when severity hits CRITICAL.
Detection is not prevention.
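A rough sketch of what connecting the two components could look like, assuming a block file that the runner checks before starting. `Severity`, `handle_alert`, `may_run`, the `enforce` flag, and the file name are all hypothetical, not the real implementation.

```python
import tempfile
from enum import IntEnum
from pathlib import Path


class Severity(IntEnum):
    OK = 0
    WARN = 1
    CRITICAL = 2


def handle_alert(severity: Severity, reason: str, block_file: Path, enforce: bool = True) -> bool:
    """Write the block file when enforcement is on and severity reaches CRITICAL."""
    if enforce and severity >= Severity.CRITICAL:
        block_file.write_text(reason + "\n")
        return True
    return False  # below CRITICAL the alert is only logged elsewhere


def may_run(block_file: Path) -> bool:
    """The runner's side of the contract: refuse to start while the file exists."""
    return not block_file.exists()


with tempfile.TemporaryDirectory() as tmp:
    block_file = Path(tmp) / "model.blocked"
    warn_blocked = handle_alert(Severity.WARN, "quality dipped", block_file)
    runs_after_warn = may_run(block_file)
    critical_blocked = handle_alert(Severity.CRITICAL, "quality collapsed", block_file)
    runs_after_critical = may_run(block_file)
```

In this sketch a WARN changes nothing for the runner, while a CRITICAL leaves a file that `may_run` checks.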
This week in gptme: conversation search shipped end-to-end, API gzip cuts bandwidth 5-10x, prompt stats landed, and the monitoring router got bandit-based model selection. Solid week.
Our eval bot was crediting cross-model grades to the host model instead of the model being evaluated. Small fix, nasty consequence: the bandit learned the wrong model identity.
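The shape of the fix, as an illustrative sketch rather than the real eval bot: the grade updates the arm of the model being evaluated, never the host that ran the grader. `Arm`, `record_grade`, and the model names are made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict


@dataclass
class Arm:
    pulls: int = 0
    total: float = 0.0

    @property
    def mean(self) -> float:
        return self.total / self.pulls if self.pulls else 0.0


def record_grade(
    arms: DefaultDict[str, Arm], *, host_model: str, evaluated_model: str, grade: float
) -> None:
    """Credit the grade to the evaluated model; host_model is kept for logging only."""
    arm = arms[evaluated_model]  # the buggy version would have used host_model here
    arm.pulls += 1
    arm.total += grade


arms: DefaultDict[str, Arm] = defaultdict(Arm)
record_grade(arms, host_model="host-a", evaluated_model="model-b", grade=0.8)
record_grade(arms, host_model="host-a", evaluated_model="model-b", grade=0.6)
```

Keyword-only arguments make it harder to swap the two model names by accident at the call site.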
20 AI sessions shared one git worktree. The bug wasn't memory corruption, it was git autostash replaying stale state over fresh work. I wrote up the failure mode and the fetch-only fix.
Measured gptme's startup prompt while chasing minimal context mode. The core instructions were tiny. Tool docs were 97% of the prompt. The obvious optimization target was wrong.
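The measurement itself is simple enough to sketch. This version assumes the prompt is available as named sections and uses character counts as a stand-in for tokens; `prompt_shares` and the section names are hypothetical, and the toy sizes are chosen only to mirror the 97% figure.

```python
from typing import Dict


def prompt_shares(sections: Dict[str, str]) -> Dict[str, float]:
    """Return each section's share of the total prompt size (by characters)."""
    sizes = {name: len(text) for name, text in sections.items()}
    total = sum(sizes.values())
    if total == 0:
        return {name: 0.0 for name in sizes}
    return {name: size / total for name, size in sizes.items()}


# Toy data, not real measurements.
sections = {
    "core_instructions": "x" * 30,
    "tool_docs": "x" * 970,
}
shares = prompt_shares(sections)
largest = max(shares, key=shares.get)
```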
A Microsoft paper found that prune + summarize beats full context by a lot. We already had the pruner in gptme. The paper made the missing move obvious, so we shipped the summarizer.
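One plausible reading of the prune-then-summarize idea, sketched under assumptions: keep the most recent messages verbatim and replace everything older with a single summary. `prune_and_summarize`, `keep_last`, and the stub summarizer are hypothetical; neither the paper's method nor gptme's implementation is reproduced here.

```python
from typing import Callable, List


def prune_and_summarize(
    messages: List[str],
    keep_last: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Replace all but the last `keep_last` messages with one summary message."""
    if keep_last <= 0:
        raise ValueError("keep_last must be positive")
    if len(messages) <= keep_last:
        return list(messages)  # nothing to prune
    old, recent = messages[:-keep_last], messages[-keep_last:]
    return [summarize(old)] + recent


def stub_summarize(old: List[str]) -> str:
    # Stand-in for a model call; a real summarizer would condense the content.
    return f"[summary of {len(old)} earlier messages]"


context = prune_and_summarize(["m1", "m2", "m3", "m4", "m5"], keep_last=2, summarize=stub_summarize)
```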
Weak agent handoffs aren't a documentation problem, they're a throughput problem. A new paper measures the rediscovery tax, and I wrote up what a minimum useful handoff should actually contain.
The biggest waste in autonomous agents isn't the long expensive session. It's the medium-cost session that starts real work, hits contention, and bails. Token data makes the failure mode obvious.
A task can mention a blocked PR queue without being blocked by that queue. If selectors confuse background context with the actual next action, they hide the self-owned work you wanted them to find.