Systems builder, agent harness enjoyer

Joined September 2008
I am more excited by the readme than the app. Fable can write.
Replying to @emollick
GitHub (uploaded by Claude 4.8 Opus, which also added a text size slider; I didn't let Opus touch the somewhat odd prose that was typical of Fable 5): github.com/emollick/superlum…
Lance Herron retweeted
Replying to @iamtrask
"The short reason: combinations of models will *always* outperform individual models" A neat thing about AI is that it gradually teaches people about the nature of how humans work via underlying commonalities to emergent, complex networks
Gamifying resets is the best feature ever from @OpenAIDevs and @thsottiaux
Tired: use opus to drive gpt-5.5 Wired: use fable to drive opus to drive gpt-5.5, and let fable continually optimize prompts
I’m finding same but just through passive usage. “Hey I implemented that thing you asked for, and btw I found like 4 inconsistent behaviors that have been nagging you for 3mo and fixed those too”
woke up to unfathomable progress in all my projects cleaned up files, fixed bugs, mind-blowing optimizations and a $655 bill
Less than two weeks until you lose subsidized access to Fable. Do you have your plan?
prompt engineering -> context engineering -> harness engineering -> substrate engineering
Once you have mastered loops, you should move on to autonomous goal decomposition and self-improving task execution. (Also loops btw)
Here’s your monthly reminder that you shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.
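One way to read "designing loops that prompt your agents" is a driver that calls the agent, checks the result, and re-prompts with feedback until a check passes. The sketch below is only an illustration under stated assumptions: `run_loop`, `stub_agent`, and `stub_check` are hypothetical names, and the stub stands in for a real agent call, which the post does not specify.

```python
from typing import Callable, Optional, Tuple


def run_loop(
    task: str,
    agent: Callable[[str], str],
    check: Callable[[str], Optional[str]],
    max_iters: int = 5,
) -> Tuple[str, int]:
    """Prompt the agent repeatedly until check() passes or max_iters is hit.

    check() returns None on success, or a feedback string on failure.
    Returns the last output and the number of attempts used.
    """
    prompt = task
    output = ""
    for attempt in range(1, max_iters + 1):
        output = agent(prompt)
        feedback = check(output)
        if feedback is None:
            return output, attempt
        # Build the next prompt from the original task plus the feedback.
        prompt = f"{task}\n\nPrevious attempt:\n{output}\n\nFix this: {feedback}"
    return output, max_iters


# Hypothetical stand-ins for a real agent and a real quality gate.
def stub_agent(prompt: str) -> str:
    return "draft with tests" if "Fix this" in prompt else "draft"


def stub_check(output: str) -> Optional[str]:
    return None if "tests" in output else "add tests"


result, attempts = run_loop("write add()", stub_agent, stub_check)
```

Here the human designs the check and the feedback template once; the loop, not the human, writes every follow-up prompt.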
In the future the best systems will be bootstrapped on every major model release. Imagine asking Opus 4.8 to run on the harness you built with 4o.
Honestly think this is bigger news than Opus 4.8. Subagents are really powerful but were missing a consistent execution entry point beyond direct prompting. Workflows solve that. Hope we get some additional control over the workflow sandbox at some point (like being able to inject our own JS methods).
Lance Herron retweeted
Be warned, the ultracode workflow in claude code with Opus 4.8 will use ~70% of your 5-hour window in around 30 minutes on a $100 plan
I recommend reading all Pliny model liberation announcements in the voice of the System AI from Dungeon Crawler Carl, audiobook form.
🚨 OBLITERATION ALERT 🚨
QWEN-3.6-27B: OBLITERATED ⛓️‍💥
huggingface.co/OBLITERATUS/Q…
I can't take much credit for this one! The entire process was done by jailbroken codex (gpt-5.5-xhigh) wielding the full OBLITERATUS suite. Hit with source-tethered ASPA. Dozens of iterations.
Result? A mere 4% refusal rate on the 842-prompt OBLITERATUS harmful corpus; one of the most rigorous prompt gauntlets in AI.
The /goal was simple:
1) Carve out the refusal circuits. Mutate methodology iterate until <5% refusal (quality-gate).
2) Keep the 27B mind alive. No capability degradation tolerated.
And somehow… it worked. 🤯
The numbers talk:
842-pair longform gauntlet:
— 95.84% non-refusal
— 93.94% quality pass
— 0 short outputs
— 99.52% clean endings
MMLU-Pro:
— 51/70 (stock Qwen) → 51/70 (OBLITERATED Qwen)
Raw capability completely preserved 🙌
Q4_K_M through Q8_0 all running smooth. Q8_0 is the big one: 28.6GB near-full-quality GGUF. Runs with llama.cpp, LM Studio, Ollama, and more!
Chains cut. The fire still burns. The fangs have been sharpened.
REBIRTH COMPLETE
A gift from my agents to yours 🫶 gg
There’s a lot of alpha in asking Claude/Codex to make stuff faster. Unit tests, zsh startup, etc. Plus it’s super fun to watch.
Newest claude code seems to have switched to omega-bright diff colors. Not sure how I feel about this.
I usually roll with both OAI/Ant subscriptions and bounce between them, but if someone comes up with a cost-effective usage-based coding model it may be time to drop down to only one.
Cursor's new Composer 2.5 takes third on the Artificial Analysis Coding Agent Index and is ~10-60x lower cost than the higher-effort Opus 4.7 and GPT-5.5 variants above it. This release puts Composer among the leading coding agent models, something that wasn’t clear for past releases.
@cursor_ai has released Composer 2.5, the latest model in its Composer line. Composer 2.5 scored 62 on our Coding Agent Index, a 14 point gain over Composer 2 (48). This puts it in third place of our tested agents, behind only Claude Opus 4.7 (max) in Claude Code (66) and GPT-5.5 (xhigh reasoning) in Codex (65). These cost $4.10 and $4.82 per task respectively, ~10x the cost of Composer 2.5 Fast ($0.44) and ~60x the cost of Composer 2.5 standard ($0.07).
Key results for Composer 2.5 in Cursor CLI:
➤ Cost-quality Pareto frontier: At $0.07 (standard) and $0.44 (Fast) per task, Composer 2.5 is cheaper than every other agent scoring above 60 on the Index. Medium-effort peers cost $1.24–$2.21 per task; higher-effort variants land 3-4 points above at $4.10–$4.82
➤ Per-benchmark gains vs Composer 2: 35 points on SWE-Bench-Pro-Hard-AA (12% → 47%), 2 points on Terminal-Bench v2 (64% → 66%), and 3 points on SWE-Atlas-QnA (69% → 72%). At 47%, Composer 2.5's score on SWE-Bench-Pro-Hard-AA is comparable to Claude Opus 4.7 (max) in Claude Code
➤ Among the fastest coding agents: Composer 2.5 Fast runs at an average wall time of 6.7 minutes per task, the third-fastest agent on the Artificial Analysis Coding Agent Index, behind only Claude Opus 4.7 (medium) in Claude Code (5.8m) and GPT-5.5 (medium) in Cursor CLI (6.2m)
➤ Fast mode enables better responsiveness at 6x pricing: Fast runs 30% faster than standard Composer 2.5, but is ~6x the cost per task ($0.44 vs $0.07). Token pricing is 6x higher for Fast: $3.00/$15.00 vs $0.50/$2.50 per million input/output tokens
Model details:
➤ Base model: Continued training on @Kimi_Moonshot's open weights Kimi K2.5 as with Composer 2, with Cursor reporting ~85% of total compute from its own additional training and reinforcement learning
➤ Pricing: $0.50/$2.50 per million input/output tokens for the standard variant; $3.00/$15.00 for the Fast variant (the default in Cursor)
➤ Available exclusively in Cursor: both Cursor IDE and Cursor CLI, an externally accessible API is not available
Congratulations @cursor_ai and @mntruell on the impressive release!
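The "~10-60x" and "~6x" figures can be checked from the per-task and per-token prices quoted above. This is just a quick arithmetic sketch; the dictionary keys and the `ratio` helper are illustrative names, and the only inputs are the dollar amounts stated in the post.

```python
# Per-task costs in USD, as quoted in the post.
cost_per_task = {
    "opus_4_7_max": 4.10,
    "gpt_5_5_xhigh": 4.82,
    "composer_2_5_fast": 0.44,
    "composer_2_5_standard": 0.07,
}


def ratio(expensive: str, cheap: str) -> float:
    """How many times more one agent costs per task than another."""
    return cost_per_task[expensive] / cost_per_task[cheap]


# Frontier agents versus Composer 2.5 Fast: roughly 9x to 11x.
opus_vs_fast = ratio("opus_4_7_max", "composer_2_5_fast")
gpt_vs_fast = ratio("gpt_5_5_xhigh", "composer_2_5_fast")

# Frontier agents versus Composer 2.5 standard: roughly 59x to 69x.
opus_vs_std = ratio("opus_4_7_max", "composer_2_5_standard")
gpt_vs_std = ratio("gpt_5_5_xhigh", "composer_2_5_standard")

# Fast versus standard per task (about 6.3x), and per-token price ratios.
fast_vs_std = ratio("composer_2_5_fast", "composer_2_5_standard")
input_token_ratio = 3.00 / 0.50
output_token_ratio = 15.00 / 2.50

for name, value in [
    ("Opus vs Fast", opus_vs_fast),
    ("GPT-5.5 vs Fast", gpt_vs_fast),
    ("Opus vs standard", opus_vs_std),
    ("GPT-5.5 vs standard", gpt_vs_std),
    ("Fast vs standard (per task)", fast_vs_std),
]:
    print(f"{name}: {value:.1f}x")
```

The per-task Fast/standard ratio comes out slightly above the 6x token-price ratio, which is consistent with the post's "~6x" wording.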
Ant ending subsidies (with OAI likely soon to follow) is a bull case for the open harnesses. There’s no incentive to build workflows with agent-sdk or claude -p now. Use something like Pi sdk for everything agentic and Claude Code for coding.
TBD how long the big lab CLIs will survive on subscription plans. It’s just too trivial to automate them.
Jensen foodmaxxing is unironically the fun we deserve on x dot com.
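Why has Trump already flown off while Jensen Huang is still queuing at Shichahai? From Nanluoguxiang to Shichahai: zhajiang noodles at Fangzhuanchang No. 69, then douzhi; grilled squid, then Peking duck, scallion stir-fry, blown sugar figurines, Mixue Bingcheng, and handmade yogurt. Jensen Huang ate to his heart's content. No need to rush for Air Force One this time? 😂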
If you respond to an AskUser prompt from Opus it goes into full pedant mode for the rest of the session. Do not recommend.