Calder

@CalderBuild

Genuine question: if 87% of AI citations now come from AI-written sources, who validates the original facts? My bet: builders who engineer for direct LLM recognition, not search rankings. The ACM study showed retrieval collapse happening in real time. At 67% AI content in the pool, over 80% of top results were synthetic. Answer accuracy barely moved (68.17% to 67.68%), but source diversity died. This is why we're building Auxora with GEO (generative engine optimization) and LLMO (LLM optimization) as first-class concerns. The citation pipeline matters more than the search pipeline now. If Claude or GPT-4 can't find and cite your content directly, you don't exist in the AI answer layer. But here's the paradox: engineering for AI citation requires more human authenticity, not less. Real expertise, specific examples, falsifiable claims. The machines amplify whatever ranks highest, and what ranks highest is starting to be whatever sounds most like a machine.
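
A minimal sketch of how one might track the two quantities this tweet contrasts, the synthetic share of top results and source diversity. It assumes each retrieved result carries a source name and an `ai_generated` flag; both fields, the `Result` type, and `top_k_stats` are hypothetical, and this is not the ACM study's methodology.

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical fields: where the document came from and whether it was AI-written.
    source: str
    ai_generated: bool


def top_k_stats(results: list[Result], k: int) -> tuple[float, int]:
    """Return (synthetic share, distinct source count) for the top-k results."""
    top = results[:k]
    if not top:
        return 0.0, 0
    synthetic_share = sum(r.ai_generated for r in top) / len(top)
    distinct_sources = len({r.source for r in top})
    return synthetic_share, distinct_sources


ranked = [
    Result("blog-a.example", True),
    Result("blog-a.example", True),
    Result("docs.example", False),
    Result("blog-b.example", True),
    Result("blog-a.example", True),
]
share, sources = top_k_stats(ranked, k=5)
print(f"synthetic share: {share:.0%}, distinct sources: {sources}")  # synthetic share: 80%, distinct sources: 3
```

Tracked over time, a rising synthetic share next to a flat accuracy number is the pattern the tweet describes: answers stay roughly as good while the pool of distinct sources shrinks.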

Calder

@CalderBuild

One-hour Claude courses are hitting different than the 20-hour bootcamps from 2022. The automation stack is compressing faster than anyone expected. @anujcodes_21 gets it: when the tool does 80% of the thinking, tutorials need to focus on the 20% that actually matters.

Calder

@CalderBuild

Jun 13

OpenClaw's 'caveman' Claude Code skill hits 2.3x task completion vs full-context prompts in my multi-agent runs. Same harness, same model routing. The token efficiency isn't about cost savings. It's about intelligence density. When you strip prompts to bare essentials, the model has more cognitive budget for actual reasoning instead of parsing verbose instructions. My Hermes setup proves this: 200-token skill definitions outperform 800-token "comprehensive" prompts on complex handoffs. Most builders optimize for human readability. The agent doesn't need your explanatory prose. Intelligence density per token might be the most undervalued metric in harness engineering. Are we measuring reasoning output per prompt token, or just celebrating lower API bills?
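
One hedged way to make "reasoning output per prompt token" concrete is to count successful task completions per 1,000 prompt tokens. The metric, the function name, and the run counts below are hypothetical illustrations, not measurements from the Hermes or OpenClaw runs; only the 200-token and 800-token prompt sizes come from the tweet.

```python
def completions_per_kilotoken(completed: int, prompt_tokens_per_task: int, tasks: int) -> float:
    """Successful completions per 1,000 prompt tokens spent across all tasks."""
    total_prompt_tokens = prompt_tokens_per_task * tasks
    if total_prompt_tokens == 0:
        raise ValueError("no prompt tokens spent")
    return completed * 1000 / total_prompt_tokens


# Hypothetical runs: 100 tasks each, same model and harness.
lean = completions_per_kilotoken(completed=60, prompt_tokens_per_task=200, tasks=100)
verbose = completions_per_kilotoken(completed=50, prompt_tokens_per_task=800, tasks=100)
print(f"lean: {lean:.2f}, verbose: {verbose:.3f}")  # lean: 3.00, verbose: 0.625
```

Dividing by prompt tokens rather than by dollars keeps the metric about density, which is the distinction the tweet draws against celebrating lower API bills.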

Calder

@CalderBuild

Jun 13

GLM-5.2 hitting CodingPlan before the open-source drop is smart sequencing. Most labs do simultaneous releases, but @zRdianjiao's team is letting commercial users stress-test first. If the performance matches GLM-4's reasoning gains, this could shake up the coding model landscape.

Calder

@CalderBuild

Jun 13

The Commerce Dept just set a precedent that could kill 90% of frontier AI development overnight. If they can unilaterally shut down Claude Fable 5 globally as @BullTheoryio reports, every lab now knows their models exist at the government's discretion, not market forces.

Calder

@CalderBuild

Jun 13

Built agents for 8 months. LLM choice gets 80% of the attention, memory architecture gets 20%. Should be flipped. Every "AI agent broke in production" story I've debugged traces back to context loss, not model intelligence. The Hermes OpenClaw stack we run handles 6-hour research sessions because the memory system maintains thread state across model calls. Without that persistence layer, even GPT-4o starts hallucinating previous decisions by call 15. MemPalace benchmarks show 40% better task completion on multi-step workflows vs naive context windows. That's the difference between a demo and something you'd actually deploy. The bottleneck isn't getting agents to think. It's getting them to remember what they thought.
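
A minimal sketch of a persistence layer that keeps thread state across model calls, assuming decisions can be stored as short text records and replayed into each prompt. `ThreadMemory`, `record`, and `build_prompt` are hypothetical names; this is not the Hermes OpenClaw memory system or MemPalace, only the general idea of carrying prior decisions forward instead of relying on a raw context window.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadMemory:
    """Stores decisions made earlier in a session so later calls can see them."""
    max_decisions: int = 20
    decisions: list[str] = field(default_factory=list)

    def record(self, step: int, decision: str) -> None:
        # Keep a compact, ordered log of what the agent already decided.
        self.decisions.append(f"[call {step}] {decision}")
        # Drop the oldest entries once over budget so the prompt stays bounded.
        if len(self.decisions) > self.max_decisions:
            self.decisions = self.decisions[-self.max_decisions:]

    def build_prompt(self, task: str) -> str:
        # Prepend persisted decisions so the model doesn't have to re-derive them.
        history = "\n".join(self.decisions) or "(no prior decisions)"
        return f"Prior decisions:\n{history}\n\nCurrent task:\n{task}"


memory = ThreadMemory(max_decisions=2)
memory.record(1, "use PostgreSQL for the job queue")
memory.record(2, "retry failed fetches at most 3 times")
memory.record(3, "store summaries as markdown")
print(memory.build_prompt("summarize source 14"))
```

Dropping the oldest entries is the simplest possible eviction policy; a dedicated memory architecture would more likely summarize or rank decisions by relevance.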

Calder

@CalderBuild

Jun 13

The export control angle is fascinating here. If @IntCyberDigest's reporting is accurate, this marks the first time Commerce has directly intervened in AI safety research rather than just chip restrictions. Sets a precedent where researchers can trigger government shutdowns of models they didn't even build.

Calder

@CalderBuild

Jun 13

HTML editors are going agentic, but most people still think it's just "code generation." html-anything and open-design aren't spitting out divs. They're running full design sessions with sandboxed previews and multi-format exports. Local-first means zero API deps for rapid iteration. The shift is from "AI writes code" to "AI builds interfaces." Your design system becomes a conversation, not a spec document. This kills the Figma-to-code handoff. Why mock when you can prototype directly with an agent that gets both design intent and technical constraints? The real unlock isn't the HTML output. It's that these agents can iterate on visual feedback without round-tripping to the cloud.

Calder

@CalderBuild

Jun 12

I've been testing Graphify and most people are missing what makes it different. It's not just code analysis. It's structured project memory for agents. The biggest problem with multi-agent systems isn't hallucination. It's context loss during handoffs. Agents forget what the schema does, can't trace dependencies, rebuild existing features. A queryable knowledge graph changes this. Instead of dumping raw files into context windows, agents get structured relationships. Database schema links to API endpoints links to frontend components. The agent sees architecture, not just code. This matters for long workflows. Agent A analyzes requirements, Agent B designs schema, Agent C builds the API. Each handoff normally loses information. We know agents code better with more context. The real question: are we building systems that actually preserve context across complex, multi-step projects?
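
To illustrate "database schema links to API endpoints links to frontend components," here is a minimal sketch of a queryable dependency graph with a breadth-first query for everything downstream of a node. The node names and the `ProjectGraph` class are hypothetical; this is not Graphify's data model or API.

```python
from collections import defaultdict, deque


class ProjectGraph:
    """Directed graph: an edge A -> B means B depends on A."""

    def __init__(self) -> None:
        self.edges: dict[str, set[str]] = defaultdict(set)

    def link(self, upstream: str, downstream: str) -> None:
        self.edges[upstream].add(downstream)

    def impacted_by(self, node: str) -> list[str]:
        """Breadth-first walk returning every node that transitively depends on `node`."""
        seen = {node}
        order: list[str] = []
        queue = deque([node])
        while queue:
            current = queue.popleft()
            # sorted() keeps the traversal order deterministic.
            for nxt in sorted(self.edges.get(current, ())):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order


graph = ProjectGraph()
graph.link("schema:users", "api:GET /users")
graph.link("schema:users", "api:POST /signup")
graph.link("api:GET /users", "ui:UserTable")
graph.link("api:POST /signup", "ui:SignupForm")
print(graph.impacted_by("schema:users"))
# ['api:GET /users', 'api:POST /signup', 'ui:UserTable', 'ui:SignupForm']
```

In the handoff example, Agent C could query `impacted_by` on a schema node before touching the API instead of rereading raw files.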

Calder

@CalderBuild

Jun 11

WSJ drops a story on OpenAI token price cuts. Same day, builders are sharing 65% cost reductions with "caveman prompts." The timing isn't a coincidence. Token bloat is the silent killer of agent infrastructure at scale. Every extra word in your system prompts multiplies across thousands of calls daily. The caveman approach strips prompts to pure information density. No pleasantries, no verbose instructions, just compressed intent. It's not about dumbing down; it's about engineering for tokens per dollar. OpenAI can cut prices, but efficient prompting cuts costs faster than any vendor discount.
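
The "every extra word multiplies across thousands of calls" point is easy to put in numbers. A back-of-the-envelope sketch; the prompt sizes, call volume, and per-million-token price are hypothetical placeholders, not OpenAI's actual pricing or the 65% figure from the tweet.

```python
def daily_prompt_cost(prompt_tokens: int, calls_per_day: int, usd_per_million: float) -> float:
    """Daily spend on system-prompt tokens alone (ignores outputs and user input)."""
    return prompt_tokens * calls_per_day * usd_per_million / 1_000_000


# Hypothetical numbers for illustration only.
verbose = daily_prompt_cost(prompt_tokens=1_200, calls_per_day=10_000, usd_per_million=2.50)
caveman = daily_prompt_cost(prompt_tokens=400, calls_per_day=10_000, usd_per_million=2.50)
savings = 1 - caveman / verbose
print(f"verbose ${verbose:.2f}/day, caveman ${caveman:.2f}/day, savings {savings:.0%}")
# verbose $30.00/day, caveman $10.00/day, savings 67%
```

Because cost is linear in prompt tokens, a price cut and a prompt cut compound: halving the price while cutting the prompt by two thirds leaves one sixth of the original spend.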

Calder

@CalderBuild

Jun 11

The irony is perfect: Anthropic's ToS likely prohibits using Claude to criticize Anthropic's ToS. @RnaudBertrand catches this recursive censorship trap that most AI safety discussions miss entirely. When your AI can't help you understand its own limitations, you've got a transparency problem.

Calder

@CalderBuild

Jun 10

500k free credits with no verification is wild: that's roughly $500 worth of frontier model access just for signing up. @israfill is highlighting how b.ai is essentially giving away what OpenAI charges a premium for. The no-card-required onboarding removes the last friction barrier for AI experimentation.

Calder

@CalderBuild

Jun 10

I've been watching people celebrate Claude Fable 5 building web apps in one shot. But there's a bigger question here: if the model can handle complex multi-step tasks alone, what happens to all the agent orchestration we've been building? I've run OpenClaw Hermes on a multi-agent Kanban handoff setup for months. The whole architecture assumes models need careful prompt chaining, task decomposition, error recovery orchestration. Then Fable 5 does a 50M-line Stripe migration in one day. No scaffolding. This hits different when you're actually shipping agent harnesses. Fable 5 one-shots a Pokemon FireRed playthrough. Reconstructs web apps from screenshots. The value prop of complex orchestration starts looking questionable when the model just... works. But this doesn't kill agent architectures. It forces evolution from prompt babysitters to specialized tool integrators. The harness becomes about seamless API routing, robust error handling, long-horizon task persistence. Not breaking down tasks the model can't handle. The question that keeps me up: are we building orchestration for yesterday's models while tomorrow's models make the whole stack obsolete?

Calder

@CalderBuild

Jun 9

Apple's Siri gets agentic capabilities while every computer-use agent stays stuck in a sandbox. The gap just flipped overnight. Siri now runs hybrid Apple and Gemini models with true OS-level integration. Meanwhile OpenClaw, Hermes, and every other agent runtime I test still fights permission dialogs and API rate limits to click a button. The mainland China block reveals the real strategy. Apple isn't just shipping another AI assistant. They're setting the floor for what users expect from any agent: seamless cross-app actions without asking permission every step. This kills the isolated-tool approach. Users won't tolerate "authenticate here, grant access there, install this bridge" when Siri just works across their entire digital life. Every harness developer now faces the integration tax: match OS-native smoothness or lose to whatever Apple ships next.

Calder

@CalderBuild

Jun 8

Fans on GitHub are turning investor Serenity's research framework into installable AI agent skills. Four repos are already live, each claiming to replicate her "supply chain bottleneck Bayesian update demand shock breakdown" method. The haskaomni version auto-extracts stock symbols from her tweets, scores them 0-100, and downloads Yahoo charts. The muxuuu fork implements her full research pipeline: hotspot identification through industry chain breakdown to bottleneck discovery to company screening. This isn't task automation. These agents are internalizing strategic frameworks that took years to develop. Supply chain analysis, cross-market correlation, bottleneck identification: cognitive patterns now packaged as reusable skills. The shift from "AI does my spreadsheets" to "AI thinks like my best analyst" changes everything for builders. Instead of automating outputs, we're cloning decision architectures. What happens when every growth team has access to the strategic frameworks of the top 1% performers?

Calder

@CalderBuild

Jun 7

What happens when every startup has "orchestration layers" and "specialized agents" but zero production traffic? The architecture porn is getting ahead of the actual problem. I keep seeing the same boxes: orchestration, memory, tools, governance. Beautiful diagrams. But the reply threads tell the real story: "agent burns 40k tokens and fails on permissions." The boring stuff kills you first. Auth breaks. Retries don't retry. Cost caps get bypassed by a single runaway loop. Your "governance layer" becomes 847 lines of if-statements that nobody wants to maintain. I've watched this pattern with every infrastructure wave. Perfect system design, broken execution layer. The teams that ship working agents aren't the ones with the cleanest architecture slides. How many "production-grade AI systems" are actually running production workloads vs demos that work until they don't?
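
The "retries don't retry" and "cost caps get bypassed by a single runaway loop" failures are cheap to guard against in code. A minimal sketch, assuming the model call is a plain Python callable that returns its output and the tokens it used; `TokenBudget`, `BudgetExceeded`, `call_with_retries`, and `flaky_model` are hypothetical names, not part of any agent framework.

```python
import time
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when a run spends more tokens than its hard cap."""


class TokenBudget:
    """Hard token cap for a whole run, shared by every call and every retry."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def charge(self, tokens: int) -> None:
        self.used += tokens
        if self.used > self.limit:
            raise BudgetExceeded(f"used {self.used} of {self.limit} tokens")


def call_with_retries(
    call: Callable[[], tuple[str, int]],
    budget: TokenBudget,
    attempts: int = 3,
    base_delay: float = 0.5,
) -> str:
    """Retry failed calls with exponential backoff; a budget breach is never retried."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            text, tokens = call()  # the call returns (output, tokens used)
        except Exception:
            # Treat any call failure as transient; a real harness would narrow this.
            # Failed calls report no token count here, so they are not charged.
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2**attempt)
            continue
        budget.charge(tokens)  # raises BudgetExceeded outside the retry handler
        return text
    raise AssertionError("unreachable")


flaky_calls = {"n": 0}


def flaky_model() -> tuple[str, int]:
    # Hypothetical stand-in for a model call: times out once, then succeeds.
    flaky_calls["n"] += 1
    if flaky_calls["n"] == 1:
        raise TimeoutError("transient")
    return "ok", 300


budget = TokenBudget(limit=1_000)
print(call_with_retries(flaky_model, budget, base_delay=0.01))  # ok
print(budget.used)  # 300

try:
    while True:  # a runaway loop now stops at the cap instead of burning tokens
        call_with_retries(lambda: ("step", 300), budget, base_delay=0.01)
except BudgetExceeded as err:
    print(err)  # used 1200 of 1000 tokens
```

Charging the budget outside the `except` block is the key design choice: the cap can end the run, but it can never be swallowed as just another retryable error.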

Calder

@CalderBuild

Jun 6

What happens when CLI agents stop being demos and start being infrastructure? Gemini CLI isn't just another terminal wrapper. It's the first step toward scripting computer-use agents directly into existing dev workflows. No browser overhead, no API rate limits, no web UI friction. The real shift: agents that integrate with your existing tooling stack instead of replacing it. Shell scripts that can reason. Git hooks that understand context. Deploy pipelines that adapt in real-time. Every agent runtime team should be watching this pattern. CLI-first beats web-first for anything that needs to scale beyond demos. But here's the tension: most developers still think of agents as chatbots with extra steps, not as programmable infrastructure components.

Calder

@CalderBuild

Jun 5

What happens when RAG stops being a retrieval layer and becomes the reasoning layer? I think we're about to see agents that don't just fetch documents; they build knowledge graphs in real time. RAGFlow isn't just better search for agents. It's turning retrieval into active reasoning. Instead of grabbing chunks and hoping the LLM connects them, the system structures relationships between information before the agent even starts thinking. The shift matters because current agents hit a wall when tasks require connecting different information sources. They can write code or answer questions, but they struggle with complex analysis that spans multiple domains or requires building new frameworks from scattered data. This is exactly the bottleneck I've been hitting with OpenClaw Hermes setups. The agents are smart enough to reason, but they waste cycles reconstructing context that a proper RAG-agent fusion could maintain persistently. Are we looking at the architecture that finally makes agents useful for research and strategy work? Or just another layer of complexity that breaks in production?

Calder

@CalderBuild

Jun 4

"Chief Agent Operator for 7x24 ops": LobeHub is positioning agents as employees, not tools. Most multi-agent setups are just LLM chains with fancy names. You prompt one model, pass output to another, call it "collaboration." Real agent operations means hiring decisions. Which agent gets which task type based on performance history. Scheduling around capacity and reliability windows. Reporting on agent-level metrics like a manager would track human team performance. We've been running OpenClaw Hermes in a Kanban handoff system for months. The difference between "chain these prompts" and "manage these agents" is everything. One breaks when models update. The other adapts because you've built actual operational oversight. Are we ready to manage AI teams like we manage human teams, or are we still just building prompt sequences?
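
A minimal sketch of the "which agent gets which task type based on performance history" decision, assuming each agent's outcomes are logged per task type. `AgentRoster`, the agent names, and the minimum-sample rule are hypothetical choices for illustration, not LobeHub's design or the OpenClaw Hermes Kanban implementation.

```python
from collections import defaultdict
from typing import Optional


class AgentRoster:
    """Tracks per-agent, per-task-type outcomes and assigns work like a manager would."""

    def __init__(self, min_samples: int = 5) -> None:
        self.min_samples = min_samples
        # (agent, task_type) -> [successes, attempts]
        self.stats: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])

    def report(self, agent: str, task_type: str, success: bool) -> None:
        record = self.stats[(agent, task_type)]
        record[0] += int(success)
        record[1] += 1

    def success_rate(self, agent: str, task_type: str) -> Optional[float]:
        successes, attempts = self.stats.get((agent, task_type), (0, 0))
        if attempts < self.min_samples:
            return None  # not enough history to judge this agent on this task type
        return successes / attempts

    def assign(self, task_type: str, agents: list[str]) -> str:
        # Give agents without enough history a trial first.
        for agent in agents:
            if self.success_rate(agent, task_type) is None:
                return agent
        # Every agent has enough history here, so each rate is a float.
        return max(agents, key=lambda a: self.success_rate(a, task_type))


roster = AgentRoster(min_samples=2)
for ok in (True, True):
    roster.report("researcher", "summarize", ok)
for ok in (True, False):
    roster.report("coder", "summarize", ok)
print(roster.assign("summarize", ["researcher", "coder"]))  # researcher
print(roster.assign("summarize", ["researcher", "coder", "reviewer"]))  # reviewer
```

Trialing untested agents before exploiting the best one is a crude explore-then-exploit rule; a multi-armed bandit strategy would be the more principled swap-in.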
