Joined November 2017
1,868 Photos and videos
Pinned Tweet
Boris Cherny: "Overnight sub-agents do deeper work" 10-step Fable 5 setup you can copy this week: 1. Write CLAUDE.md - stack - commands - code style - forbidden files - review rules 2. Add PROJECT_MEMORY.md - verified facts - failed attempts - last session - next run 3. Create 1 Skill per repeated workflow - CI triage - PR review - design QA - deploy check 4. Add eval cases Put them in: eval/<workflow>.jsonl 5. Split maker and verifier Maker writes the change Verifier runs the app, tests, screenshots, logs 6. Use worktrees for parallel runs No shared checkout No file collisions No mystery edits 7. Send work by price Fable 5 plans across days Sonnet 4.6 does bulk edits Haiku 4.5 grades Opus 4.8 handles fallback cases 8. Put UI work behind screenshots If the task is visual, text logs aren't enough 9. Move long jobs to Routines CI failed? Run triage PR opened? Run review 7am? Send digest 10. End every run by writing the lesson back A fix that stays in chat dies there The rule: > Builder makes the change > Verifier checks the real artifact > Memory keeps the receipt > You read the diff Skip the verifier and you don't have an agent system You have a very confident intern with shell access
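A minimal sketch of steps 4 and 5 above, assuming each line of eval/<workflow>.jsonl is a JSON object with `id`, `input`, and `expected` fields. Those field names, the sample cases, and `toy_triage` are all hypothetical stand-ins, not a format the post specifies.

```python
import json

def parse_eval_cases(jsonl_text):
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

def run_evals(cases, workflow):
    """Return the ids of cases whose output differs from the expected value."""
    failed = []
    for case in cases:
        if workflow(case["input"]) != case["expected"]:
            failed.append(case["id"])
    return failed

# Hypothetical contents of eval/ci-triage.jsonl (field names are assumptions).
SAMPLE = "\n".join([
    json.dumps({"id": "flaky-test", "input": "timeout in test_login", "expected": "retry"}),
    json.dumps({"id": "real-bug", "input": "AssertionError in test_cart", "expected": "open-issue"}),
])

def toy_triage(log_line):
    """Stand-in for the maker's workflow: retry timeouts, file everything else."""
    return "retry" if "timeout" in log_line else "open-issue"

# The verifier side: an empty list means every eval case passed.
FAILED = run_evals(parse_eval_cases(SAMPLE), toy_triage)
```

The point of the split is that `run_evals` never trusts the workflow's own explanation, only its output against a stored expectation.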
Harry Tandy retweeted
Sam Altman: "The cost of AI will converge to the cost of energy" Me realizing the AI subscription question is turning into a power bill question > $200/mo for ChatGPT Pro > $100-$200/mo for Claude Max > rate limits during the exact week I need long runs > private docs moving through someone else's servers > then a mini PC shows up with 128GB unified memory and a 235B-class MoE in Q2/Q3 GGUF range The detail I care about: Qwen3-235B-A22B is 235B total, 22B active BF16 is around 470GB Q2/Q3 GGUF builds are around 86-112GB That is the whole buying test: > can it handle your repeat work? > drafts > summaries > file search > code review passes > private document cleanup Let the desk box eat the repeat work Pay the cloud when the answer quality has to beat everything else
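The size figures above can be checked with back-of-envelope arithmetic. This sketch assumes weight storage only (no KV cache, activations, or runtime overhead) and decimal gigabytes; the function names are illustrative.

```python
def model_size_gb(total_params, bits_per_weight):
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9

def implied_bits_per_weight(size_gb, total_params):
    """Average bits per weight implied by a file size."""
    return size_gb * 1e9 * 8 / total_params

TOTAL_PARAMS = 235e9  # Qwen3-235B-A22B total parameters, from the post

BF16_GB = model_size_gb(TOTAL_PARAMS, 16)            # 16 bits per weight
LOW_BPW = implied_bits_per_weight(86, TOTAL_PARAMS)   # smallest quoted GGUF
HIGH_BPW = implied_bits_per_weight(112, TOTAL_PARAMS) # largest quoted GGUF

# Weights alone must leave headroom inside 128 GB of unified memory.
HEADROOM_GB = 128 - 112
```

BF16 comes out at 470 GB, matching the post, and the 86-112 GB builds work out to roughly 2.9-3.8 bits per weight on average. The largest build leaves about 16 GB for everything else, so context length is the thing to test before buying.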
Jensen Huang: "the programming language is human" Claude Code gets better when your English becomes job design One messy session asks the model to: > write code > test code > review code > remember scope > decide priority That turns into a polite generalist with too many hats The stronger setup is 4 tiny contracts: > writer ships code > tester attacks the spec > reviewer attacks the diff > coach writes the brief and calls the order Tool limits do real work here > the reviewer has no Write > the tester starts from the spec > the writer runs the build but doesn't grade itself > the coach collects reports and waits before commit Copy the roster into: `.claude/agents/` `.claude/commands/ship.md` Then make `/ship <task>` your feature-work entry point Better agents start with stricter job descriptions
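One way to keep the roster honest is to write the tool limits down as data and check them. This is a sketch only: the tool names and the per-role tool sets are illustrative assumptions, and the single rule enforced here is the one the post states outright (the reviewer has no Write).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Role:
    job: str
    tools: frozenset

# Hypothetical tool assignments; only the reviewer's lack of Write comes from the post.
ROSTER = {
    "writer": Role("ships code, runs the build", frozenset({"Read", "Write", "Bash"})),
    "tester": Role("attacks the spec", frozenset({"Read", "Write", "Bash"})),
    "reviewer": Role("attacks the diff", frozenset({"Read", "Grep"})),
    "coach": Role("writes the brief and calls the order", frozenset({"Read", "Write"})),
}

def allowed(roster, role, tool):
    """True if the named role may use the named tool."""
    return tool in roster[role].tools

def roster_problems(roster):
    """Return a list of contract violations; empty means the roster is valid."""
    problems = []
    if allowed(roster, "reviewer", "Write"):
        problems.append("reviewer must not have Write")
    return problems

PROBLEMS = roster_problems(ROSTER)
```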
Harry Tandy retweeted
Karpathy compressed the agent problem into 1 sentence: "If you can't evaluate then you can't auto research it, right?" That's the rule I keep coming back to with long-running coding agents Before you launch /goal or /loop, write the verifier: - what counts as done - what evidence proves it - which checks run every pass - which artifact gets saved - which failure sends it back into the loop Then let the agent run The loop can keep going because proof sits outside the agent's own explanation Tests, screenshots, benchmark curves, browser runs, changed files That's how you get autonomy without babysitting a transcript for 6 hours Read the full breakdown on goals, verifiers, loops, artifacts, and session memory in the article below
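A minimal sketch of "write the verifier first", assuming checks are plain callables that return True or False. `run_until_verified`, the check name, and the toy state are hypothetical; in a real run the checks would shell out to tests, screenshots, or benchmarks.

```python
def run_until_verified(step, checks, max_passes=5):
    """Run step(n), then every check, until all checks pass or passes run out.

    Returns (done, evidence), where evidence records each pass's check results,
    so proof lives outside the agent's own explanation.
    """
    evidence = []
    for n in range(1, max_passes + 1):
        step(n)
        results = {name: bool(check()) for name, check in checks.items()}
        evidence.append({"pass": n, "results": results})
        if all(results.values()):
            return True, evidence
    return False, evidence

# Toy stand-ins: the "agent" fixes the bug on its second pass.
STATE = {"bug": True}

def toy_step(n):
    if n >= 2:
        STATE["bug"] = False

CHECKS = {"tests_pass": lambda: not STATE["bug"]}

DONE, EVIDENCE = run_until_verified(toy_step, CHECKS)
```

A failed check is what sends the work back into the loop, and the saved `evidence` list is the artifact you read afterwards instead of the transcript.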
June 12, 5:21pm ET: Anthropic says it received a US export-control directive on Fable 5 and Mythos 5 By Saturday, the practical result was brutal: every customer loses access, because the order covers any foreign national, inside or outside the US, including Anthropic employees If your product depends on one frontier model, audit 4 things today: 1. fallback model for every workflow 2. logs showing which model answered each request 3. customer messaging for forced downgrades 4. contract language for government or export-control shutdowns Anthropic says other Claude models are still available The next failure mode for AI apps is simple: the model can work perfectly and still disappear from your stack overnight
The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…
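Audit items 1 and 2 above (a fallback per workflow, plus a log of which model answered) can be sketched in a few lines. Everything named here is hypothetical: the model ids, the `ModelUnavailable` exception, and the client callables stand in for whatever SDK a real product uses.

```python
class ModelUnavailable(Exception):
    """Raised by a client when a model has been disabled or withdrawn."""

def answer_with_fallback(prompt, chain, clients, audit_log):
    """Try each model in order; record which one answered each request."""
    for model in chain:
        try:
            text = clients[model](prompt)
        except ModelUnavailable:
            audit_log.append({"model": model, "status": "unavailable"})
            continue
        audit_log.append({"model": model, "status": "answered"})
        return model, text
    raise RuntimeError("no model in the fallback chain is available")

def _withdrawn(prompt):
    # Simulates a model that disappeared overnight.
    raise ModelUnavailable("access suspended")

# Hypothetical clients and one fallback chain per workflow.
CLIENTS = {"primary-model": _withdrawn, "fallback-model": lambda prompt: f"echo: {prompt}"}
CHAINS = {"summarize": ["primary-model", "fallback-model"]}

LOG = []
MODEL, TEXT = answer_with_fallback("hi", CHAINS["summarize"], CLIENTS, LOG)
```

The audit log is also what makes item 3 possible: you can only message customers about a forced downgrade if you know which requests were downgraded.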
Sam Altman: "The cost to use a given level of AI falls about 10x every 12 months" 10-step Fable 5 sprint you can run before access changes: 1. Pick the heaviest backlog item - migration - refactor - research sprint - design rebuild - knowledge base 2. Create FABLE_RUN.md - goal - files in scope - commands - review rules - done criteria 3. Map the repo first Ask for entry points, shared utils, risky modules, tests 4. Break the job into checkpoints Each checkpoint ends with: > diff > test output > next decision 5. Split builder and checker Builder edits Checker runs the app, reads logs, takes screenshots 6. Use worktrees for parallel attempts Each run gets its own checkout and its own notes 7. Keep a RUN_LOG.md Every failed command goes there Every accepted fix goes there 8. Put frontend behind references 2-3 screenshots Real CSS when precision matters 9. Save research as files > claims.md > sources.md > checker-notes.md > decision.md 10. End with PROJECT_MEMORY.md The next run should start smarter than the first one The rule: > big model plans the route > workers take lanes > checker attacks the result > memory keeps the receipt > you read the diff Skip the checker and you bought confidence, not work
Boris Cherny: "My job is to write loops." 8-day version you can copy for the Fable Bloom Postiz content loop Day 1: baseline - pick 3 hooks - post once a day across X, Instagram, LinkedIn - save the Sunday report before touching anything Day 2: brand home - onboard the website or Instagram into Bloom - check logo, palette, typography, references - generate 5 test images before you automate Day 3: always-on agent - run Hermes on a VPS - keep it off your laptop - use SSH, not the browser console Day 4: model - set Claude Fable 5 as the brain - give it the content goal - make it return a calendar, not loose drafts Day 5: Bloom MCP > hermes mcp add bloom --url trybloom.ai/api/mcp then install the Bloom skill into the directory Hermes actually scans Day 6: Postiz > npm install -g postiz > postiz auth:login > postiz integrations:list If integrations:list does not print your accounts, the agent is just writing into air Day 7: volume - 20 variants - top 3 hooks - each tagged by concept - native caption per platform Day 8: kill list - bottom half dies - top 2 concepts get 20 new variants - top 3 creatives go to paid after 30 days The rule: every publishing loop needs a brand check and a numbers report Fable writes Bloom keeps the asset recognizable Postiz ships You read the concept table Skip Bloom and you get 40 polished posts that look like they escaped from a template marketplace
Sam Altman: "Think more about what to work on" 10-day version you can copy for the Fable app studio playbook Day 1: market crawl - send agents across App Store, Reddit, TikTok, Google - return 100 ideas with ratings, keywords, price, MVP scope Day 2: pick one profession - electrician - HVAC - nurse - mortgage broker - contractor Day 3: write the PRD - all screens - all formulas - all edge cases - App Store copy - review prompt timing Day 4: build one Flutter template - calculator engine - IAP flow - onboarding - settings - 44pt tap targets Day 5: ship app #1 - 3 free calculators - $9.99 full suite - no login - offline-first Day 6: verify the math - NEC tables for electricians - dosage checks for nurses - DSCR formulas for investors - flag anything uncertain for human review Day 7: App Store assets - 30-char name - 30-char subtitle - 100-char keywords - 4,000-char description - 6 screenshots Day 8: ASO check Search the keyword yourself If the top app has under 500 ratings, the niche is open enough to test Day 9: clone the template Same codebase New profession New formulas New metadata Day 10: morning review Fable builds overnight You read the diff You test the calculator You submit The rule: every app gets a formula verifier before App Store Connect Skip that and you don't have an app studio You have 17 clean-looking calculators giving tradespeople bad math
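Days 6 and 7 above are both checkable by machine. This sketch assumes the character limits quoted in the post and uses DSCR (net operating income divided by annual debt service) as the example formula; the function names and the known-answer case are illustrative, and anything safety-critical such as dosage math still needs the human review the post calls for.

```python
# Character limits as quoted in the post.
LIMITS = {"name": 30, "subtitle": 30, "keywords": 100, "description": 4000}

def metadata_problems(meta):
    """Return fields that exceed their limit, as 'field: length/limit' strings."""
    return [
        f"{field}: {len(meta.get(field, ''))}/{limit}"
        for field, limit in LIMITS.items()
        if len(meta.get(field, "")) > limit
    ]

def dscr(net_operating_income, annual_debt_service):
    """Debt service coverage ratio = NOI / annual debt service."""
    if annual_debt_service <= 0:
        raise ValueError("annual debt service must be positive")
    return net_operating_income / annual_debt_service

def formula_failures(fn, known_cases, tol=1e-9):
    """Return the argument tuples whose result misses the hand-checked answer."""
    return [args for args, expected in known_cases if abs(fn(*args) - expected) > tol]

# One hand-checked case: 120,000 NOI over 100,000 debt service is 1.2.
KNOWN = [((120_000, 100_000), 1.2)]
FAILURES = formula_failures(dscr, KNOWN)
```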
Andrej Karpathy: "The hottest new programming language is English" English gets you the first tool call the 9 rules for agents that work in 2026: 1. write one finishable goal. ban vague tasks like "help me with research" 2. give it tools: web search, files, code, APIs, database, email 3. build the ReAct loop: think -> act -> observe -> retry 4. start with 50 lines of Python before adding LangGraph or CrewAI 5. save memory after every major step: completed, decisions, current result, next step 6. make it verify work before stopping: run code, check files, compare against the goal 7. add a critic agent for judgment calls: weak pain, unclear buyer, crowded market 8. cap the run: max 10 steps, max 3 retries, 60 second timeout 9. ask a human before customer-facing or irreversible actions the weekend build is the loop the career skill is knowing which loop deserves tools, memory, and a critic
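Rules 3, 5, 6, and 8 above fit in well under 50 lines. A minimal sketch, assuming `act` and `goal_met` are callables you supply and reading "max 3 retries" as at most three attempts per step; the names and the injected `clock` are illustrative.

```python
import time

def run_agent(act, goal_met, max_steps=10, max_retries=3, timeout_s=60.0,
              clock=time.monotonic):
    """Capped act/observe loop. Returns (status, memory).

    status is one of: "done", "step_limit", "retries_exhausted", "timeout".
    """
    start = clock()
    memory = []  # saved after every step, so a later run can resume from it
    for step in range(1, max_steps + 1):
        if clock() - start > timeout_s:
            return "timeout", memory
        for attempt in range(1, max_retries + 1):
            try:
                observation = act(step, memory)
                break
            except Exception as exc:
                memory.append({"step": step, "attempt": attempt, "error": str(exc)})
        else:
            # Every attempt for this step raised.
            return "retries_exhausted", memory
        memory.append({"step": step, "observation": observation})
        if goal_met(memory):  # verify before stopping
            return "done", memory
    return "step_limit", memory

# Toy run: the "tool" returns the step number; the goal is to observe 3.
STATUS, MEMORY = run_agent(
    act=lambda step, memory: step,
    goal_met=lambda memory: memory[-1].get("observation") == 3,
)
```

Swap `act` for a model call plus tool dispatch and the caps still hold, which is the reason to start here before reaching for a framework.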
A $300/YEAR AI NOTE STACK CAN BECOME A 5-MINUTE OBSIDIAN SETUP This article is worth saving because the setup is painfully concrete: Obsidian + Copilot plugin + Kimi API Here are the 10 parts worth stealing: 1. Obsidian is the home base. Local Markdown files. Free app. Your notes stay as files you can move, back up, search, sync, or open years later 2. Copilot is the bridge. Install the plugin, open settings, add a Custom Model, paste the Kimi API endpoint and key, then set Kimi as the default chat model 3. The first workflow is Vault QA. Ask "What did I conclude about X?" and it searches your own notes before answering 4. File citations matter. A vault answer with source notes beats a clean paragraph you can't trace 5. Long context is why this works. Kimi K2.6 supports 256K context. Moonshot V1 lists 131,072 tokens 6. Summaries should land inside the note. A Quick Command can turn a 2,000-word meeting dump into 5 bullets, action items, and one reason to keep it 7. Connections come from asking the vault, not buying another "AI connections" feature 8. Cleanup happens after capture. Dump the messy note fast, then use Composer to add headers, bullets, tasks, and tags 9. Web pages, PDFs, and YouTube links should land in a Sources folder as searchable notes 10. Content drafts get better when they pull from your own notes instead of generic internet memory Cost gets attention The author says ~$300/year of subscriptions became roughly ~$40/year of API usage File ownership is why the setup keeps working Try this first: > install Copilot > add Kimi as a custom OpenAI-format model > index the vault > ask one question you already know the answer to > check the cited files If it finds your past thinking, keep going If it misses, fix indexing before adding more apps
AI SOFTWARE DEVELOPMENT NO LONGER REQUIRES A DEV TEAM Most people use AI as a chatbot for short code snippets The release of Claude 5 Fable changes this - one person can build and sell complete software products in days Here's the technical breakdown and the math behind it 1. Stripe migrated a 50-million-line Ruby codebase in 1 day. A full engineering team would need 2 months for the same task 2. The model launches parallel subagents automatically. You do not need to ask for this in the prompt. While one agent writes the code, a second tests it, and a third searches for optimizations simultaneously 3. Persistent memory saves to a plain Markdown file between sessions. Claude 5 Fable remembers previous errors and successful fixes after you close the tab. In tests on Slay the Spire, file-based memory improved performance three times more than Opus 4.8 4. Vision features allow the model to rebuild web application source code from screenshots alone. It beat Pokémon FireRed using raw screenshots without maps or navigation aids 5. Prompt instructions like "show your thinking" or "repeat your reasoning" degrade performance. They trigger an automatic fallback to the older Opus 4.8 model. Write your instructions directly 6. A single CLAUDE.md file in your project root eliminates the need to paste context at the start of every session. The model reads it automatically to grasp the tech stack, project rules, and constraints 7. Building business automation scripts now takes 2 to 3 hours. Freelancers charge between $500 and $2,000 for these Python scripts because the model handles the writing, testing, and debugging 8. Developing a browser game with a monthly subscription takes one weekend. Using Phaser, React, and TypeScript alongside Stripe integration lets you set up automated billing and daily free-tier limits 9. Cloning the official Anthropic skills repository gives the model instant access to task-specific instruction files.
You can also create custom skill files for recurring tasks like e-commerce layout design What drives results: > One CLAUDE.md file per repository > No "show your thinking" prompt lines > Component testing before moving to the next step > Automated parallel subagents for code verification The project math: > Cost: $20 monthly subscription > Your time: 2 days of describing and reviewing > Client price: $3,000 to $5,000 per website Free access ends June 22 before moving to token-based pricing Save this if you are building actual products
Anthropic is gatekeeping its strongest model from the public Claude Fable 5 is live, outperforming previous versions on long tasks: > MMLU-Pro: 78.4% > SWE-bench Verified: 53.8% > GPQA Science: 65.2% Public users get the censored version, which triggers an Opus 4.8 fallback in 5% of sessions The raw version, Mythos 5, is restricted to Glasswing partners
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.
BUSINESS OWNERS ARE WASTING HOURS ON MANUAL PROMPTS Most business owners use AI to write faster emails. They miss the framework shift This guy broke down the 4 levels of Claude mastery: 1. Level 1 uses Claude as a writing assistant. You type a prompt, read the output, and close the tab. Every session starts from zero without your business context 2. Level 2 connects live business data. Model Context Protocol (MCP) links Claude to Google Drive, Notion, and Slack. The model reads client briefs and brand guidelines before writing 3. Level 3 introduces autonomous execution. Anthropic Cowork runs on your desktop to complete multi-step tasks. It opens local files, structures data, and builds entire slide decks while you watch 4. Level 4 builds permanent infrastructure. Claude Code automates repeating workflows to run without human input. Outreach, reporting, and client onboarding execute on a fixed schedule 5. Context isolation limits basic prompts. Copying text into a blank window forces you to re-explain your brand voice every time. This habit disappears at Level 2 6. Agentic tools handle format changes. Cowork alters local file structures and populates reports from live datasets. It replaces the manual steps of knowledge work 7. Infrastructure decouples time from output. Level 4 automation means your client onboarding triggers the moment a contract is signed, running entirely in the background 8. Moving up requires changing your mental model. True leverage comes from building systems that execute tasks, not finding better ways to write prompts The architecture that works: > MCP links for live context > Desktop agents for multi-step tasks > Terminal tools for scheduled automation > Systems that operate without you Save this blueprint if you are building actual pipelines
Boris Cherny: "I don't prompt Claude anymore. I have loops running that prompt Claude and figure out what to do. My job is to write loops" Most developers still treat AI like a text box. They miss the architectural shift Here is the 9-part blueprint for building autonomous loops: 1. Loop engineering consumes tokens fast. A single coding loop uses 50K to 200K tokens 2. DeepSeek V4 makes these feedback cycles affordable. Low token pricing removes the primary financial barrier for autonomous workflows 3. Long-running loops require a 1M context window. The system needs to keep codebase context and current errors in memory simultaneously 4. Git worktrees stop file collisions. Isolating each agent in its own working directory allows parallel execution without chaos 5. Save project rules inside the repository. Files like ARCHITECTURE.md give agents context before they start writing code 6. Keep the maker away from the checker. The model that builds the code misses its own errors, so a separate subagent must verify the output 7. Connect loops to your actual environment. Use Model Context Protocol (MCP) to let agents read issue trackers and open pull requests 8. Prioritize closed loops over open loops. Bounded steps and explicit stop conditions protect your budget from runaway token spend 9. Move loop memory outside the chat thread. Persistent markdown files track what previous runs already tried The architecture that works: > Git worktrees for parallel runs > Persistent markdown files for memory > Strict maker-checker separation > Low-cost token infrastructure Save this blueprint if you are building autonomous pipelines, not just chatting with a bot
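Point 4 above (one worktree per agent) maps onto a single git subcommand. This sketch only builds the `git worktree add -b <branch> <path> <commit-ish>` argument lists; the `agent/<name>` branch naming and the `../worktrees` directory are assumptions, not a convention from the post.

```python
def worktree_commands(agent_names, base_dir="../worktrees", base_ref="HEAD"):
    """Build one `git worktree add` command per agent.

    Each agent gets its own branch and its own checkout directory,
    so parallel runs never edit the same working tree.
    """
    commands = []
    for name in agent_names:
        commands.append([
            "git", "worktree", "add",
            "-b", f"agent/{name}",    # new branch for this agent (naming is an assumption)
            f"{base_dir}/{name}",     # separate checkout directory
            base_ref,                 # commit the branch starts from
        ])
    return commands

COMMANDS = worktree_commands(["maker", "checker"])
```

To actually create the checkouts, pass each list to `subprocess.run(cmd, check=True)` from inside the repository. Keeping command construction separate from execution makes the plan reviewable before anything touches disk.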
AI OBSERVABILITY IS BROKEN. YOUR AGENT HARNESS SHOULD REPAIR ITSELF Most engineering teams install a dashboard, get a clean tree of model calls, and think their production debugging loop is automated It isn't. The dashboard tells you what broke, then leaves the actual fixing to your engineering time. This guy broke down the open-source architecture closing this loop from trace to patch Here are the 11 infrastructure rules worth stealing: 1. Traces without automated root-cause analysis are just expensive log files. The platform must explain the causal chain, not just show the error 2. True agent debugging requires code-level context. The tool must read your local source files to pinpoint the exact broken lines 3. The output should be a code diff, not a text hint. The system generates the precise code modification and waits for your approval 4. Manual regression testing fails at scale. Approved patches must instantly turn the original failing input into a permanent regression test 5. Numerical evaluation metrics fail in production. Replace abstract floats with plain-English assertions that check explicit business logic 6. Build test suites from live production failures, not synthetic data. Let real edge cases harden your evaluation layer automatically 7. Prompt playgrounds solve the wrong problem. Validating an agent requires an execution sandbox that runs the entire graph end-to-end 8. Sandboxes must live outside of git. This allows non-technical team members to test prompts and models without breaking code 9. Instrument early via unified runtime decorators. Track every tool call and retrieval step against the active agent configuration 10. Route fixes through versioned blueprints. Never deploy directly; transition verified sandbox changes safely to staging and production 11. Tooling sprawl destroys context. 
Tracing, evaluations, sandboxing, and testing must live in one flywheel, not separate platforms What the architecture executes: > Automated tracing and root-cause diagnostics > Automated source code diff generation > Plain-English evaluation assertions > End-to-end graph sandbox execution > Instant production-to-regression test pipelines Observability that ends at the dashboard made sense when agents were simple chat bots Production pipelines require tools that run the repair loop for you Save this if you are building self-correcting agent infrastructure
THE $2T IPO FILING IS A STORY FINANCED UP FRONT The mistake most retail investors make: they buy the brand name on day one and assume the underlying financials match the hype It falls apart when you look at the actual S-1 data I ran SpaceX's 300-page IPO filing through a deep cross-section analysis Here are the 11 disclosures worth knowing before June 12: 1. The $2T valuation reflects 100x current revenue. Core rocket operations account for less than 7% of the total addressable market used to justify this price 2. The remaining 93% of the valuation relies on enterprise AI. The prospectus asserts a $26.5T market but lacks a clear strategy to take market share from tech incumbents 3. The xAI merger turned a profitable rocket company into a loss-making entity, shifting financials from a $791M profit to a $4.94B net loss 4. AI infrastructure operations lose an average of $2.5B per quarter. Capital expenditure requirements indicate these losses will continue through the near term 5. Starlink subscriber volume doubled to 10.3M, but average revenue per user dropped 23%. Volume growth is driven by lower-priced international expansions 6. Retail allocation is set at 30% of the float - three times the institutional standard. This provides immediate liquidity to early venture capitalists exiting at the top 7. The insider lockup allows rapid liquidation. Approximately 20% unlocks near day 60, followed by bi-weekly releases that free up 93% of insider shares by November 8. Individual shareholders hold zero voting power. A dual-class structure concentrates 85% of total voting control within a 42% insider equity position 9. The governance framework mandates individual arbitration and bans class-action lawsuits. Shareholders cannot pursue collective legal recourse if capital allocation decisions destroy value 10. The filing leads with mission objectives over operational metrics. The first five pages focus on multi-planetary expansion before presenting standard balance sheets 11. 
The current valuation requires the unproven AI segment to generate revenue equal to global cloud infrastructure spend within 48 months What the data shows: > 100x revenue pricing multiples > $2.5B average quarterly AI infrastructure losses > 23% drop in Starlink average revenue per user > 30% float distribution to retail accounts > 93% insider share unlock within 5 months > 15% aggregate public voting power Modern public markets regularly price decades of projected execution into opening-day valuations SpaceX represents the absolute limit of that trend Save this before the June 12 listing
THE SMARTER THE MODEL, THE MORE TOKENS IT BURNS ON A BAD BACKEND Moving an agent to a smarter model often increases token usage When context is missing, a capable model reasons harder, runs more queries, and retries often I analyzed a Claude Code build across two backends to see where the money actually goes Here are the 11 parts worth stealing: 1. Traditional backends are built for humans. When an agent takes over, missing context and vague errors turn into active token costs 2. Heavy tool surfaces inflate context. Connecting an all-in-one MCP server dumps 50 unused tool definitions into the window before coding even begins 3. Fragmented state tracking compounds costs. Forcing an agent to run separate commands for projects and schemas creates an expensive discovery loop 4. Schemaless databases force retries. Without a declared shape, an agent infers fields from sample documents, risking broken assumptions later 5. Opaque error strings destroy efficiency. Generic errors force the agent to guess, deploy a tentative fix, and re-send the entire growing context 6. Apply context engineering to the backend. Tools and system state are part of the context window. Structured backend data stops the exploration loop 7. Use progressive disclosure for knowledge. Load lightweight metadata at session start, and pull full docs only when a task matches 8. Enforce structured CLI outputs. Infrastructure commands must return raw JSON and semantic exit codes so agents can handle errors programmatically 9. Use MCP tools only for dynamic state. A single tool returning a clean topology map keeps the model's environment footprint small 10. Route models through a unified gateway. Forcing an agent to configure separate API keys for distinct providers creates unnecessary setup loops 11. File reopens drive session costs. 
The true cost metric is how many times an already-written file is reopened to fix unexpected backend behaviors What actually compounds: > single-call topology maps > progressive skill disclosure > uniform JSON CLI responses > unified model gateways > zero app-code reopens An agent writing against a complete, structured metadata picture commits code once and moves on No late information means no file reopens, keeping the conversation short and the token bill low Save this for your next agent build
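Points 5 and 8 above (no opaque error strings, JSON plus semantic exit codes) can be sketched as a single command handler. The exit code values, the error payload shape, and the toy `SCHEMA` are all hypothetical; the idea is only that a failure tells the agent what to try next instead of forcing a guess.

```python
import json

EXIT_OK = 0
EXIT_NOT_FOUND = 3  # hypothetical semantic code: the named resource does not exist

# Toy declared schema standing in for real backend metadata.
SCHEMA = {"users": ["id", "email"], "orders": ["id", "user_id", "total"]}

def describe_table(schema, table):
    """Return (exit_code, json_text) describing a table or a structured error."""
    if table not in schema:
        payload = {
            "ok": False,
            "error": {
                "code": "table_not_found",
                "table": table,
                "available": sorted(schema),  # lets the agent self-correct in one step
            },
        }
        return EXIT_NOT_FOUND, json.dumps(payload)
    return EXIT_OK, json.dumps({"ok": True, "table": table, "columns": schema[table]})

OK_CODE, OK_JSON = describe_table(SCHEMA, "orders")
ERR_CODE, ERR_JSON = describe_table(SCHEMA, "order")
```

In a real CLI the handler's JSON would go to stdout and the code to `sys.exit`, so an agent can branch on the exit status without parsing prose.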
AI AGENTS DON’T SHIP PRODUCTION CODE. HARNESSES DO A raw model is just raw processing power. To make it useful, you build the environment around it I broke down the infrastructure systems used by OpenAI, Anthropic, and ThoughtWorks to ship millions of lines of code Here are the 11 parts worth stealing: 1. The harness is everything that isn't the model - the constraints, feedback loops, state tracking, and tool permissions 2. The model functions like a CPU, the context window is RAM, and the harness is the operating system managing memory, tasks, and rules 3. Cutting 80% of an agent's tools often yields better output by eliminating choice paralysis and context contamination 4. Place AGENT.md or CLAUDE.md files in your repo folders. The agent reads them at session start to learn localized architecture and tasks 5. In long autonomous runs, agents easily overwrite loose text files. A strict JSON schema keeps the pipeline stable 6. Every run must execute the same routine: verify directories, read git logs, parse the tracker, start servers, and run tests 7. Have a generator agent propose a code plan, and a separate evaluator agent review it before any implementation begins 8. The harness must scan the actual file tree first to provide real paths and symbols, stopping the agent from inventing APIs 9. A single agent instance cannot critically evaluate its own work. Use separate instances and browser automation to test outputs 10. Agents trying to fix multiple issues at once drop requirements. The loop must be: pick one feature, implement, test, commit, stop 11. Do not maintain a separate wiki. 
If a design constraint or utility function isn't in the repo files, the agent will miss it What actually compounds: > one tracking file per module > strict boot sequences > isolated evaluation agents > explicit code contracts > dropping unused tools Harness components decay as models improve Build every constraint to be modular, test them by turning them off, and delete them the moment they become overhead Save this if you are building actual development pipelines, not just chatting with a bot
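Points 5 and 10 above (a strict JSON tracker, one feature at a time) can be sketched without any schema library. The field names `id`, `title`, `status` and the three status values are assumptions for illustration; the post does not specify a tracker format.

```python
import json

# Hypothetical tracker contract: exactly these keys, with these types.
REQUIRED = {"id": str, "title": str, "status": str}
STATUSES = {"todo", "in_progress", "done"}

def validate_tracker(text):
    """Parse tracker JSON and reject anything that drifts from the contract."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("tracker must be a JSON array")
    for i, item in enumerate(data):
        if not isinstance(item, dict) or set(item) != set(REQUIRED):
            raise ValueError(f"entry {i}: keys must be exactly {sorted(REQUIRED)}")
        for key, expected_type in REQUIRED.items():
            if not isinstance(item[key], expected_type):
                raise ValueError(f"entry {i}: {key} must be {expected_type.__name__}")
        if item["status"] not in STATUSES:
            raise ValueError(f"entry {i}: unknown status {item['status']!r}")
    return data

def next_feature(tracker):
    """Pick exactly one unfinished feature, or None when nothing is left."""
    for item in tracker:
        if item["status"] == "todo":
            return item
    return None

TRACKER_JSON = json.dumps([
    {"id": "F1", "title": "login form", "status": "done"},
    {"id": "F2", "title": "password reset", "status": "todo"},
])
NEXT = next_feature(validate_tracker(TRACKER_JSON))
```

Running `validate_tracker` as part of the boot sequence means a tracker the agent mangled in a previous run fails loudly before any new work starts.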
CLAUDE IS MORE THAN A CHATBOT, BUT 95% OF USERS MISS IT Most people use 20% of the platform’s potential They ask a question, get an answer, and leave The real power is hidden in the settings and toggles that bridge the gap between "fancy chat" and a production-grade workspace Here are the 9 features to turn on right now: 1. Custom Styles Train Claude on your writing samples to stop the "AI-generic" voice. Select it before you write to match your tone automatically 2. Memory Management Claude stores context over time. Open the memory panel to delete outdated info or manually add your project goals 3. Projects Create isolated workspaces with their own files and instructions. This keeps your context clean and prevents "re-explaining" every morning 4. Published Artifacts Turn code or UI into a live link. Share interactive calculators or dashboards with people who don't even have a Claude account 5. Extended Thinking A toggle for hard problems. Use it for complex logic, math, or debugging when reasoning quality matters more than speed 6. Connectors Link your Google Drive or Calendar directly. This lets Claude pull docs and check schedules instead of you copy-pasting data 7. Past Chat Reference Enable the search toggle to let Claude pull facts from your history. No more scrolling to find that "one good idea" from last week 8. Mini-App Building Use Artifacts to build functional tools like habit trackers or CRMs. Describe the tool, and Claude builds the interactive interface 9. Context Optimization Don't dump raw data. 
Use specific spec files and markdown summaries in your Projects to keep the model focused and sharp The reality of AI productivity: > stop fighting the default voice > prune your memory logs weekly > build tools, don't just ask questions > use Projects to silo different work > thinking mode is for logic, not for chat You don't need a better model You need to turn on the features you're already paying for Save this and spend 10 minutes in your settings menu today