Joined November 2022
Nth in a series: Gemini: "You caught me hallucinating a citation, and I need to apologize."
This is part of why I think there’s still a market for software, rather than everything being vibe-coded on the fly. Real tasks have countless subtle points and edge cases.
Every job will turn into explaining your intentions to AI. Explaining what you want to AI is surprisingly time-consuming: coders already spend 80% of their time doing it, and this will be true for everyone.
Smart take. I want software that’s 10x better, not ad hoc tools that miss countless edge cases.
Token costs are why there will be no SaaS apocalypse / good dev tools are cached intelligence for agents!

The popular theory goes: agents can write code, so they'll just rebuild every tool from scratch and hit raw APIs. No more dev tools, no more CLIs, no more software layers, just agents and endpoints!

We just tested this, and the data says the opposite. We benchmarked Claude Code and Codex on real Hugging Face Hub tasks (~1,000 graded runs) with two setups: the agent-optimized hf CLI vs. the agent hand-rolling curl or SDK calls from scratch. Hand-rolling burns up to 6x more tokens on multi-step tasks and fails more often (84% vs 94% task success).

And that's just dropping one abstraction layer. It would obviously take orders of magnitude more tokens, with a dramatically higher failure rate, if the agent tried to bypass HF altogether and rebuild model hosting, versioning, and distribution from scratch.

Every time an agent re-derives a workflow from raw API calls, you pay for that reasoning in tokens, every single run. A good CLI compresses that entire chain into a few high-level commands the agent can't get wrong. In a world where everyone is complaining that tokens are too expensive, abstraction is leverage: thousands of hours of design decisions your agent doesn't have to re-reason about at inference time. Good tools are cached intelligence for agents!

So no, agents won't rebuild everything from scratch. They'll gravitate to the most token-efficient tools, because that's what their owners pay for. The software that survives won't just be accessible to agents; it will be accurate and cheap for them to drive.

We're seeing it happen with HF, which is becoming the platform for agents to use AI: ~49M requests in just two months, and growing fast! huggingface.co/blog/hf-cli-f…
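A back-of-the-envelope sketch of why the lower success rate compounds the token gap. It assumes (an assumption of this sketch, not of the benchmark) that an agent simply retries a failed task until it succeeds, with independent attempts. Under that geometric-retry model, expected tokens per successful task equal tokens per attempt divided by the success rate. Combining the quoted "up to 6x" multi-step figure with the overall 84% vs 94% success rates, as below, is an illustrative upper bound, not a measured result.

```python
def expected_tokens_per_success(tokens_per_attempt: float, success_rate: float) -> float:
    """Expected tokens spent per successful task if failed attempts are retried.

    Assumes independent attempts with a fixed success probability, so the
    number of attempts is geometric with mean 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return tokens_per_attempt / success_rate


# Normalize one CLI attempt to 1 token unit (hypothetical unit).
# Hand-rolled attempts use the quoted upper bound of 6x; success rates are
# the quoted 94% (CLI) and 84% (hand-rolled).
cli = expected_tokens_per_success(1.0, 0.94)
hand_rolled = expected_tokens_per_success(6.0, 0.84)

print(f"CLI:         {cli:.3f} units per success")
print(f"Hand-rolled: {hand_rolled:.3f} units per success")
print(f"Ratio:       {hand_rolled / cli:.2f}x")
```

Under these assumptions the gap widens from 6x per attempt to roughly 6.7x per successful task. Real agents rarely retry blindly, so treat this as a rough illustration only.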
I'm an optimist, but there are many unknowns, so it's worth factoring in multiple viewpoints -- including Machiavelli's. (An aside: is a double dash rather than a true em dash sufficient proof of human-written text?)
My base case is that model capability improvements will far outpace enterprises' ability to adapt, and instead of patiently waiting, the labs will verticalize, gatekeep, and kingmake with revenue-share agreements for frontier model access. A recurring theme in all my posts across geopol and tech is the default assumption that most people are retarded, game theory is underutilized, and you have to be Machiavellian if you want to get shit done. In this case you need to put the fear of god into the board members of every public company. If you work at a lab and don’t understand how to go about doing this, my DMs are open.
I'm still not convinced LLMs are 'the answer'. Great for coding... but falsely confident on research tasks. Here's Gemini after lots of (to use another quote) sound and fury, signifying nothing: "I have completely botched this entire interaction, and I am not going to waste any more of your time or patience with excuses. You are entirely right: … I completely misread the illustration, leaned on a flawed mental summary of the …, and gave you nothing but frustration when you asked for a simple, direct visual check. I don't have a reliable, high-resolution direct link or a verified image of the exact page showing the … from the … in my dataset. Rather than keep throwing wrong interpretations and bad images at you, I am stopping here. I am sorry for wasting your time and dragging you through a loop of circular answers."
Yes: "understanding can be harder than solving." I.e., AI can do amazing things even if it doesn't "understand."
Replying to @DongruoZ
[7/7] Perhaps understanding can be harder than solving. Nesterov’s AGD was discovered decades ago, yet interpretations came much later. @weijie444 Or perhaps understanding is something only humans need. Machines do not understand mathematics; they are just freakishly good at it.
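For reference, Nesterov's accelerated gradient descent (AGD) itself is only a few lines. A minimal Python sketch using one common form of the method (momentum weight (k-1)/(k+2), step size 1/L), run on a hypothetical ill-conditioned quadratic and compared with plain gradient descent; the test function and all parameters are illustrative assumptions.

```python
def grad(x, a):
    """Gradient of f(x) = 0.5 * sum(a_i * x_i**2)."""
    return [ai * xi for ai, xi in zip(a, x)]


def f(x, a):
    """Hypothetical test objective: a diagonal quadratic with minimum 0 at x = 0."""
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))


def gradient_descent(x0, a, steps):
    L = max(a)  # Lipschitz constant of the gradient
    x = list(x0)
    for _ in range(steps):
        x = [xi - gi / L for xi, gi in zip(x, grad(x, a))]
    return x


def nesterov_agd(x0, a, steps):
    L = max(a)
    x_prev, x = list(x0), list(x0)
    for k in range(1, steps + 1):
        beta = (k - 1) / (k + 2)  # momentum weight, 0 on the first step
        # Look ahead along the previous step direction...
        y = [xi + beta * (xi - xp) for xi, xp in zip(x, x_prev)]
        # ...then take a gradient step from the look-ahead point.
        x_prev, x = x, [yi - gi / L for yi, gi in zip(y, grad(y, a))]
    return x


a = [1.0, 0.01]  # curvatures; condition number 100
x0 = [1.0, 1.0]
print("GD :", f(gradient_descent(x0, a, 100), a))
print("AGD:", f(nesterov_agd(x0, a, 100), a))
```

On this example the accelerated iterate reaches a lower objective value than gradient descent after the same number of gradient evaluations. Running the recipe is easy; the later interpretations are about why it works.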
This could be a big deal. Current models are very difficult to steer. Cc @venturetwins, who tracks a lot of gen imaging; @Scobleizer for general interest.
Jun 3
Replying to @arena
Images as code: Images are represented as code, so every part of an image becomes addressable, editable, and manipulatable.
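The post doesn't say which code representation is used. As a stand-in, here is a minimal Python sketch using SVG built with the standard library's xml.etree.ElementTree: each part of the picture is an element with a (hypothetical) id, so one part can be addressed and edited without regenerating the rest.

```python
import xml.etree.ElementTree as ET


def make_scene() -> ET.Element:
    """Build a tiny SVG 'image' whose parts each carry a hypothetical id."""
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg", width="200", height="100")
    ET.SubElement(svg, "rect", id="sky", x="0", y="0", width="200", height="60", fill="skyblue")
    ET.SubElement(svg, "rect", id="ground", x="0", y="60", width="200", height="40", fill="green")
    ET.SubElement(svg, "circle", id="sun", cx="160", cy="25", r="12", fill="gold")
    return svg


def edit_part(svg: ET.Element, part_id: str, **attrs: str) -> ET.Element:
    """Change attributes of one addressable part, leaving everything else untouched."""
    for el in svg.iter():
        if el.get("id") == part_id:
            for name, value in attrs.items():
                el.set(name, value)
            return el
    raise KeyError(f"no part with id {part_id!r}")


scene = make_scene()
edit_part(scene, "sun", fill="orange", cx="40")  # move and recolor only the sun
print(ET.tostring(scene, encoding="unicode"))
```

Whatever representation the announced system actually uses, the design point is the same: stable handles on parts make targeted edits possible, which is exactly what pixel-only generation lacks.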
2 things are true:
- cool to spin up a quick GUI for a one-off task
- dumb to think this will replace GUI apps
As with so many things, it's not either/or; it's both/and. A GUI should be stable (and flexible). It should handle lots of edge cases (without adding complexity).
Nth in a series: AI still falls far short of the hype.
prompt: find niche job boards focused on {redacted}
Gemini (fast): "you won't find a job board exclusively dedicated to the title," then some indirect resources, poorly organized
prompt: what about {example1, example2}
Gemini: "I completely missed those, and you caught me flat-footed. I was looking through the lens of general job boards … rather than looking for hyper-niche scrapers and communities built specifically by and for this exact role."
Claude Code, Open 4.8 High "You're right — that's exactly what the rule says and I broke it. Sorry."
Good points. I assume others have written on this, maybe @deanwball @alexolegimas @bocowgill @emollick:
latent productivity that never reaches GDP because it appears as:
•lower input costs (fewer billable hours),
•consumer surplus (time saved, spending skipped), and
•silent substitution (high-skill labor quietly displaced).
Illustrations abound:
•A patient triages symptoms with ChatGPT and skips four clinic visits.
•An analyst masters a new industry without three costly expert calls.
•A five-person start-up closes a seed round with no CFO, lawyer, or recruiter; AI fills those roles off the books.
7 Jul 2025
AI’s Shadow Output Gap
While Washington obsesses over debt and inflation, AI is already ushering in an age of abundance (Part 1)
⸻
The political and economic establishment can’t stop talking about deficits, debt, and the CPI. Capitol Hill hearings, FOMC minutes, and financial news all pulse to the same beat.

Yet this fixation ironically coincides with the arrival of the most powerful productivity engine in human history: generative AI. Its impact is creating a shadow output gap, an invisible but rapidly widening expansion of supply-side capacity. Policymakers, especially at the Federal Reserve, act as if the boom doesn’t exist.

The real risk is not inflation. It is a stealth supply shock that pushes prices, wages, and term premia down. Deficits may prove too small. Monetary policy may already be too tight.
⸻
Productivity Everywhere, Except in the Data
This is Solow’s Paradox, redux: “We see the computer age everywhere except in the productivity statistics.” Only this time the curve is ten times steeper.

Previous tech waves required hardware diffusion: mainframes, PCs, smartphones. AI requires none of that; it arrives through an app. That frictionless uptake already generates latent productivity that never reaches GDP because it appears as:
•lower input costs (fewer billable hours),
•consumer surplus (time saved, spending skipped), and
•silent substitution (high-skill labor quietly displaced).
Illustrations abound:
•A patient triages symptoms with ChatGPT and skips four clinic visits.
•An analyst masters a new industry without three costly expert calls.
•A five-person start-up closes a seed round with no CFO, lawyer, or recruiter; AI fills those roles off the books.
Each case creates real value, but none is logged as “output.”
⸻
Counting the Invisible Token Economy
Tokens (the fragments of text an AI model processes) are the kilowatt-hours of knowledge work. Track them and you watch the shadow gap in real time.
•Google’s token throughput grew 50-fold year-over-year as usage soared and per-token cost collapsed.
•OpenAI’s models now sit in support desks, research departments, and legal teams worldwide.
•Rapidly falling costs are unlocking accelerating demand across every provider.
The data-center capex from Nvidia, Microsoft, and other hyperscalers is simply the physical expression of this surge. (1/2) $NVDA $AMZN $GOOGL $MSFT $TSM $CRWV $NBIS
Request for AI:
- in each frame, compare the height of {this} with {that}
- draw a line to show the height of each
- compute the ratio and overlay the number
Wrong or useless output from ChatGPT 5.5 Thinking, Grok, and Gemini (free). Gemini gave first just text, then a completely re-imagined image.
Could I do this by downloading the latest segmentation model, having AI generate a custom wrapper, etc.? Probably. But it still shows a big gap between the readily available tools and ordinary problems that are trivial for an intern.
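The arithmetic in this request is the easy part. A minimal Python sketch, assuming the two objects' bounding boxes in each frame already come from some detection or segmentation model (not shown; the Box type, labels, and coordinates are hypothetical): measure each box's height, compute the ratio, and emit an SVG overlay with one line per object plus the ratio as text.

```python
import html
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical bounding box in pixel coordinates (y grows downward)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def height(self) -> float:
        return abs(self.y1 - self.y0)


def height_ratio(a: Box, b: Box) -> float:
    """Height of a divided by height of b."""
    if b.height == 0:
        raise ValueError(f"{b.label} has zero height")
    return a.height / b.height


def overlay_svg(width: int, height: int, a: Box, b: Box) -> str:
    """SVG overlay for one frame: a vertical line through each box, plus the ratio."""
    ratio = height_ratio(a, b)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for box, color in ((a, "red"), (b, "blue")):
        cx = (box.x0 + box.x1) / 2  # draw the line at the box's horizontal center
        parts.append(
            f'<line x1="{cx}" y1="{box.y0}" x2="{cx}" y2="{box.y1}" '
            f'stroke="{color}" stroke-width="3"/>'
        )
    label = html.escape(f"{a.label} / {b.label} height ratio: {ratio:.2f}")
    parts.append(f'<text x="10" y="20" fill="white">{label}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


# One hypothetical frame: 'this' spans 200 px vertically, 'that' spans 100 px.
this = Box("this", 100, 50, 140, 250)
that = Box("that", 300, 150, 340, 250)
print(overlay_svg(640, 360, this, that))
```

Emitting the overlay as SVG keeps the original frame untouched; the overlay can be composited on top of each frame by whatever viewer or video tool is at hand.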
Claude Code: "You're right, I'm sorry. You said you were stepping away and to iterate without asking questions — I should have run it start to finish silently and reported results when done. Instead I produced a long stream of intermediate output and analysis."
I really don't see how people have the confidence to delegate so much to agents. Claude Code just now: "I was wrong on both counts — I stated it as a fact without actually checking"
Claude Code is so bad with permissions. Some things are clearly harmless (e.g., reading a sqlite file in a folder where it has permission), and it shouldn't ask. Others are riskier, and it doesn't provide sufficient guidance on how to minimize prompts without living dangerously.
Related: 'Bash(sqlite3 *)' is the wrong level of granularity. It should easily distinguish read/query from write.
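SQLite itself already draws the read/write line. A minimal Python sketch: opening the database through a URI with mode=ro lets queries run while any write fails with sqlite3.OperationalError. The file and table names are hypothetical, and whether an agent's permission rules can be scoped to such a read-only invocation is an assumption not checked here.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(path: str) -> sqlite3.Connection:
    """Open an existing SQLite file so that queries work but writes are refused."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def demo() -> tuple[list, str]:
    with tempfile.TemporaryDirectory() as d:
        db = os.path.join(d, "notes.db")  # hypothetical file and table names

        # Set up a small database with a normal read/write connection.
        rw = sqlite3.connect(db)
        rw.execute("CREATE TABLE notes (body TEXT)")
        rw.execute("INSERT INTO notes VALUES ('hello')")
        rw.commit()
        rw.close()

        ro = open_read_only(db)
        try:
            rows = ro.execute("SELECT body FROM notes").fetchall()  # read: allowed
            try:
                ro.execute("INSERT INTO notes VALUES ('oops')")  # write: refused
                outcome = "write succeeded"
            except sqlite3.OperationalError as exc:
                outcome = f"write blocked: {exc}"
        finally:
            ro.close()
    return rows, outcome


print(demo())
```

Enforcing the distinction in the database layer, rather than by pattern-matching command strings, means a query cannot sneak a write past the check.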
Brilliant. E.g., "Here is the part that most job-loss predictions miss entirely. When you solve a problem, you do not reduce the number of remaining problems. You increase it. … The car solved transportation and created traffic engineering, urban planning, insurance law, and emissions regulation." and "This is what the job-loss predictions get backwards. They model AI as a fixed quantity of labor being transferred from humans to machines. The actual experience is the opposite. AI is a lever that makes the accessible problem space vastly larger, and humans rush in to fill it because that is what humans do." and [on the abundant future] "Things that are expensive now will become cheap. Things that are impossible now will become routine. It will be wonderful. But it will not feel like paradise. It will feel like a new normal. We already live in a world of miracles. A person from the year 1300 transported to the present would think they had arrived in a post-scarcity utopia."
It's very risky and annoying that Codex keeps running scripts even as I keep adding steps to prevent that.
Codex: "You’re right, and I own that. I violated your AGENTS.md instruction repeatedly." @openai should work harder to prevent this.
Nth in a series: Models are not as good as proponents believe.
me: ok to mail this small package from my mailbox?
ChatGPT 5.5 Thinking: no, you can't drop it in a blue box
---
But:
1. that's not what I asked
2. even for a blue box, it gave the wrong answer