CSS, SQL, and everything in between.

Joined April 2007
829 Photos and videos
Started last weekend, wrapped up this weekend. Before and after.
4
195
You think programming is hard? Try creating an ideal FX chain with both parallel and serial loops using outboard gear with synthesizers. (It's coming along, but the routing and gain staging is breaking my brain)
48
David Mosher πŸ‡¨πŸ‡¦ retweeted
Going to get a bit nerdy here, but I had a token and cost optimization idea I wanted to test for design engineering and frontend teams. Warning: this is a long post. But for teams running a lot of AI-assisted frontend work, the difference could add up quickly, potentially to hundreds of thousands of dollars depending on model, workflow, and volume. TLDR: Tailwind used substantially fewer AI coding tokens than CSS Modules because agents could usually edit styles directly in component class strings instead of reading and modifying separate stylesheet files. The hypothesis I started looking at whether we can improve token efficiency without losing accuracy when building UI. Specifically, when an AI coding model makes styling changes, does Tailwind use fewer tokens than CSS Modules (What we currently use)? I've used Tailwind for many years and find it to be faster and easier to maintain. Why wouldn't this apply to AI agents too? The hypothesis wasn't that Tailwind is universally better, but that keeping styles in component markup may reduce file reads, cross-file edits, and context requirements for style tasks. CSS Modules often require the model to work across both the component and a separate stylesheet, which could increase token usage. This was not intended as a universal cost benchmark. Results should be interpreted as differences in token workload, with actual costs depending on each provider’s pricing model. The setup I created a test app with two branches of the same React/Vite application. The only differences were the styling approach and corresponding AGENTS.md guidance. The app was intentionally minimal, with identical visuals across both branches, and organized into a set of design system components used on a single route to give the model realistic component-level files to work with. The Tailwind branch used utility classes directly in React components plus a single CSS file containing only the Tailwind import. Its AGENTS.md: β€œThis is a React Vite app using TailwindCSS for styles. Always utilize Tailwind’s utility classes without adding additional customizations.” The CSS Modules branch used component-specific CSS module files colocated with each component, a common setup where styles are maintained separately from markup. Its AGENTS.md: β€œThis is a React Vite app using CSS Modules for styles. Component styles are imported from a module file colocated in the component's directory.” The harness I used Pi as the coding harness. Pi is a popular coding harness with read and edit tools, session continuity, and model-reported token usage. Each styling branch was ran 3 times with the same rules: - One fresh Pi session per branch. - The same ten prompts per branch. - The same prompt order per branch. - The same model per comparison. - The same system prompt per branch. - The same allowed tools: read, edit, write. - No builds, tests, formatters, linters, screenshots, or visual checks during the measured run. - No retries and no corrective follow-up prompts. Within a branch, the same session continued across all ten prompts. Between branches, the repository and context was reset so generated changes from one styling system could not leak into the other. The ten prompts were style-specific changes and were designed to be simple enough that both systems could make the change, but varied enough to require repeated styling edits. The finalized runs produced changes for all ten prompts in both branches. Here's the 10 prompts that were used: 1. Make the buttons slightly more compact while keeping the primary button visually stronger than the others. 2. Increase the space between buttons and keep the row centered on the page. 3. Make the buttons feel softer by increasing the corner radius without changing their labels. 4. Make the primary button darker and make its hover state clearly visible. 5. Make the secondary buttons feel more subtle while preserving their readable text. 6. Make the keyboard focus state more noticeable and consistent on every button. 7. Improve the small-screen layout so the buttons stack cleanly on narrow viewports. 8. Make all buttons a little wider while keeping their text style unchanged. 9. Slightly warm the page background while keeping the buttons visually neutral. 10. Add a very subtle pressed state to the buttons without adding any visible text. Token usage was collected from Pi's JSON event stream. For each assistant message, Pi recorded: - input tokens - output tokens - cache read tokens - cache write tokens - total tokens - assistant message count - tool calls - tool paths - changed files For analysis, total token usage is the primary metric. That's the value reported by the model provider through Pi for each assistant message, added up across all ten prompts. It's especially important for Anthropic models, where much of the reported usage appeared as cache read and cache write tokens. The Claude Opus 4.8 run reported very small direct input token counts but large cache categories. Those cache categories are included in total tokens, because they are part of the model-reported usage for the session. The models The experiment was ran with GPT 5.5 and Opus 4.8 with medium reasoning. GPT 5.5 result averages: Tailwind Total tokens: 46,163 Input tokens: 18,623 Output tokens: 1,428 Cache read tokens: 26,112 Assistant turns: 22 Tool reads: 3 Tool edits: 10 Changed prompts: 10 of 10 Elapsed time: 129.7 seconds CSS Modules Total tokens: 102,310 Input tokens: 27,875 Output tokens: 1,731 Cache read tokens: 72,704 Assistant turns: 30 Tool reads: 15 Tool edits: 10 Changed prompts: 10 of 10 Elapsed time: 165.5 seconds With 5.5, Tailwind used 56,147 (54.9%) fewer tokens and was 21.6% faster to completion. The edit count was the same, but CSS Modules caused substantially more reading and context gathering. Opus 4.8 result averages: Tailwind Total tokens: 90,447 Input tokens: 46 Output tokens: 2,828 Cache read tokens: 81,914 Cache write tokens: 5,659 Assistant turns: 23 Tool reads: 3 Tool edits: 10 Changed prompts: 10 of 10 Elapsed time: 120.9 seconds CSS Modules Total tokens: 147,765 Input tokens: 50 Output tokens: 4,279 Cache read tokens: 134,908 Cache write tokens: 8,528 Assistant turns: 25 Tool reads: 6 Tool edits: 12 Changed prompts: 10 of 10 Elapsed time: 127.0 seconds With 4.8, Tailwind used 57,318 (38.8%) fewer tokens and was just 4.8% faster to completion on this one. Tool calling was much less all over the place for this one, but CSS Modules still caused more file interactions than Tailwind. So why was CSS Modules more expensive in this setup? A model editing CSS Modules has to manage at least two surfaces: The React component that applies the class, and the module stylesheet that defines the class. Even when the class names already exist, the model may inspect both sides to understand what is safe to change. When class names need to be added or adjusted, the model may need to edit both the component and the CSS module as well as the css properties needed for the desired styling changes. That increases file traversal, assistant turns, and cached context. Tailwind coordinates many of those style decisions with the model's Tailwind training data into the component file with style properties handled in the Tailwind compiler rather than the model writing them. For these prompts, that meant the model simply added or edited existing className strings rather than navigating between markup and stylesheet definitions. This experiment does not prove that Tailwind is always cheaper for AI coding. It does not prove that Tailwind produces better code. It does not prove that Tailwind improves human maintainability. It does not prove that CSS Modules are inefficient in larger or better-structured applications. It does not measure long-term maintenance cost. It does not measure complex refactors, accessibility work, state changes, animation systems, responsive redesigns, or large design system integration. It does not isolate provider billing cost perfectly, because providers may price cache reads, cache writes, input tokens, and output tokens differently. It also does not remove all possible harness effects. Pi provides an amazing coding environment, which is a strength, but the result is still partly a measurement of model behavior inside Pi. If I had used codex for the 5.5 runs and claude code for the 4.8 runs I wouldn't have been able to control the harness consistency. The conclusion It's not that Tailwind is magically token-efficient, it's that for style changes, Tailwind often lets the model stay in the component file and edit existing utility classes while letting the Tailwind compiler write the styles for it outside of the token context. CSS Modules more often require separate stylesheet inspection and edits. In a coding harness, that extra file interaction becomes extra context, extra assistant turns, and extra tokens. The fair takeaway is if a team expects AI agents to perform UI styling edits, Tailwind may reduce token usage a pretty meaningful amount compared to a CSS Modules setup. However, this should still be retested on larger, real application surfaces before being treated as a general rule. If you're a team looking for more token optimization in your workflows and aren't already using Tailwind, maybe this optimization would provide this test's level of impact.
8
5
74
11,627
David Mosher πŸ‡¨πŸ‡¦ retweeted
I have a nice routine to share: We commonly communicate technical architecture/proposals in docs. Each night, I load new ones in either a notebookLM or prepare to play the doc in audio. In the morning I go for a walk & listen. Walking and thinking is cathartic. It’s nice to be in nature.
11
6
209
18,308
Use Pi GPT, think hard before prompting, single-thread your work and focus on simplifying.
So, at your company, what are you doing about the spiking token prices? Approaches I hear: 1. Inference routing - route more basic workloads to a LOT cheaper APIs or providers 2. Use cheaper open models for inference (eg Fireworks, Baseten etc) 3. Default model: cheap one
119
Going on 2 months now. Pi and a Codex subscription is fantastic.
I just cancelled my Claude account. I've been using codex, and haven't used Claude in several weeks.
2
125
Perhaps the commodification of UI design will be the thing that leads to it being made obsolete. If everything that is designed begins to look the same with agents trending towards the mean, then it stands to reason that one way to stand out is to have the opposite: no UI.
1
93
Another way to stand out is to go in the opposite direction: Maximum UI, brutalist designs, high-friction surfaces, bespoke UI for everything. Interfaces that feel like architecture instead of a series of templates.
47
David Mosher πŸ‡¨πŸ‡¦ retweeted
The more I replace plans with prototypes, the better the outputs Who'd have thought that low fidelity prototypes were better than walls of spec Oh yeah, the entire industry for 20 years Stop going against decades of knowledge because someone in SF shipped it as a 'mode'
May 7
i never make plans i hate looking at markdown i don't wanna read markdown files i just plan by having it make changes to the code then i look at the code to see what sucks then i prompt again
124
114
2,059
333,486
David Mosher πŸ‡¨πŸ‡¦ retweeted
AI slop is good, actually. Slop is what enables fast parallel experimentation. The etiquette and skill is understanding the boundaries of where slop exists and the extent to which it should be cleaned up and how. A few examples: I’m working on the internals of some system right now. The API and GUI of this thing is fully zero shame slop. It’s horrible. But it lets me focus on the core quality while shipping a usable piece of alpha quality software to testers (transparent about the slop frontend). Similarly, this system has plugins. We sent agents in Ralph loops overnight to generate dozens of plugins. The plugins are slop. The quality is bad. The plugin API/SDK is absolutely not done. But we can test a full GUI with a full plugin ecosystem. When we change the API, we can regenerate them all. The cost of change is just tokens, the velocity is incomparable to before. I built Terraform. We tested and shipped TF 0.1 with about 3 very weak providers. Because we ran out of time. Building was slow. And when we changed our SDK the cost was immense. Totally different today, 10 years later. Today, I would’ve slop generated 100 providers (again, with transparency and cleanup later, but just to prove it out). As an anti example, I would not PR this (without prior warning) to another project. I would not throw this onto customers without full review or transparency (as I’m already doing). I would not accept first pass slop. It’s almost never right. Slop is a tool. And like anything else it’s not blanket bad or good. The context is everything.
106
219
2,707
216,111
David Mosher πŸ‡¨πŸ‡¦ retweeted
AI changed the cost structure of software. Outputs may be cheap, but outcomes are still valuable. @dmosher says leverage has moved to the harness: lint rules, types, tests, ADRs, feedback loops. link.testdouble.com/ecfcc5
1
1
130
A few links that made it onto my reading list this weekend πŸ‘‡ πŸ”§ LazyPi β€” opinionated `pi` setup with themes, subagents, memory, ralph loop, and more β†’ lazypi.org
4
2
123
πŸ“œ "The Story of Mel" β€” a 1983 Usenet post resurfaces via @garybernhardt (worth a follow). Somehow more relevant than ever β†’ catb.org/jargon/html/story-o…

84
πŸ”‘ "Some Secret Management Belongs in Your HTTP Proxy" β€” from exe.dev (VMs sandboxes for agents). Useful framing if you're thinking about secrets in agentic stacks β†’ blog.exe.dev/http-proxy-secr…
39
David Mosher πŸ‡¨πŸ‡¦ retweeted
Is there still a widespread belief that LLMs and coding agents are good for greenfield development but don't help for maintaining large existing codebases? I don't think that idea holds up any more
231
33
1,026
153,893
I'm still in the honeymoon phase and haven't yet used it in anger, but so far, @Railway is really good. If they can sort out their platform stability woes as they scale, this feels like the new Heroku in terms of DX. Just discovered the "simple dashboard" APM button and it'sπŸ‘ŒπŸ»
2
2
316
David Mosher πŸ‡¨πŸ‡¦ retweeted
I don't want to go too deep on AI DDD. My current thinking: GOOD: Ubiquitous Language / Bounded Contexts / ADR's BAD: Entities / Value Objects / Aggregates / Domain Events Essentially, use DDD to document the app but don't prescribe the shape of the app
I'm starting to think that DDD might be the answer to all of my problems - Model not doing what you want? Shared language - Can't navigate a massive codebase? Bounded contexts with global mapping - Don't know why a decision was made? ADR's It's just so freaking elegant
52
15
505
71,973