Building AI-powered tools to augment human creativity and problem solving. Previously @GitHub and @Google 🇨🇦

Joined April 2007
Pinned Tweet
I've been thinking about why verifying AI agent output feels so much harder than writing the spec that produced it. That question led me to rethink where my attention actually belongs in the process, and eventually to build atelier.dev narphorium.com/blog/decision…
Every time an agent explores your codebase it builds up its own mental model of how the code works, and then throws that model away when the session ends. Code walkthroughs turn that model into a step-by-step guide to bring you up to speed on any part of the system.
Rather than reading diffs line-by-line, you step through the agent's account of what it did and diff it against your own mental model. This is exactly the sort of high-level decision point which surfaces misunderstandings and gaps in the design. narphorium.com/blog/decision…
Atelier walkthroughs can use the Chrome MCP in Claude Code to build UI walkthroughs with live screenshots of the app. Grounding the walkthrough in execution exposes gaps in the implementation that you wouldn’t catch with code review.
Shawn Simister retweeted
GenAI is good at producing a result. But early-stage design is not about a single result; it is about exploring possibilities. Our #DIS2026 paper “IdeaBlocks” won an Honorable Mention 🏆 We ask: how can designers express not only what to generate, but how to explore? (1/n)
In 1985 Peter Naur argued that a program is more than just its source code. "Programming As Theory Building" explained how we build theories of the code that help us debug and refactor it, but those theories rely on knowledge from outside the code. pages.cs.wisc.edu/~remzi/Nau…
I feel like over the years, we've sort of given up documenting our mental models of code and just accepted that everyone who reads the code builds their own model from scratch.
Now, with vibe coding, we don't even do that. The agent is often the only one with a mental model of the code, and it throws it away after each session.
Shawn Simister retweeted
The future role of the software engineer is using AI to translate informal requirements into high level formal specs, and reviewing those. The AI implements the specs, and verifies against the formal spec using a theorem prover. The human is there so we can blame them when things go wrong; the human's job is to ensure the formal spec is correct; that is the code they review. If it seems wrong, they tell the AI and discuss. The human writes nothing but natural language.
Shawn Simister retweeted
Big paper on AI coding agents using GitHub & other data. The auto-complete tools (Copilot) led to 2.2x more code, local agents like original Claude Code led to 7.4x, & current remote coding agents 17.3x(!) But human bottlenecks in coding mean actual releases "only" went up 30%
Shawn Simister retweeted
your novel idea, when you ask an llm to fill in the details
We need a name for this, because Armin is putting his finger on a problem that’s everywhere: people running their writing through an LLM because they think it makes it clearer, when in actuality it sands off all the detail.
Shawn Simister retweeted
We desperately need better ways of evaluating models. Something that shows how helpful they are at working hand-in-hand with humans to help them get stuff done in a cooperative/iterative way. The Claude models have consistently been better at this, and the market rewards that.
Shawn Simister retweeted
What are users thinking during their interactions with LLMs? We introduce ThoughtTrace — the first large-scale dataset that captures what users think during real-world human–AI conversations, not just what they type. → 10,174 thought annotations → 2,155 multi-turn conversations, 17,058 turns → 1,058 users → 20 LLMs These thoughts improve user behavior prediction ( 41.7%) and model alignment ( 25.6%). This opens a new paradigm of user-centric LLM research. Full information in the thread 🧶 Read our paper: arxiv.org/abs/2605.20087 Check our project website: thoughttrace-project.github.…
Shawn Simister retweeted
I always found it hard to document large codebases in a way that made sense to me visually. Thanks to @tldraw I built CodeCanvas, my own infinite canvas documentation tool for mapping out my thought process. Excited to share some of my favorite features
I love how customizable @tldraw is. Added a custom markdown editor using @tiptap_editor and i can't get enough of it. I'm having so much fun building whatever this is lol
Shawn Simister retweeted
as ai makes imitation cheaper and cheaper the value of using AI and your brain to make totally new things goes up
I sketched this out a few years ago. The HTML vs Markdown debate is conflating substrate with information density. The real question is what kind of feedback an artifact actually invites. Hi-fi invites parameter critique. Lo-fi invites paradigm critique.
So now AI has made the high-fidelity artifacts cheaper and easier to create but that doesn't change the rest of the equation. If anything, it makes it easier to fall into the trap of confusing high fidelity with high confidence.
Prototyping and experimentation are not slop. Slop is when you don't care how it works. The whole point of prototyping is that you care deeply about finding what works. narphorium.com/blog/top-down…
AI slop is good, actually. Slop is what enables fast parallel experimentation. The etiquette and skill is understanding the boundaries of where slop exists and the extent to which it should be cleaned up and how. A few examples: I’m working on the internals of some system right now. The API and GUI of this thing is fully zero shame slop. It’s horrible. But it lets me focus on the core quality while shipping a usable piece of alpha quality software to testers (transparent about the slop frontend). Similarly, this system has plugins. We sent agents in Ralph loops overnight to generate dozens of plugins. The plugins are slop. The quality is bad. The plugin API/SDK is absolutely not done. But we can test a full GUI with a full plugin ecosystem. When we change the API, we can regenerate them all. The cost of change is just tokens, the velocity is incomparable to before. I built Terraform. We tested and shipped TF 0.1 with about 3 very weak providers. Because we ran out of time. Building was slow. And when we changed our SDK the cost was immense. Totally different today, 10 years later. Today, I would’ve slop generated 100 providers (again, with transparency and cleanup later, but just to prove it out). As an anti example, I would not PR this (without prior warning) to another project. I would not throw this onto customers without full review or transparency (as I’m already doing). I would not accept first pass slop. It’s almost never right. Slop is a tool. And like anything else it’s not blanket bad or good. The context is everything.
Shawn Simister retweeted
Sharing a preview of an experimental tool I've been working on: A canvas-based IPython-compatible computational notebook exploring what "human-in-the-loop" looks like in an age of autonomous AI agents. More updates coming soon!
One of the most famous power-user tools in the world is switching to an AI chat interface 😬 "This will be the new Terminal. This will be the primary way most interactions are happening..." wired.com/story/the-bloomber…