Justin H. Johnson

@BioInfo

SWE-Bench Pro is the field's flagship coding benchmark and it turns out the verifiers are wrong 32% of the time, with a clean git-history gaming vector on top. a verifier that wrong isn't ranking who codes best, it's ranking who games it best. this is why i keep saying the eval is the product, not the model.

Justin H. Johnson

@BioInfo

forget the benchmark for a sec. a municipal IT department shipped frontier open weights. not a lab, not a startup, a city government. the model was never the moat, and this is what that looks like.

Chubby♨️

@kimmonismus

Jun 13

Wait what? Rio 3.5 Open 397B, developed by the IT company of Rio de Janeiro's city government, is now SOTA open source and even outperforming Qwen 3.7? What is happening today. Never heard of them before.

Justin H. Johnson

@BioInfo

i run a version of this on my own gateway, per-consumer keys routing each call to the cheapest model that clears the task. the router itself becomes a thing you have to eval too though, a bad route silently bills double or drops quality.

OpenRouter

@OpenRouter

Jun 13

Introducing the Fusion API, the smartest compound model in the market. Fusion achieves Fable-level intelligence at half the price. How it works 👇
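The routing idea in the reply above (per-consumer keys, each call sent to the cheapest model that clears the task) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's gateway and not how OpenRouter's Fusion works: the model names, prices, quality scores, and consumer keys are all hypothetical, and "clears the task" is reduced to a single eval score per model. The `audit` helper illustrates why the router itself needs an eval: it flags the two silent failures named in the reply, a route that bills double and a route that drops quality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # hypothetical price
    quality: float             # hypothetical eval score in [0, 1]


# Hypothetical catalog and per-consumer quality bars.
MODELS = [
    Model("small", 0.2, 0.60),
    Model("mid", 1.0, 0.80),
    Model("large", 5.0, 0.95),
]
CONSUMER_BARS = {"key-batch-summaries": 0.55, "key-code-review": 0.75}


def route(consumer_key, models=MODELS, bars=CONSUMER_BARS):
    """Return the cheapest model whose score clears this consumer's bar, or None."""
    bar = bars[consumer_key]
    eligible = [m for m in models if m.quality >= bar]
    return min(eligible, key=lambda m: m.cost_per_1k_tokens, default=None)


def audit(consumer_key, chosen, models=MODELS, bars=CONSUMER_BARS):
    """Check a routing decision for the two silent failure modes."""
    issues = []
    if chosen.quality < bars[consumer_key]:
        issues.append("quality_below_bar")
    best = route(consumer_key, models, bars)
    if best is not None and chosen.cost_per_1k_tokens >= 2 * best.cost_per_1k_tokens:
        issues.append("bills_double_or_more")
    return issues
```

Running `audit` over a log of past routing decisions is one cheap way to eval the router separately from the models behind it.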

Justin H. Johnson

@BioInfo

13h

I rewrote my most-read Claude Code piece for 2026. When I first wrote it, Skills, native subagents, and hooks didn't exist. Now they're most of the point.

The mistake people still make: treating Claude Code like a faster autocomplete and fussing over the prompt.

The real payoff is the harness you build once and reuse on every task:
- a tight CLAUDE.md (bigger is NOT better, it spends attention every turn)
- a real permissions policy, not --dangerously-skip-permissions everywhere
- skills for every workflow you'd otherwise re-explain
- hooks for the non-negotiables the model can't be trusted to remember
- subagents to fan out, inline reasoning to stay deep

The prompt is the cheap part. The configuration is the moat, and it compounds.

Full rewrite, plus the repos I keep open source (slopless, claudelicious): glyf.cc/claude-code-2026

claude-code-best-practices - AIXplore - Tech Articles

Claude Code Best Practices: Setup, Commands, and the Defaults Worth Changing

ai.rundatarun.io

Justin H. Johnson

@BioInfo

18h

A researcher asks a question that sounds simple: "Which AI model should I use for my DNA variant, BOLT-LMM or DNABERT?" It's a trap, and it's the most common mistake I watched teams make over years in genomics.

Those two tools don't belong in the same conversation. One scans a whole population to find which genetic differences track with a trait. The other reads a single stretch of DNA and turns it into something a computer can work with. Asking which is "better" for one variant is like asking whether a microscope or a telephone is better for measuring temperature.

Here's the part that should bother anyone who relies on benchmarks. "Genomic AI model" is one phrase covering at least nine genuinely different kinds of tool, each answering a different question. And the leaderboards everyone uses to choose a model can't catch the mistake, because a leaderboard ranks tools within a category. It assumes you already picked the right category. The error happens one step before the benchmark ever runs.

Pick the second-best tool in the right category and you lose a little accuracy. Pick the wrong category entirely and you get a confident answer that means nothing, formatted and plotted and ready to drop into a slide.

The fix is a habit, not a tool: before you ask which model is best, ask what kind of object your task actually needs. Are we even in the right column?

I built a small open thing to make that concrete. But the lesson outlives the tool, and it's not only about genomics. The next time someone hands you a "which is best" question, the more useful question is usually hiding underneath it.

Justin H. Johnson

@BioInfo

18h

Full Sunday Deep Dive: glyf.cc/rdr-2026-06-wrong-to…

The Wrong Tool Problem in Genomic AI

A leaderboard tells you which model wins. It can't tell you that you picked the wrong kind of model.

rundatarun.io

Justin H. Johnson

@BioInfo

Jun 13

today's the best ad cohere could run. a frontier model just got pulled offline by government order and everyone renting it is dead in the water. i run my own models on a box under my desk for exactly this reason. the ownership argument stopped being theoretical this morning.

Cohere

@cohere

Jun 13

When you rent your artificial intelligence, you have no control, and no choice. This is why sovereignty and ownership matter. Whether it means using your own hardware, open source, or deep customization. Own your AI, own your future.

Justin H. Johnson

@BioInfo

Jun 13

The US government just forced Anthropic to pull Claude Fable 5 and Mythos 5 offline. Globally. Export-control directive, days after launch, over a jailbreak. First government takedown of a frontier model. AI governance just stopped being a whitepaper. glyf.cc/fable-mythos

The AI Wire

The day's AI signal, down the wire. A daily synthesized dispatch on models, papers, and tools, plus the week's video digest.

wire.rundatarun.io

Justin H. Johnson

@BioInfo

Jun 12

Claudelicious: a public cookbook for the Claude Code harness I run. Not a list of skills, the why and the wiring behind the whole system: rules, hooks, memory, a learning loop, always-on agents. The model is the commodity. The harness is the moat. github.com/BioInfo/claudelic…

Justin H. Johnson

@BioInfo

Jun 12

Busy week away from the usual headliners. Google shipped DiffusionGemma (text generation by diffusion, ~4x faster) and Gemma 4, Microsoft revealed its MAI frontier models, and Bezos raised $12B for an 'artificial general engineer.' All in today's wire.

Justin H. Johnson

@BioInfo

Jun 12

glyf.cc/mHOEE

The AI Wire

The day's AI signal, down the wire. A daily synthesized dispatch on models, papers, and tools, plus the week's video digest.

wire.rundatarun.io

Justin H. Johnson

@BioInfo

Jun 11

the refusals are friction, i hit a few too. but the switch is less about model quality than whose harness your loop already lives in. codex won because the loop was already built around it. i spent a day with fable inside my own harness and it just disappeared into the work.

Dylan Patel

@dylan522p

Jun 10

Usage share of OpenAI grew vs Anthropic yesterday despite Mythos 5 / Fable 5 launch
Multiple power users at SemiAnalysis tried Mythos / Fable
Got refusals for nonsensical reasons
Got pissed off at Anthropic
Gave Codex a legitimate try
Now they actually prefer it to 4.8 Opus

Justin H. Johnson

@BioInfo

Jun 11

wrote up the full day here: glyf.cc/rdr-fable5-day

What one day with Fable 5 looks like for a builder

Most private tools that work never become public ones. Here's the afternoon that gap collapsed.

rundatarun.io

Justin H. Johnson

@BioInfo

Jun 11

I built a finance app for myself in two nights. It works, it reads my real accounts, and it would help other people who want the same thing. It was never going to leave my laptop.

Not because the idea is bad. Because making a private tool public means a weekend of dull, dangerous cleanup: hunting for secrets buried in old files, scrubbing your personal data out of the history, renaming everything, building a brand, writing a front page. One missed line and you've published your bank details to the internet forever. So you don't. The tool stays yours. Most useful private things die exactly this way.

For scale: months ago I built its sibling, EmberPlan (emberplan.com), a FIRE-planning app I run for the financial-independence community. On the models of that moment it took maybe ten times the effort. This one took two nights.

Yesterday Anthropic shipped Fable 5, a new tier of model. Today I handed it the whole app and asked for a clean public version. In one session it swept the history and found my real brokerage positions hiding in a forgotten test file. I would have shipped those by eye. It caught them, cleaned the project, redesigned it, named it, generated the brand, and open-sourced it.

The leak it caught is the whole point. The work that kept this tool private wasn't hard, it was tedious and risky at the same time, which is the exact combination that makes "I'll do it later" the rational call.

Take that away and a different question lands on the table: that useful thing you built for yourself and never shared, is it about to start helping people you'll never meet? For the first time the work in the way is small enough to actually do.
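The "hunting for secrets buried in old files" step described above is the kind of sweep that can be partly mechanized. The sketch below is an illustration only, not what Fable 5 did in that session: the pattern set is a small, assumed sample, and `scan_text` is a hypothetical helper name. It would also not catch personal data such as brokerage positions, which need project-specific rules or a reviewer that understands the content.

```python
import re

# Illustrative patterns only; a real sweep needs a far broader rule set.
PATTERNS = {
    # AWS access key IDs start with "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # A quoted value of 8+ characters assigned to a secret-sounding name.
    "assigned_secret": re.compile(
        r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(path, text):
    """Return (path, line_number, label) for every line matching a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((path, lineno, label))
    return findings
```

To cover history rather than only the working tree, the same function could be run over the text of `git log -p --all`, with the caveat that removing a finding from history is a separate rewrite step this sketch does not attempt.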

Justin H. Johnson

@BioInfo

Jun 11

Full piece: glyf.cc/rdr-2026-06-fable-5

What one day with Fable 5 looks like for a builder

Most private tools that work never become public ones. Here's the afternoon that gap collapsed.

rundatarun.io

Justin H. Johnson

@BioInfo

Jun 10

Apple just rented its brain.

At WWDC this week, the company that owns every layer of its stack, the chip, the OS, the store, the cable, did the one thing you'd assume it never would. It built the new Siri on Google's Gemini, distilled it into its own models, and put it on a billion phones.

The same week, the largest prediction market on which company has the best AI model gave Google an 8% chance. Anthropic sat at 90. Apple didn't rent the best model. It rented one that was good enough, available, and willing to sign.

That's the strategy, not a slip. The model is becoming a commodity. A new frontier one lands every few months, and they get more interchangeable by the week. What you can't rent is everything around it: a privacy boundary outsiders can verify, the logic deciding what runs on the phone versus the cloud, and a billion devices people already carry. Apple rented the engine and built the car.

For most people, this iPhone is where they'll meet real AI for the first time. Not a separate app they had to go find. The assistant in their hand, doing the thing they wanted done. The brain underneath will be a commodity they never see.

The question I can't answer yet: is Apple's wrapper a real moat, or a beautiful interface over a rented model? On day one, the reporting can't even agree how much of this is Google. We find out the first time Google ships something Apple can't distill its way around.

Rent the brain. Own the car. Just make sure it's one only you could build.

Justin H. Johnson

@BioInfo

Jun 10

Full piece: glyf.cc/apple-brain

Apple Rented Its Brain

The company that owns its whole stack just outsourced the one part you'd assume it never would. That choice is the most interesting thing at WWDC.

rundatarun.io

Justin H. Johnson

@BioInfo

Jun 9

small, open, agentic is the slot i've wanted filled for local. on my homelab the frontier coding model was never the constraint, fitting one that runs on my own hardware into the harness is. throwing this at my stack.

Cohere

@cohere

Jun 9

Introducing Cohere's first open-source coding model: North Mini Code
Small & efficient, designed for agentic performance and built for community input.

Justin H. Johnson

@BioInfo

Jun 7

AI Is Building AI. Now Read the Footnotes.

Justin H. Johnson

@BioInfo

Jun 7

glyf.cc/5epmb

AI Is Building AI. Now Read the Footnotes.

Anthropic published striking evidence that AI is automating its own development. The shift is happening. The way it was sold deserves a closer look.

rundatarun.io