Steven Normore retweeted

sarah guo

@saranormous

Jun 10

x.com/i/article/206450988970…

Steven Normore retweeted

Simon Eskildsen

@Sirupsen

May 21

turbopuffer crossed $100M run-rate in March. 19mo after $1M.

Profitable & <$1M raised.

Cursor・Anthropic・Notion・Cognition・Harvey・Bridgewater・Ramp・Linear・Legora・Superhuman・Atlassian・Granola

We’d be nowhere without them. We work like hell to exceed their expectations.

Steven Normore retweeted

Jason Normore

@jnormore

May 13

Attention isn't all you need - some attention is.

I've been exploring ways to reduce the compute and memory footprint of LLMs. I put it together in a repo: Calibrated Sparse Attention (CSA) is a one-time calibration pass on an existing model that figures out which keys each head in each layer actually uses, and you skip the rest.

At matched LM quality I'm seeing potential for:
• ~10× longer context at the same memory bandwidth budget
• ~10× less KV-read bandwidth per token (with a sparse kernel)
• ~10× fewer attention FLOPs
• ~10× more requests per GB of KV memory (composed with eviction)

No retraining or labels needed.
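The tweet does not say how CSA decides which keys a head "actually uses". As a minimal sketch, assume each head keeps a few leading "sink" tokens plus a recent local window (a common static sparsity pattern), and that calibration picks, per head, the smallest window whose kept keys capture a target share of the dense attention mass. That pattern and the names `kept_keys`, `calibrate_window` and `sparse_attention` are hypothetical stand-ins, not the repo's method. Pure Python, no tensor library:

```python
import math


def kept_keys(q_pos, sinks, window):
    # Hypothetical static pattern: the first `sinks` positions plus the `window`
    # most recent positions (the query's own key included).
    return [k for k in range(q_pos + 1) if k < sinks or q_pos - k < window]


def calibrate_window(rows, sinks, target, max_window):
    """Smallest window whose kept keys hold >= `target` of the attention mass.

    rows: list of (q_pos, probs) recorded from dense attention on calibration
    text, where probs[k] is the weight this head gave key k (k = 0..q_pos).
    """
    for window in range(1, max_window + 1):
        mass = sum(
            sum(probs[k] for k in kept_keys(q_pos, sinks, window))
            for q_pos, probs in rows
        ) / len(rows)
        if mass >= target:
            return window
    return max_window


def sparse_attention(query, keys, values, q_pos, sinks, window):
    """Scaled dot-product attention for one query, reading only the kept keys."""
    kept = kept_keys(q_pos, sinks, window)
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(a * b for a, b in zip(query, keys[k])) for k in kept]
    top = max(scores)  # subtract the max before exp for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[k][d] for w, k in zip(weights, kept)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    # One calibration row: the query at position 4 attends mostly to keys 3 and 4.
    rows = [(4, [0.0, 0.0, 0.1, 0.4, 0.5])]
    print(calibrate_window(rows, sinks=0, target=0.85, max_window=5))  # 2
```

At inference only `len(kept)` key/value rows are read and scored per query instead of all `q_pos + 1`, which is where savings in KV-read bandwidth and attention FLOPs would come from; realizing them needs a kernel that actually skips the unkept keys.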

Steven Normore retweeted

Jason Normore

@jnormore

May 13

Been on a quest to make models run with a smaller memory footprint, especially locally on my laptop, so I built a hybrid SSM attention model with three memory layers:
• minimal attention KV for turns
• SSM state, MB-scale memory footprint for episodic
• LoRA from captures for long-term learning

Long context without the memory-bandwidth tax of GB KV caches, no prompt inflation. I'm curious to see how it scales to larger models.

github.com/jnormore/cognit

GitHub - jnormore/cognit: A local LLM that doesn't pay for context the way frontier models do....

A local LLM that doesn't pay for context the way frontier models do. Hybrid Mamba attention base: fixed-size session state, MB-scale persistence, minimal prompts. MacBook-friendly, OpenAI-c...

github.com
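To put the "MB-scale" state versus "GB KV caches" contrast in numbers, here is a back-of-the-envelope calculator. The dimensions used (32 layers, 8 KV heads of size 128, fp16, and a Mamba-style state with `d_inner = 8192`, `d_state = 16`) are assumptions for illustration, not cognit's actual configuration, and the SSM formula ignores smaller terms such as the convolution buffer.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # Factor 2: each layer caches one key and one value vector per KV head per token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def ssm_state_bytes(n_layers, d_inner, d_state, bytes_per_elem=2):
    # The recurrent state has a fixed shape per layer, so seq_len does not appear.
    return n_layers * d_inner * d_state * bytes_per_elem


if __name__ == "__main__":
    GiB, MiB = 2**30, 2**20
    kv = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=128 * 1024)
    ssm = ssm_state_bytes(n_layers=32, d_inner=8192, d_state=16)
    print(f"KV cache at 128K tokens: {kv / GiB:.0f} GiB")  # 16 GiB
    print(f"SSM state at any length: {ssm / MiB:.0f} MiB")  # 8 MiB
```

The KV term grows linearly with context while the state term does not. A hybrid still keeps its "minimal attention KV for turns", so its real footprint is the fixed state plus a KV cache sized to its attention layers and the turns they retain.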

Steven Normore retweeted

tobi lutke

@tobi

May 9

x.com/i/article/205273853311…

Steven Normore

@snormore

Apr 24

Let agents write code and deploy it in sandboxed runtimes, with cron, webhooks, or MCP as primitives: github.com/jnormore/cue

GitHub - jnormore/cue: The runtime where agent-authored apps live. Push over MCP; call by cron,...

The runtime where agent-authored apps live. Push over MCP; call by cron, webhook, or another agent — each call runs in a fresh sandboxed VM. - jnormore/cue

github.com

Jason Normore

@jnormore

Apr 24

"Make each program do one thing well" assumed humans were writing them. Slow, deliberate, by hand.

Agents flip this. They can author small, single-purpose programs constantly, for one task, for one user, torn down when no longer useful. The philosophy scales in a way its authors couldn't have imagined.

But durable work needs somewhere to live. Not a conversation context that evaporates. Not a workflow SaaS built for humans clicking through a GUI.

cue is that runtime. Agents push code into it over MCP. Actions they author become callable: by a schedule, a webhook, an app the agent spun up, another agent. Each call runs in a fresh sandboxed VM, so scale doesn't mean blast radius.

Here's how an agent OS starts.

Steven Normore retweeted

DoubleZero

@doublezero

Apr 20

On @Solana, blocks are broken into smaller packets called shreds. They're the earliest public representation of what's happening onchain. How those shreds travel determines who sees what, and when.

Steven Normore retweeted

Jason Normore

@jnormore

Apr 17

My favourite part is the simulator modelux.ai/experimentation/

Steven Normore

@snormore

Apr 17

The control plane for your LLMs...

Jason Normore

@jnormore

Apr 17

most teams have no view into what LLM calls they're actually making, or how cost breaks down across their business.

they can't even try a different provider or model version, locked in by an implementation decision they can barely remember making, and the unknown of how each model behaves is too much.

most companies are juggling thousands of API keys and Claude Max subs so employees can use AI. no way to route by use case, no idea which teams are using which models.

i've been working on modelux to fix this. the control plane for your LLMs.

modelux.ai

Steven Normore

@snormore

Apr 16

Shreds over multicast on DoubleZero. This was a fun project, and just getting started.

Malbec Labs

@MalbecLabs_xyz

Apr 16

x.com/i/article/204478891788…

Steven Normore retweeted

Malbec Labs

@MalbecLabs_xyz

Apr 16

Multicast allows the network to handle replication, rather than pushing that responsibility into application-layer systems. Data is transmitted once and delivered across a shared distribution path, reducing divergence in arrival time and improving consistency across receivers.

DoubleZero

@doublezero

Apr 16

Replying to @doublezero

5/ The @Solana ecosystem is rapidly adopting DoubleZero’s high-performance infrastructure:

Today there are:
→ 375 validators publishing shreds to DoubleZero Edge, representing approximately 45% of total Solana stake
→ 461 validators connected to the DoubleZero network, representing approximately 50% of total Solana stake
→ 95 devices across 30 metros, operated by 14 independent contributors
→ 50 DoubleZero Edge subscribers at launch

DoubleZero Edge already carries shreds from validators running every major @Solana client.

@jito_sol, @temporal_xyz, @StakingFac and @triton_one have also joined as launch partners and ecosystem participants contributing to DoubleZero Edge.

View Edge’s data distribution model (multicast) in real-time: doublezero.xyz/multicast-das…
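Malbec Labs' point that multicast "allows the network to handle replication" can be made concrete by counting packet copies per link. This is an idealized model, not DoubleZero's implementation: a hypothetical tree (`src`, `core`, `metroA`, `metroB`, receivers `r1` to `r3`) where unicast sends one copy per receiver along that receiver's path, while multicast sends once and replicates at branch points.

```python
def path_links(node, parent):
    """Links (upstream, downstream) on the path between the source and `node`."""
    links = []
    while node in parent:
        links.append((parent[node], node))
        node = parent[node]
    return links


def unicast_link_copies(receivers, parent):
    # The source sends a separate copy per receiver; every copy crosses
    # every link on that receiver's path.
    counts = {}
    for r in receivers:
        for link in path_links(r, parent):
            counts[link] = counts.get(link, 0) + 1
    return counts


def multicast_link_copies(receivers, parent):
    # Sent once; the network replicates at branch points, so each link on the
    # union of receiver paths carries exactly one copy.
    links = set()
    for r in receivers:
        links.update(path_links(r, parent))
    return {link: 1 for link in links}


if __name__ == "__main__":
    parent = {"core": "src", "metroA": "core", "metroB": "core",
              "r1": "metroA", "r2": "metroA", "r3": "metroB"}
    receivers = ["r1", "r2", "r3"]
    uni = unicast_link_copies(receivers, parent)
    multi = multicast_link_copies(receivers, parent)
    print("source uplink copies:", uni[("src", "core")], "vs", multi[("src", "core")])  # 3 vs 1
    print("total link copies:", sum(uni.values()), "vs", sum(multi.values()))  # 9 vs 6
```

Under multicast the source's uplink carries one copy no matter how many subscribers join, whereas under unicast it grows with the subscriber count, and receivers behind the same branch share one path instead of racing separate copies.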

Steven Normore retweeted

DoubleZero

@doublezero

Apr 16

Market data over the internet works. Solana shreds over DoubleZero Edge wins. DoubleZero Edge beta is live. ↓

Steven Normore retweeted

Jason Normore

@jnormore

Apr 8

You don’t always need a bigger LLM, just more diverse ones.

So I built an ensemble inference proxy that sends prompts to multiple small models in parallel and combines their responses.

Initial results look great!

gpt-4.1-mini, haiku, qwen 3b (local): 74% accuracy
GPT-5 alone: 73%
Claude Sonnet: 74%

This ensemble config is 13x cheaper and 2.5x faster than GPT-5. And I haven’t even tested other providers yet.

The trick: cross-provider diversity. Same-family ensembles do nothing. But models from different providers make different mistakes, and that's exploitable.

Tested 27 configurations across 6 aggregation strategies. The best ensemble beats GPT-5 on knowledge tasks by 8 percentage points.

Easy to experiment with your own configurations, just a YAML and emerge sweep.

github.com/jnormore/emerge

GitHub - jnormore/emerge: Ensemble inference proxy for LLMs. Combine multiple small models to match...

Ensemble inference proxy for LLMs. Combine multiple small models to match frontier-model accuracy at a fraction of the cost and latency - jnormore/emerge

github.com
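The tweet doesn't name emerge's six aggregation strategies. Here is a minimal sketch of one common choice, majority vote over normalized answers, with the fan-out done in parallel threads. The model callables are stubs standing in for provider clients, and `tie_break_order` is a hypothetical parameter; none of this is emerge's API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def normalize(answer):
    # Cheap canonical form so "Paris" and " paris " count as the same vote.
    return answer.strip().lower()


def majority_vote(answers, tie_break_order):
    """answers: model name -> answer text. Ties go to the earliest model in tie_break_order."""
    votes = Counter(normalize(a) for a in answers.values())
    best = max(votes.values())
    tied = {ans for ans, n in votes.items() if n == best}
    for model in tie_break_order:
        if model in answers and normalize(answers[model]) in tied:
            return normalize(answers[model])
    raise ValueError("tie_break_order names no model that answered")


def ensemble(prompt, models, tie_break_order):
    """models: name -> callable(prompt) -> str. All models are queried concurrently."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        answers = {name: f.result() for name, f in futures.items()}
    return majority_vote(answers, tie_break_order)


if __name__ == "__main__":
    stubs = {
        "provider_a_small": lambda p: "Paris",
        "provider_b_small": lambda p: " paris ",
        "provider_c_local": lambda p: "Lyon",
    }
    order = ["provider_a_small", "provider_b_small", "provider_c_local"]
    print(ensemble("Capital of France?", stubs, order))  # paris
```

A vote only corrects an error when the other members are unlikely to make the same one, which is why the tweet stresses cross-provider diversity: same-family models that share mistakes just outvote the right answer together.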

Steven Normore retweeted

Jason Normore

@jnormore

Apr 6

Autonomous code/agent optimization: LLM proposes optimization ideas. Genetic algorithms evolve the best combinations. That's cEvolve. Benchmarks show ~60% faster convergence (before parallelization) and more likely to hit top-tier results. Inspired by @karpathy’s autoresearch. github.com/jnormore/cevolve

GitHub - jnormore/cevolve: Genetic algorithms for autonomous code optimization. The LLM imagines...

Genetic algorithms for autonomous code optimization. The LLM imagines ideas, evolution discovers which combinations work best together. - jnormore/cevolve

github.com
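The tweet splits the work in two: the LLM proposes candidate optimizations, and a genetic algorithm searches over which combinations to apply. A minimal sketch of the second half, assuming each idea is an on/off flag and a benchmark returns a score to maximize. Tournament selection, uniform crossover and bit-flip mutation are generic GA choices, not necessarily cEvolve's; the LLM step is omitted and `fitness` is a stub.

```python
import random


def evolve(n_ideas, fitness, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Search over which proposed ideas to combine (pop_size must be >= 2).

    A genome is a tuple of 0/1 flags, one per idea (1 = apply that idea).
    fitness(genome) -> float stands in for running the benchmark.
    """
    rng = random.Random(seed)
    baseline = tuple([0] * n_ideas)  # unmodified code is always a candidate
    population = [baseline] + [
        tuple(rng.randint(0, 1) for _ in range(n_ideas)) for _ in range(pop_size - 1)
    ]
    cache = {}  # genome -> score, so each benchmark runs at most once

    def score(genome):
        if genome not in cache:
            cache[genome] = fitness(genome)
        return cache[genome]

    def tournament():
        a, b = rng.sample(population, 2)
        return a if score(a) >= score(b) else b

    for _ in range(generations):
        elite = max(population, key=score)
        children = [elite]  # elitism: carry the current best over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover, then independent bit-flip mutation per gene.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(tuple(child))
        population = children

    for genome in population:  # evaluate the final generation too
        score(genome)
    best = max(cache, key=cache.get)
    return best, cache[best]


if __name__ == "__main__":
    # Stub benchmark: some ideas help, some hurt.
    weights = [3, 2, -1, 4, -2, 1]
    fit = lambda g: sum(w for w, on in zip(weights, g) if on)
    print(evolve(len(weights), fit))
```

Caching matters in practice because each fitness call stands for a full benchmark run, and seeding the unmodified baseline into the population means the returned combination never scores below it.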

Steven Normore retweeted

Jason Normore

@jnormore

Mar 30

If your app can’t be used by agents better than a human can use it, it won’t survive. The next generation of software isn’t designed for humans. It’s designed for agents acting on behalf of humans.

Steven Normore retweeted

DoubleZero

@doublezero

Mar 11

Introducing DoubleZero Edge. DoubleZero Edge is a new platform delivering real-time market data to traders and market participants, powered by multicast. Its first feed: raw Solana shreds directly from leaders, delivered over a high-performance fiber network. Infrastructure that levels the playing field for validators and matches traditional trading systems is here. Dive in ↓

Steven Normore retweeted

tobi lutke

@tobi

Mar 9

the singularity has begun. so many signs.

Andrej Karpathy

@karpathy

Mar 8

Replying to @tobi

Who knew early singularity could be this fun? :) I just confirmed that the improvements autoresearch found over the last 2 days of (~650) experiments on depth 12 model transfer well to depth 24 so nanochat is about to get a new leaderboard entry for “time to GPT-2” too. Works 🤷‍♂️

Steven Normore retweeted

Andrew Curran

@AndrewCurran_

Mar 5

Striking image from the new Anthropic labor market impact report.

Steven Normore retweeted

Darren Shepherd

@ibuildthecloud

Feb 27

The correct answer to tabs vs spaces is fmt.

Steven Normore retweeted

eden

@eden_

Feb 4

oh i get it now @doublezero
