Jason Kuperberg retweeted

Matt Shumer

@mattshumer_

Both @JasonKuperberg (joined as well!) and I spent a lot of time thinking about what was worth working on, with AI moving so quickly. Personally, I had a terrible experience with the education system, and the opportunity to use AI to fix it is quite compelling. Hearing @jliemandt's vision for Alpha made it so clear that this is the way forward. We're rebuilding what school is from the ground up, and the early results are hard to argue with. Kids are learning multiples faster, with their afternoons freed up to actually be kids and learn real life skills. Separate thanks to @neerajg for bringing us on. Excited to work with @PSkinnerTech, @C_Hendrick, @profbeckyallen, @PepsMccrea and many others to help transform education. (Also the video isn't perfect at all! Unfortunately, Fable access got cut off while it was still generating.)

4,918

Jason Kuperberg retweeted

Matt Shumer

@mattshumer_

Claude Fable made this Pixar-quality educational video in one shot. It's a preview of what I'm doing next: I've joined Alpha School to push the limits of what AI can do for learning. We're going to transform education for 1B kids.

3:45

210

20,290

Jason Kuperberg retweeted

Matt Shumer

@mattshumer_

Apr 20

Quick update I’ve been meaning to share for a while: @JasonKuperberg and I stepped away from HyperWrite a few months ago. We’re proud to have built one of the first consumer AI products, and turned it into a profitable business with real scale. But the AI landscape is moving fast, and we both felt the pull to start exploring again. @josh_bickett is now CEO. Josh has been with us since the beginning and led development on almost every major feature we shipped. There's no one better to take this forward. Josh is doubling down on TypeAhead, which is the stickiest product we’ve ever built. It’s personalized, continually-learning autocomplete for every site on the web, powered by frontier LLMs. If you haven’t tried it yet, I recommend you do so… it’s amazing. I just want to say thank you to everyone who supported HyperWrite along the way: our users, our team, our investors, and the community. I’ll have more to share on what I’m working on now in the coming weeks, but for now, go follow Josh and try TypeAhead.

40,956

Jason Kuperberg

@JasonKuperberg

Apr 10

I’m the roommate. My OpenClaw (actually a Hermes Agent) is peripherally famous!

Matt Shumer

@mattshumer_

Apr 10

Norm (my OpenClaw) is famous! cnn.com/2026/04/10/tech/appl…

38,993

Jason Kuperberg retweeted

Rork

@rork

Feb 19

Introducing Rork Max: AI that one-shots almost any app for iPhone, Apple Watch, iPad, Apple TV & Apple Vision Pro. Even Pokémon Go with AR & 3D. Max is a website that replaces Xcode. Install on device in 1 click. Publish to App Store in 2 clicks. Powered by Swift, Claude Code & Opus 4.6.

1:00

657

1,420

17,936

8,571,127

Jason Kuperberg

@JasonKuperberg

Feb 11

How it started…

Matt Shumer

@mattshumer_

Feb 10

x.com/i/article/202109512883…

252

10,336

3,613,578

Jason Kuperberg retweeted

Matt Shumer

@mattshumer_

Feb 10

x.com/i/article/202109512883…

6,595

28,091

118,800

87,190,324

Jason Kuperberg retweeted

Matt Shumer

@mattshumer_

Jan 16

FYI you can use ANY model with the Claude Agent SDK. Just change these three environment variables!
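The screenshot that named the three variables did not survive in this text, so which variables the tweet meant is unknown. As a rough sketch only, the snippet below assumes they are `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, and `ANTHROPIC_MODEL` (variables that Claude Code documents); the helper name `build_agent_env`, the endpoint URL, and the model name are all hypothetical placeholders.

```python
import os
from typing import Mapping, Optional


def build_agent_env(
    base_url: str,
    auth_token: str,
    model: str,
    base_env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Return a copy of the environment with three overrides applied.

    ASSUMPTION: the variable names below are a guess at what the tweet's
    screenshot showed; the tweet text itself does not list them.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(
        {
            "ANTHROPIC_BASE_URL": base_url,      # assumed: endpoint of the alternative provider
            "ANTHROPIC_AUTH_TOKEN": auth_token,  # assumed: credential for that provider
            "ANTHROPIC_MODEL": model,            # assumed: model identifier to request
        }
    )
    return env


# Hypothetical values, for illustration only.
example_env = build_agent_env(
    base_url="https://example.invalid/api",
    auth_token="placeholder-token",
    model="placeholder-model",
    base_env={"PATH": "/usr/bin"},
)
```

The returned dict could then be passed as the `env` argument of `subprocess.run` when launching an agent process, which keeps the overrides scoped to that one process rather than the whole shell.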

640

59,087

Jason Kuperberg retweeted

Jonathan Levitt

@JWLevitt

23 Dec 2025

Today's episode of @longrunlabs is with @JasonKuperberg, an expert in AI and creator of "Roast My Strava." Here's my 2025, roasted.

1,028

Jason Kuperberg retweeted

Jonathan Levitt

@JWLevitt

23 Dec 2025

AI gets interesting when it stops being about prompts and starts being about thinking. We talked about using AI to:
• clarify ideas before sharing them
• turn repetitive work into leverage
• build and ship without big teams
• create products that feel human, not automated

Apple: podcasts.apple.com/us/podcas…
Spotify: open.spotify.com/episode/5Nr…
Written: open.substack.com/pub/longru…

@superfiliate is happy to offer your first month free to anyone who mentions @longrunlabs. Save better and manage/adjust your subscriptions with @RocketMoneyApp: rocketmoney.com/gorun

The Intersection of AI Innovation and Running with Jason Kuperberg

Podcast Episode · Long Run Labs · December 23, 2025 · 1h 13m

podcasts.apple.com

617

Jason Kuperberg retweeted

Matt Shumer

@mattshumer_

11 Dec 2025

I've had access to GPT-5.2 since November 25th. Since then, I've used it as my daily-driver, pushing it to its limits. It beats out Opus 4.5 in most things I tried, but there's a (big) catch. Here's my review of GPT-5.2: shumer.dev/gpt52review

GPT-5.2 Review: Incredibly Impressive, But Too Slow

Hands-on GPT-5.2 review: better instruction following and codegen, Pro is a slow genius, but Thinking is too slow to compete with other models of similar intelligence.

shumer.dev

1,041

274,218

Jason Kuperberg retweeted

Jonathan Levitt

@JWLevitt

3 Nov 2025

Have y'all seen Roast My Strava?? 😂

2,169

Jason Kuperberg retweeted

Matt Shumer

@mattshumer_

26 Oct 2025

My co-founder and I have been moving around a lot lately so we finally just got this. Pretty crazy to think that this represents less than 2%!! of the total tokens we’ve used across model providers

14,227

Jason Kuperberg retweeted

Matt Shumer

@mattshumer_

16 Oct 2025

Introducing: Interactive Sora! A choose-your-own adventure GAME powered by Sora 2. Every choice spins up a brand-new scene instantly. Open source, link here:

0:29

967

133,485

Jason Kuperberg retweeted

Matt Shumer

@mattshumer_

9 Oct 2025

I want to invest in more AI infra companies. If you’re building something exciting, comment below. If interesting, I’ll DM you. Pls, infra only. No app layer etc.

6,820

Jason Kuperberg retweeted

Josh

@josh_bickett

7 Oct 2025

Inspired by OpenAI Dev Day. HyperWrite processes 1B tokens every week :)

483

Jason Kuperberg retweeted

Matt Shumer

@mattshumer_

7 Oct 2025

Introducing: Sora Extend! Generate Sora 2 videos of INFINITE length :) Bypasses OpenAI's 12-second limit with intelligent prompt enhancement and additional last-frame context. Open source, link here:
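A minimal sketch of the chaining idea described above, not the actual Sora Extend code: split the requested length into segments of at most 12 seconds and hand each segment the last frame of the one before it. The `Segment` type, the prompt wording, and the `generate` callback (standing in for a real video-generation call) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

MAX_SEGMENT_SECONDS = 12  # the per-clip limit mentioned in the tweet


@dataclass
class Segment:
    index: int
    seconds: int
    prompt: str
    context_frame: Optional[str]  # last frame of the previous segment, if any


def plan_durations(total_seconds: int, max_seconds: int = MAX_SEGMENT_SECONDS) -> List[int]:
    """Split a total duration into chunks no longer than max_seconds."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    full, rest = divmod(total_seconds, max_seconds)
    return [max_seconds] * full + ([rest] if rest else [])


def extend_video(
    base_prompt: str,
    total_seconds: int,
    generate: Callable[[Segment], str],
) -> List[Segment]:
    """Generate segments in order, threading each last frame into the next.

    `generate` is a stand-in for the real model call; it must return an
    identifier for the final frame of the segment it produced.
    """
    durations = plan_durations(total_seconds)
    segments: List[Segment] = []
    last_frame: Optional[str] = None
    for i, seconds in enumerate(durations):
        # Simple stand-in for "prompt enhancement": tell the model where it is.
        prompt = (
            f"{base_prompt} (part {i + 1} of {len(durations)}; "
            "continue seamlessly from the previous shot)"
        )
        segment = Segment(i, seconds, prompt, last_frame)
        last_frame = generate(segment)
        segments.append(segment)
    return segments


# Fake generator so the sketch runs without any video API.
demo = extend_video("A fox runs through snow", 30, lambda s: f"frame-{s.index}")
```

Because each segment only depends on the frame before it, generation is inherently sequential; the trade-off for unlimited length is that total latency grows with the number of segments.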

0:24

115

182

2,223

361,850

Jason Kuperberg retweeted

Shrey Kothari

@Shreyko

22 Sep 2025

Introducing Among AIs, a social reasoning benchmark where embodied models play Among Us to test social intelligence: deception, persuasion, and coordination. We put 6 SOTA models in a live arena and GPT-5 came out on top by leading in Impostor & Crewmate wins. Why did GPT-5 get the highest scores? Why Among AIs? Let’s break it down 👇

134

1,253

219,026

Jason Kuperberg retweeted

Matt Shumer

@mattshumer_

27 Aug 2025

If you use Google Colab, I have something insanely cool for you to try. Limited spots available. Comment below and I'll DM you.

109

20,668

Jason Kuperberg retweeted

Matt Shumer

@mattshumer_

7 Aug 2025

I've had access to GPT-5 since July 21st. Since then, I've used it as my daily-driver, pushing it to its limits. Here's my review of GPT-5 (note: full, interactive review w/ artifacts is linked in the next tweet):

--

TL;DR:
- GPT-5 is clearly a big leap from previous models. But you have to push it hard to get the most out of it.
- The ceiling for what can be vibe-coded is now much higher than it was with previous models.
- Better-than-o3 intelligence, plus super-fast speed... I'm way more productive than I've ever been.
- Fantastic long-context handling, incredible precision on coding tasks.
- Super detail-oriented: makes far fewer stupid mistakes than other models.
- Modes: Auto (default), Thinking (use for complex work), Pro (not evaluated here).
- o3 is better for explicit research; GPT‑4.5 is still better for writing; instruction sensitivity is a bit of a problem.
- Bottom line: best overall model right now; the bar has been raised.

Review:

I was granted access to GPT-5 on July 21st. And honestly, when I started testing it, I wasn’t blown away. In fact, I felt quite let down, especially given all of the hype and expectations around it. The model felt like GPT-4.2 at best… faster, definitely sharper than 4.1, but not some huge leap. I tried to use it for my day-to-day work (which, IMO, is the best way to evaluate any new model), and while it handled the tasks I was giving it very well, I wasn’t noticing anything dramatically better than GPT-4.1, Claude 4 Opus, or any of the other models I’ve been using. I caught myself thinking, Is this really it?

I settled into a routine of using GPT-5 for pretty much everything I would use existing LLMs for, and this went on for about a week. Was it better than Claude 4 Opus, my previous daily driver? Yes, undoubtedly, but only marginally. It felt like a small, incremental improvement. But then things took an unexpected turn.
Josh (my lead engineer at HyperWrite) and I had spent an afternoon discussing a complex new product idea… one we'd estimated would take weeks, maybe months, of dedicated engineering work to even get a proof-of-concept together. The idea was intricate, involving a sophisticated frontend with tightly integrated components and a complex backend infrastructure for managing GPUs, autoscaling resources, and lifecycle management. This wasn’t the kind of thing you just vibe-code; even with the help of AI, it required deliberate human oversight at every step — or so we thought. Josh and I already decided we’d need at least a full month of discovery just to figure out if a build-out was worth attempting. That night, purely out of curiosity, I fed GPT-5 a product spec, fully expecting it to stumble immediately. An hour later, I sent Josh a fully working prototype. His immediate reply: “What the fuck.” Just… Wow. That moment completely flipped how I thought about GPT-5. We literally skipped a month of upfront customer discovery and planning. We could just immediately go test with real users. (By the way, if you’re actively training models, hit me up—I would love to show it to you, and I want to make sure we’re building something you’d actually use.) From there, things got interesting fast. I started probing deeper, trying more ambitious tasks that I’d never even bothered asking previous models. The more I did, the clearer it became that GPT-5 wasn’t incremental. One area GPT-5 completely nailed was frontend code. If you’ve used AI for frontend before, you probably know what I mean when I say it usually feels "made by AI." The designs are typically a bit clumsy, predictable, obviously machine-generated. With GPT-5, though, the UIs felt way closer to convincingly human… 80% indistinguishable at a glance. It could even clone a Figma mockup from a screenshot incredibly quickly... little details were off, but for a first pass, it's far better than anything I've seen before. 
Occasionally, I’d still need to prompt it once more for responsive tweaks, but those adjustments were trivial, done in seconds. Frontend is close to being a solved problem. It’s strikingly detail‑oriented, often getting micro‑interactions, spacing, and states right on first pass. (Check out the web version of this review to see how well GPT-5 fares at cloning frontends compared to other models.) On backend and infrastructure, GPT-5 was just as good, maybe even more impressive. Take the GPU infrastructure task again: after just three short rounds of prompting, GPT-5 set up automated provisioning, scaling, and teardown of GPUs. This felt like genuine autonomy, with the model building something stable and usable from start to finish. The deeper I went, the more clearly I saw just how different GPT-5 was. On niche machine learning tasks, especially tricky things involving libraries like TRL, GPT-5 consistently impressed me. At one point, it clearly didn’t know the most up-to-date TRL pattern directly from its training data, but instead of getting stuck or hallucinating something random, it autonomously went straight into the documentation, found exactly the right answer, and implemented it correctly. No hand-holding, no doc-pasting needed. I’ve seen other models occasionally do similar things, but GPT-5 does it consistently enough that I’m now relying on it heavily for fine-tuning/RL code, which I’ve never been able to do with past models. I’m also going deeper into the stack than I ever have. I’m not just leaning on it for high-level training scripts; I’m modifying code I wouldn’t have touched before. If the deepest I used to go was “training loop and configs,” I’m now comfortably editing the layer below—custom losses, data pipelines, etc., because the model is reliable. Previously, models would get this stuff wrong quite often, so I couldn’t “let go” and trust them for anything more than the high-level stuff. Not anymore. 
The effect is simple: wherever your ceiling was before with Claude 4 Opus, o3, etc., GPT-5 lets you go one layer deeper.

GPT-5 also became my go-to partner for actual model training runs. It literally coached me through adjusting hyperparameters, debugging weird failures, mitigating reward hacking, etc. From my experience, its suggestions were spot on! A couple weeks back, when I released AutoRL with the @OpenPipeAI team, GPT-5 one-shotted the training loop based on a description of what I wanted. I threw it at our main @HyperWriteAI repo, too, and it crushed that as well (this was especially impressive, as that repo is many years in the making, with tons of dead and confusing code that a model needs to navigate).

A major reason GPT-5 changed things so drastically for me isn’t just the improved capability. GPT-5 is fast. Even if it was only as good as o3, but this much faster, it’d be transformative. The fact that it’s both smarter on most prompts and lightning-fast just puts it in a completely different category. Most tasks returned results in seconds; the longest prompts rarely exceeded a minute. That speed means I stay in flow… less downtime, less waiting, fewer mental context switches. It feels fluid in a way that completely changes my workflow.

There are still nuances and annoyances, though. For example, GPT-5 is oddly sensitive to prompting structure, especially when building complex prompts using tools like RepoPrompt. Early on, it sometimes went off the rails, ignoring my instructions and making unrelated edits. I eventually figured out a simple fix: explicitly repeating key instructions at the top of the prompt reliably solves that problem. It’s a straightforward workaround, but it’s important to note. Hopefully the OpenAI team patches this up with a new snapshot soon.

Another small annoyance: GPT-5 is overly eager at the end of conversations.
I might ask something simple, like a quick weather check, and it’ll tack on some extra question like, “Want me to create a comprehensive plan for your day?” It’s harmless, but for power users, more than a little irritating.

Auto, Thinking, and Pro Modes

GPT-5 offers three main modes. Auto is the default, and what most users should be using. It’s actually two models under the hood: one that answers immediately, and another that thinks before responding. There’s a classifier that decides which one to use based on the prompt you give it.

Then there’s Thinking, which is what I’m using almost exclusively now. It bypasses the classifier and uses the Thinking version of the model for every prompt. This mode is slower (though it’s still quite fast compared to the competition), but it’s where the real magic happens when you’re doing something complex or creative.

Finally, there’s Pro, which is the most advanced mode. I haven’t been granted access to it, so I’ll only speculate on its capabilities. It’s likely similar in spirit to o3 Pro mode, which (also speculatively) runs multiple o3 instances in parallel, and uses some kind of ensemble approach to combine their outputs into a single, best-possible response. Based on how much better o3 Pro is compared to standard o3, I wouldn’t be surprised if Pro mode in GPT-5 is similarly more capable. And honestly, based on my experience with GPT-5 so far, it’s hard to even imagine what kind of capabilities/reliability Pro mode would unlock.

API Pricing

For those building on GPT-5, the pricing is as follows:
- Input: $1.25 per million tokens (with a 90% cache discount, which is a big deal for long-context queries)
- Output: $10 per million tokens

This is cheaper than GPT-4o, which is fantastic. Intelligence per dollar continues to increase.

Note: OpenAI is also offering Mini (smaller) and Nano (smallest) variants of GPT-5, which are cheaper but less capable. I haven't tested these, so I won't comment on them.
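To make the pricing quoted above concrete, here is a small cost estimate in Python. It is a sketch under one assumption: that the 90% cache discount applies only to the cached portion of the input tokens. The function name `gpt5_cost` and the example token counts are made up for illustration.

```python
INPUT_USD_PER_MILLION = 1.25    # from the review
OUTPUT_USD_PER_MILLION = 10.00  # from the review
CACHE_DISCOUNT = 0.90           # from the review; assumed to apply to cached input only


def gpt5_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Estimate the USD cost of one request at the quoted rates."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_input_tokens
    # Cached tokens are billed at 10% of the normal input rate.
    billable_input = fresh_tokens + cached_input_tokens * (1 - CACHE_DISCOUNT)
    input_cost = billable_input * INPUT_USD_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MILLION / 1_000_000
    return input_cost + output_cost


# A long-context request: 200k input tokens (150k of them cached), 4k output tokens.
with_cache = gpt5_cost(200_000, 4_000, cached_input_tokens=150_000)
without_cache = gpt5_cost(200_000, 4_000)
print(f"with cache: ${with_cache:.5f}, without cache: ${without_cache:.5f}")
```

In this example the cached request comes to roughly $0.12 versus $0.29 uncached, which is why the discount matters most for long, repeated contexts such as extended coding sessions.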
Where GPT-5 Falls Short

For explicit search tasks, I still prefer o3. Why? GPT-5 stops digging sooner. For example, I was trying to have GPT-5 find the hometown of a public figure. It only found the city, and stopped there. I needed to prompt it multiple times to get it to actually look deeper and find the specific town. o3, on the other hand, will just keep digging until it finds what you need. This isn’t a deal-breaker for me, but it’s something to keep in mind if you rely heavily on models for research. On the other hand, when it comes to implicit research, like mid-task documentation lookups or quick library checks during coding, GPT-5 clearly outperforms o3.

On emotional or sensitive tasks, like crafting difficult emails or strategizing conversations, I still strongly prefer GPT-4.5. I use it with my specialized thinking prompt (try it here). GPT-4.5 still wins by far on tone, subtlety, humor, and persuasion.

I’ve also noticed that GPT-5 does struggle a bit with instruction following. It’s not terrible, but you still need to be very careful with how you phrase and structure your prompts if you want the best results.

I may be wrong, but it feels like while GPT-5 has big model capability, it has small model smell. Between its insane speed, weakness in creative writing and emotional tasks, sensitivity to prompting, and odd failure modes, I just have a feeling that the actual size of GPT-5 is much smaller than people expected. If this is the case, it’s almost more impressive overall due to just how capable of a model it is. This shouldn’t dissuade you from using it, this is just something I’ve felt and noticed throughout my testing.

Long-Context Handling

Here’s something unexpected, especially given my suspicions around the model’s size: GPT-5 is incredibly good at maintaining consistency over very, very long coding sessions. I’ve worked with prompts likely spanning hundreds of thousands of tokens. It consistently maintains context insanely well.
This feels far better than Gemini 2.5 Pro at long-context handling (though, I was accessing the model through the ChatGPT interface, so there's a chance OpenAI is doing something on top of the model). I didn’t realize how valuable that was until I experienced it directly. It is a true step up for deep, long-term coding sessions.

That context retention shows up as meticulous attention to small details over long sessions. GPT-5, even when pushed into big, messy codebases, maintained a clear understanding of the architecture, file organization, and project context, which previous models often struggled to do without constant reminders. It didn’t seem to get “dumber” as the context window grew… often, it even seemed to improve, becoming more aware of the project’s overall structure and how the pieces fit together. This is the new standard, and there’s no way I’m going back to anything else.

I Was Wrong. I’m Happily Eating My Words.

All of this comes with a bigger-picture implication. GPT-5 is a true leap. I genuinely think the rest of the industry is going to have to sprint now. Labs releasing other models or coding platforms need to pay attention: developers are going to shift to GPT-5 quickly. The combination of autonomy and speed is a major unlock. Teams using GPT-5 will out-ship teams that don’t.

If you’re building around these models, this is your opportunity to 10x your product. If you’re a VC, pay close attention: adoption curves of GPT-5-powered teams will be visible in how quickly they build and ship products. Expect a noticeable shift in market dynamics.

And most importantly, as with every jump in model intelligence, new use-cases will become possible, and new companies will emerge to capitalize on them. You can bet that I’ve already found a couple of these use-cases and will be keeping them close to my chest for now, with the aim of building something new around them. It’s exciting to say the least.
Bottom line, GPT-5 isn’t just going to improve vibe coding, it will fundamentally change the kinds of projects I consider doable without serious human intervention and steering. This past week, it turned what I confidently thought was a multi-month engineering challenge into a casual one-hour sprint. This is serious, real, autonomous software engineering.

108

877

141,346