SemiAnalysis

Kevin Markham retweeted

SemiAnalysis

@SemiAnalysis_

Jun 10

Recently, we purchased one of each Anthropic/OpenAI subscription plan and randomly ran long horizon coding tasks until we exhausted the weekly limit. It's widely believed that a $200/month plan maxes out at ~$2000/month worth of tokens (assuming API pricing). However, we found that the subscriptions are actually far more generous. (2/4)

190

574

6,041

3,463,315

Kyle 🚄

Kevin Markham retweeted

Kyle 🚄 @KyleTrainEmoji

May 26

PICARD: Data, shields up DATA: Brilliant! Shields can reduce damage we sustain. Not immunity. Not hubris. Just prudence. It's not precaution—it's strategy. [camera shakes] WORF: HULL BREACHES ON NINE DECKS DATA: Here's what happened: you told me to raise shields, and I didn't

305

4,861

50,516

1,385,139

Kevin Markham

Kevin Markham @justmarkham

May 22

Is Machine Learning still worth learning in 2026? Or should you just rely on an LLM? My take: dataschool.io/is-machine-lea…

Is Machine Learning still worth learning in 2026?

8 reasons why there's still great value in learning how to do supervised Machine Learning yourself (rather than relying on an LLM)

dataschool.io

838

Kevin Markham

Kevin Markham @justmarkham

May 11

Anyone have a personal contact at a bootcamp or university where they teach #MachineLearning using #Python? I'd love to talk with them about incorporating my book into their curriculum! Feel free to DM me their contact info 🙏 Read the book online (free!): mlbook.dataschool.io

1,817

Kevin Markham

Kevin Markham @justmarkham

May 5

Floored by the response to my new #Python #MachineLearning book 🤩 Paperback: geni.us/MasterML Ebook: courses.dataschool.io/ebook Read online (free!): mlbook.dataschool.io

2,307

Kevin Markham

Kevin Markham retweeted

Kevin Markham @justmarkham

Mar 13

My book is the #1 New Release in NLP! 🥳 Amazon US put it on sale... for $0.95 off 😂 Get the paperback: geni.us/MasterML Or read online (free!): mlbook.dataschool.io #MachineLearning #Python

3,283

Kevin Markham

Kevin Markham @justmarkham

Mar 6

Absolutely thrilled that my book is finally published! 🎉 Paperback: amazon.com/dp/B0GRFPZ768 Ebook: courses.dataschool.io/ebook Read online: mlbook.dataschool.io/ Poured my heart & soul into this for 5 years. Hopefully I sell a few copies even though you can read it for free 😂

113

4,815

Kevin Markham

Kevin Markham @justmarkham

Mar 5

The BEST course I took last year runs one FINAL time... and it starts in 4 days ⏰ You'll learn how to build production-ready AI apps from @hugobowne Includes LIVE instruction & talks from experts, plus $1300 in AI partner credits Enroll for 25% off: maven.com/hugo-stefan/buildi…

645

Kevin Markham

Kevin Markham @justmarkham

Feb 27

My new book - on sale NEXT WEEK! 🎉 Sign up to get notified when it's available: dataschool.kit.com/mlbook #MachineLearning #Python @scikit_learn

0:23

1,841

Kevin Markham

Kevin Markham @justmarkham

Feb 20

Final proof copy of my new #MachineLearning book 🎉 Get notified the moment it's available: dataschool.kit.com/mlbook

112

4,633

nader dabit

Kevin Markham retweeted

nader dabit

@dabit3

Feb 11

x.com/i/article/202134785065…

240

1,992

800,378

Simon Willison

Kevin Markham retweeted

Simon Willison

@simonw

31 Dec 2025

Here's my enormous round-up of everything we learned about LLMs in 2025 - the third in my annual series of reviews of the past twelve months simonwillison.net/2025/Dec/3… This year it's divided into 26 sections! This is the table of contents:

The year of “reasoning”
The year of agents
The year of coding agents and Claude Code
The year of LLMs on the command-line
The year of YOLO and the Normalization of Deviance
The year of $200/month subscriptions
The year of top-ranked Chinese open weight models
The year of long tasks
The year of prompt-driven image editing
The year models won gold in academic competitions
The year that Llama lost its way
The year that OpenAI lost their lead
The year of Gemini
The year of pelicans riding bicycles
The year I built 110 tools
The year of the snitch!
The year of vibe coding
The (only?) year of MCP
The year of alarmingly AI-enabled browsers
The year of the lethal trifecta
The year of programming on my phone
The year of conformance suites
The year local models got good, but cloud models got even better
The year of slop
The year that data centers got extremely unpopular
My own words of the year


102

868

4,872

509,136

Trey Hunner

Kevin Markham retweeted

Trey Hunner @treyhunner

24 Nov 2025

Are you a #Python user and a lifelong learner? I've just published my 8th annual list of every Python-related Black Friday / Cyber Monday sale I'm aware of. treyhunner.com/2025/11/pytho…

2,725

Kevin Markham

Kevin Markham @justmarkham

5 Nov 2025

VIDEO: How to use top AI models on a budget Want to chat with the best AI models from OpenAI, Claude, and Google without paying $20/month? I'll show you how to use API keys w/ @TypingMindApp to access top models for a fraction of the cost! Find out how: youtube.com/watch?v=wvvTog-F…

1,148

Yuchen Jin

Kevin Markham retweeted

Yuchen Jin

@Yuchenj_UW

19 Oct 2025

I love Andrej’s clarification that the final AGI recipe includes an RL stage, but we still need new layers of breakthroughs to get there. AGI is still a research problem, not an engineering problem. Scaling compute 100× won’t magically make it happen. The lab that invents the next learning paradigm will define the future.

704

71,143

Lenny Rachitsky

Kevin Markham retweeted

Lenny Rachitsky

@lennysan

14 Oct 2025

Everyone should be using Claude Code more: PMs, marketers, designers, founders, parents. Everyone. The trick is to forget that it’s called Claude Code and instead think of it as Claude Local or Claude Agent. It’s essentially a super-intelligent AI running locally, able to do stuff directly on your computer, from organizing your files and folders to brainstorming domain names, summarizing customer calls, enhancing image quality, creating Linear tickets, and so much more. Here are 50 creative ways non-technical people are using Claude Code in their work and life, to inspire your own thinking. This list includes my own favorite use cases, and many examples y’all shared with me 👇

194

2,121

369,470

Andrej Karpathy

Kevin Markham retweeted

Andrej Karpathy

@karpathy

13 Oct 2025

Excited to release new repo: nanochat! (it's among the most unhinged I've written).

Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script and in as little as 4 hours later you can talk to your own LLM in a ChatGPT-like web UI.

It weighs ~8,000 lines of imo quite clean code to:
- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple choice questions, tool use.
- SFT, evaluate the chat model on world knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Efficient inference the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox), talk to it over CLI or ChatGPT-like WebUI.
- Write a single markdown report card, summarizing and gamifying the whole thing.

Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems, answer simple questions. About ~12 hours surpasses GPT-2 CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple choice tests. E.g. a depth 30 model trained for 24 hours (this is about equal to FLOPs of GPT-3 Small 125M and 1/1000th of GPT-3) gets into 40s on MMLU and 70s on ARC-Easy, 20s on GSM8K, etc.

My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved.

Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.
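
The cost figures in the post above can be cross-checked with a little arithmetic. The sketch below is not from the nanochat repo. It assumes the quoted "~$1000 (~41.6 hours of training)" reflects a flat hourly price for one 8-GPU node, then applies that implied rate to the other durations mentioned (4, 12 and 24 hours). The function names are hypothetical, and real cloud pricing may differ.

```python
# Back-of-the-envelope check of the training costs quoted in the post.
# Assumption: one 8xH100 node billed at a flat hourly rate; the rate itself
# is inferred from the "~$1000 for ~41.6 hours" figure, not taken from a price list.

GPUS_PER_NODE = 8


def implied_node_rate(total_cost_usd: float, hours: float) -> float:
    """Hourly node price implied by a (total cost, duration) pair."""
    return total_cost_usd / hours


def run_cost(hours: float, node_rate_usd_per_hour: float) -> float:
    """Cost of a run of the given length at a flat hourly node rate."""
    return hours * node_rate_usd_per_hour


node_rate = implied_node_rate(1000, 41.6)   # about $24.04 per node-hour
gpu_rate = node_rate / GPUS_PER_NODE        # about $3.00 per GPU-hour

# Durations mentioned in the post: speedrun, GPT-2-level run, depth-30 run.
for hours in (4, 12, 24):
    print(f"{hours:>4} h -> ${run_cost(hours, node_rate):,.2f}")
```

At the implied rate the 4-hour run comes to roughly $96, which is consistent with the "~$100" figure in the post.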

683

3,352

24,135

5,808,891

Simon Willison

Kevin Markham retweeted

Simon Willison

@simonw

7 Oct 2025

Vibe coding is irresponsibly building software through dice rolls, not caring what code is produced. What about when engineers at the top of their game use AI tools responsibly to accelerate their work? I propose "vibe engineering"!

383

162

2,117

293,852

Connor Davis

Kevin Markham retweeted

Connor Davis

@connordavis_ai

22 Sep 2025

This MIT paper just broke my brain. Everyone keeps saying LLMs can't do real logical reasoning. Turns out we've just been teaching them wrong this whole time.

These researchers built something called PDDL-INSTRUCT that actually teaches models to think through planning problems step by step. Not just pattern matching - actual logical reasoning.

Here's how it works: Phase 1: show the model correct and incorrect plans with explanations. Basic stuff. Phase 2 is where it gets interesting. They make the model generate explicit reasoning for every single action, then use an external verifier to check if each step is logically sound.

The numbers are wild. Llama-3-8B jumped from 28% to 94% accuracy on planning benchmarks. That's not incremental improvement - that's a completely different capability emerging.

What's smart is they don't trust the model to check its own work. They use VAL, a formal planning verifier, to validate every logical step. When the model screws up, it gets specific feedback about exactly what went wrong.

The two-stage training is clever. First stage focuses purely on better reasoning chains. Second stage optimizes for actually solving the problem. This prevents the model from just gaming the metrics.

One finding caught my attention - detailed feedback destroys binary feedback. Just telling a model "wrong" vs explaining exactly which preconditions failed makes a huge difference. The gap is especially big on complex problems.

This isn't trying to replace symbolic planners. It's teaching neural networks to reason like symbolic planners while keeping external verification. That's actually sustainable.

The implications go way beyond planning. Any multi-step reasoning task could benefit from this approach. We might finally be seeing how to teach LLMs structured thinking instead of just sophisticated autocomplete.

Makes me wonder what other "impossible" capabilities are just sitting there waiting for the right training approach.
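
To make the "detailed feedback vs binary feedback" distinction above concrete, here is a toy sketch of an external plan checker. It is not VAL and not the paper's code. It assumes a simplified STRIPS-style model in which each action has preconditions, add effects and delete effects, and every name in it (`Action`, `verify_plan`, the blocks-world facts) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A simplified STRIPS-style action (illustrative only)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset


def verify_plan(initial_state, goal, plan):
    """Simulate the plan step by step.

    Returns (ok, feedback). `ok` alone is the binary signal; `feedback`
    names the failing step and the exact facts that were missing.
    """
    state = set(initial_state)
    for step, action in enumerate(plan, start=1):
        missing = action.preconditions - state
        if missing:
            return False, (f"step {step} ({action.name}): "
                           f"unmet preconditions {sorted(missing)}")
        state -= action.del_effects
        state |= action.add_effects
    unmet = set(goal) - state
    if unmet:
        return False, f"plan ends without goal facts {sorted(unmet)}"
    return True, "plan is valid"


# Tiny blocks-world example: put block a on block b.
pick_up_a = Action(
    "pick_up_a",
    frozenset({"clear_a", "on_table_a", "hand_empty"}),
    frozenset({"holding_a"}),
    frozenset({"clear_a", "on_table_a", "hand_empty"}),
)
stack_a_on_b = Action(
    "stack_a_on_b",
    frozenset({"holding_a", "clear_b"}),
    frozenset({"on_a_b", "clear_a", "hand_empty"}),
    frozenset({"holding_a", "clear_b"}),
)

initial = {"clear_a", "clear_b", "on_table_a", "on_table_b", "hand_empty"}
goal = {"on_a_b"}

good = verify_plan(initial, goal, [pick_up_a, stack_a_on_b])
bad = verify_plan(initial, goal, [stack_a_on_b])
print(good)
print(bad)
```

For the invalid plan, a binary signal would only report `False`, while the detailed message points at step 1 and the missing `holding_a` fact, which is the kind of per-step feedback the post describes.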

117

695

3,770

293,694