Andrej Karpathy built GPT from scratch, line by line, in about 2 hours.
For free. From a founding member of OpenAI.
This video is a real foundation for becoming an AI engineer.
Bookmark it. Watch it tonight. Build your own GPT this week.
$5,000. $15,000. $40,000.
That's what bootcamps charge to teach less than what's in this 2-hour video.
This video teaches it for free, this week.
Follow @codewithimanshu for more high-signal AI content that actually moves your engineering career forward.
↓
Karpathy doesn't explain GPT. He builds it.
Live. From the original paper, "Attention Is All You Need," to the same decoder-only transformer design behind the GPT family.
Founding member of OpenAI in 2015. Former Senior Director of AI at Tesla. Now running Eureka Labs.
He's not teaching you how to use GPT. He's teaching you how it actually works at the source code level.
Most engineers will never understand transformers this deeply. The ones who do build the next generation of AI products.
↓
Here's what gets built in 2 hours. No fluff.
Tokenization and data loading.
The foundation of every modern LLM. A character-level tokenizer. A clean train/val split. A batch loader that samples random chunks of text and their next-token targets.
Most tutorials skip this. You can't ship anything serious without it.
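Here's a rough sketch of that step. It is not Karpathy's code: the stand-in corpus, the 90/10 split, the `get_batch` name, and the batch and block sizes are all assumptions picked for illustration.

```python
import random

text = "to be, or not to be, that is the question. " * 20  # stand-in corpus

# Character-level tokenizer: every distinct character gets an integer id.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

data = encode(text)

# Hold out the last 10% for validation (assumed ratio; no shuffling,
# because the text is one continuous sequence).
n = int(0.9 * len(data))
train_data, val_data = data[:n], data[n:]

def get_batch(split, batch_size=4, block_size=8, rng=random):
    """Sample random windows; targets are the inputs shifted by one position."""
    source = train_data if split == "train" else val_data
    starts = [rng.randrange(len(source) - block_size) for _ in range(batch_size)]
    x = [source[s : s + block_size] for s in starts]
    y = [source[s + 1 : s + block_size + 1] for s in starts]
    return x, y

xb, yb = get_batch("train")
```

Every row of `yb` is the matching row of `xb` moved one character ahead. That shift is the whole training signal: predict the next token.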
The bigram baseline.
The simplest possible language model. Karpathy builds it first because it teaches you what every fancier model is actually trying to improve.
Once you understand bigrams, transformers become obvious. Skip this and the rest never clicks.
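A minimal bigram sketch, under one loud assumption: the video trains the bigram as a learnable lookup table in PyTorch, while this version uses plain counts so it runs with no dependencies. The tiny corpus and the function names are illustrative.

```python
import math
import random
from collections import Counter, defaultdict

text = "hello world, hello there"  # stand-in corpus

# Count how often each character follows each other character.
counts = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    counts[prev][nxt] += 1

def next_char_probs(prev):
    """P(next | prev) from normalized counts."""
    total = sum(counts[prev].values())
    return {ch: c / total for ch, c in counts[prev].items()}

def avg_nll(s):
    """Average negative log-likelihood: the loss fancier models try to lower."""
    pairs = list(zip(s, s[1:]))
    return -sum(math.log(next_char_probs(p)[n]) for p, n in pairs) / len(pairs)

def generate(start, length, rng):
    """Sample one character at a time, looking only at the previous character."""
    out = [start]
    for _ in range(length):
        probs = next_char_probs(out[-1])
        if not probs:  # character never seen with a successor
            break
        out.append(rng.choices(list(probs), weights=list(probs.values()))[0])
    return "".join(out)

sample = generate("h", 20, random.Random(0))
loss = avg_nll(text)
```

The model only ever sees one character of context. Attention exists to fix exactly that.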
↓
Self-attention. From scratch. Live.
This is the section that should have its own course.
Karpathy builds self-attention in 4 versions:
> Version 1: averaging past context with for loops
> Version 2: matrix multiply as weighted aggregation
> Version 3: adding softmax
> Version 4: full self-attention
Each version teaches you why the next one exists. Why attention works. Why matrix math replaces explicit loops. Why scaling matters.
You'll never look at "Attention Is All You Need" the same way again.
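The four versions can be sketched in a few lines. This is a NumPy approximation for a single sequence, not the video's PyTorch code; the shapes, the random weights `Wq`, `Wk`, `Wv`, and the `head_size` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, C = 4, 2                      # sequence length, channels (illustrative)
x = rng.normal(size=(T, C))

# Version 1: for each position, average itself and everything before it.
v1 = np.zeros((T, C))
for t in range(T):
    v1[t] = x[: t + 1].mean(axis=0)

# Version 2: the same averaging as one matrix multiply with a
# row-normalized lower-triangular weight matrix.
tril = np.tril(np.ones((T, T)))
v2 = (tril / tril.sum(axis=1, keepdims=True)) @ x

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Version 3: start from zero scores, mask the future with -inf,
# and let softmax produce the same uniform weights.
scores = np.where(tril == 1, 0.0, -np.inf)
v3 = softmax(scores) @ x

# Version 4: the scores come from the data (queries against keys),
# so each token decides how much weight each earlier token gets.
head_size = 3
Wq, Wk, Wv = (rng.normal(size=(C, head_size)) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
att = (q @ k.T) / np.sqrt(head_size)
att = np.where(tril == 1, att, -np.inf)
weights = softmax(att)
v4 = weights @ v
```

Versions 1 to 3 give identical outputs. Version 4 keeps the same masking machinery and only swaps the fixed uniform weights for learned, data-dependent ones.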
↓
The 6 attention notes that change everything.
Karpathy drops 6 insights most engineers never hear:
> Attention as communication between tokens
> Attention has no notion of space, operates over sets
> No communication across batch dimension
> Encoder blocks vs decoder blocks
> Attention vs self-attention vs cross-attention
> Why we divide by sqrt(head_size)
Each one explains a design choice you need to understand before you can debug attention in a real system.
Most "AI engineers" can't answer these. The ones who can charge $300K.
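The last note is easy to check numerically. A small sketch, assuming unit-variance random queries and keys (real activations will differ): unscaled dot products have variance close to `head_size`, and softmax over scores that large collapses toward one-hot.

```python
import numpy as np

rng = np.random.default_rng(0)
head_size = 64
q = rng.normal(size=(1000, head_size))
k = rng.normal(size=(1000, head_size))

raw = q @ k.T                        # variance grows with head_size
scaled = raw / np.sqrt(head_size)    # variance stays near 1

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Average size of the largest softmax weight over 8 candidate keys.
# Values near 1.0 mean each query attends to almost a single token.
peak_raw = softmax(raw[:, :8]).max(axis=-1).mean()
peak_scaled = softmax(scaled[:, :8]).max(axis=-1).mean()

print(f"variance: raw={raw.var():.1f}, scaled={scaled.var():.2f}")
print(f"mean peak weight: raw={peak_raw:.2f}, scaled={peak_scaled:.2f}")
```

Dividing by sqrt(head_size) keeps the softmax diffuse at initialization, so every token starts out aggregating information from several others instead of just one.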
↓
Building the full transformer block.
Single self-attention head. Then multi-headed self-attention.
Feedforward layers. Residual connections. LayerNorm.
Each piece added with the reason it exists. Why residual connections keep gradients flowing through deep stacks. Why transformers use LayerNorm instead of BatchNorm. Why dropout fights overfitting as the model grows.
This is the architectural understanding that lets you debug any modern AI system.
Once you've built one transformer by hand, every paper you read becomes 10x clearer.
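A forward-pass sketch of one block, assuming the pre-norm layout (LayerNorm before each sub-layer) and a 4x-wide feedforward. It is NumPy only, skips dropout and LayerNorm's learned gain and bias, and uses random weights, so treat it as a shape-and-wiring illustration rather than the video's implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalize each token's features independently of the rest of the batch.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def causal_multi_head_attention(x, Wq, Wk, Wv, Wo, n_head):
    T, C = x.shape
    hs = C // n_head
    # Project, then split channels into heads: (n_head, T, hs)
    q = (x @ Wq).reshape(T, n_head, hs).transpose(1, 0, 2)
    k = (x @ Wk).reshape(T, n_head, hs).transpose(1, 0, 2)
    v = (x @ Wv).reshape(T, n_head, hs).transpose(1, 0, 2)
    att = q @ k.transpose(0, 2, 1) / np.sqrt(hs)       # (n_head, T, T)
    mask = np.tril(np.ones((T, T), dtype=bool))
    att = np.where(mask, att, -np.inf)                 # hide future tokens
    out = softmax(att) @ v                             # (n_head, T, hs)
    out = out.transpose(1, 0, 2).reshape(T, C)         # concatenate heads
    return out @ Wo

def feed_forward(x, W1, b1, W2, b2):
    return np.maximum(0, x @ W1 + b1) @ W2 + b2        # ReLU MLP, per token

def block(x, p, n_head):
    # Residual connections: each sub-layer adds to x instead of replacing it.
    x = x + causal_multi_head_attention(
        layer_norm(x), p["Wq"], p["Wk"], p["Wv"], p["Wo"], n_head
    )
    x = x + feed_forward(layer_norm(x), p["W1"], p["b1"], p["W2"], p["b2"])
    return x

rng = np.random.default_rng(0)
T, C, n_head = 5, 8, 2
shapes = {"Wq": (C, C), "Wk": (C, C), "Wv": (C, C), "Wo": (C, C),
          "W1": (C, 4 * C), "b1": (4 * C,), "W2": (4 * C, C), "b2": (C,)}
p = {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}
x = rng.normal(size=(T, C))
y = block(x, p, n_head)
```

Attention is where tokens talk to each other. The feedforward layer is where each token thinks on its own. The residual path lets both be stacked many times.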
↓
Scaling up to a real model.
Karpathy goes from baseline to a working GPT.
Hyperparameters. Dropout. Model dimensions. The exact tradeoffs every production model makes.
By the end you have a Shakespeare-generating language model running on your machine. From scratch. Built by you. Understood by you.
That's not a tutorial. That's an architectural unlock.
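To get a feel for what the model dimensions cost, here is a back-of-the-envelope parameter counter. The layer layout (bias-free q/k/v, 4x feedforward, untied output head) and both sets of hyperparameter values are assumptions, with the larger set being the values I recall from the video's scaled-up run. Dropout adds no parameters, so it does not appear.

```python
def count_params(vocab_size, block_size, n_embd, n_layer):
    """Rough parameter count for a small decoder-only GPT.

    Assumed layout: bias-free q/k/v projections, a biased output
    projection, a 4x-wide feedforward, two LayerNorms per block,
    a final LayerNorm, and an untied linear output head.
    """
    tok_emb = vocab_size * n_embd
    pos_emb = block_size * n_embd
    qkv = 3 * n_embd * n_embd
    proj = n_embd * n_embd + n_embd
    ffwd = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    norms = 2 * (2 * n_embd)
    per_block = qkv + proj + ffwd + norms
    final_norm = 2 * n_embd
    lm_head = n_embd * vocab_size + vocab_size
    return tok_emb + pos_emb + n_layer * per_block + final_norm + lm_head

# Illustrative small config versus the assumed scaled-up config.
small = count_params(vocab_size=65, block_size=32, n_embd=64, n_layer=4)
large = count_params(vocab_size=65, block_size=256, n_embd=384, n_layer=6)
print(f"{small / 1e6:.2f}M -> {large / 1e6:.2f}M parameters")
```

Almost all of the growth comes from `n_embd`, because every block's cost scales with its square. The number of heads does not change the count at all.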
↓
Encoder vs decoder vs both.
The architecture choice that defines every modern AI product.
Why GPT is decoder-only. Why BERT is encoder-only. Why translation models use both.
Once you understand this, you can read any AI paper and immediately know what kind of system you're looking at.
This is the difference between someone who follows AI hype and someone who builds it.
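The three cases differ mostly in the mask and in where the keys come from. A stripped-down sketch, assuming random token vectors and no learned projections (real models project queries, keys, and values first); `attention_weights` is a hypothetical helper.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_weights(queries, keys, causal):
    """Scaled dot-product attention weights, with an optional causal mask."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    if causal:
        mask = np.tril(np.ones(scores.shape, dtype=bool))
        scores = np.where(mask, scores, -np.inf)
    return softmax(scores)

rng = np.random.default_rng(0)
src = rng.normal(size=(5, 4))   # e.g. tokens of a source sentence
tgt = rng.normal(size=(3, 4))   # e.g. tokens generated so far

# Encoder-style (BERT-like): every token sees every other token.
encoder_w = attention_weights(src, src, causal=False)

# Decoder-style (GPT-like): each token sees only itself and the past.
decoder_w = attention_weights(tgt, tgt, causal=True)

# Cross-attention (encoder-decoder): queries come from the decoder side,
# keys come from the encoder output.
cross_w = attention_weights(tgt, src, causal=False)
```

Same function, three architectures. A triangular mask makes it a decoder, no mask makes it an encoder, and a second input makes it cross-attention.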
↓
nanoGPT walkthrough.
Karpathy ends with a quick walk through nanoGPT. The repo every serious AI engineer has cloned at least once.
Batched multi-headed self-attention. Production-grade code. The clean version of everything you just built.
This is the bridge from "I built a toy GPT" to "I can read and modify production AI code."
↓
ChatGPT, pretraining, finetuning, RLHF.
The video closes with the full lineage. From your toy GPT to ChatGPT.
What changes when you scale up. Why RLHF matters. The exact path from research model to product.
You finish the video understanding the entire stack from raw paper to deployed product.
Most "AI experts" can't draw this map. After 2 hours, you can.
↓
What you'll be able to do after this.
Read "Attention Is All You Need" and understand every line.
Debug attention layers when they break in production.
Build a custom language model on your own dataset.
Modify transformer architectures for specific use cases.
Have technical conversations with AI engineers without faking it.
Train a GPT on any data you want. Shakespeare. Code. Your own writing.
That's not "AI literacy." That's the foundation of an AI engineering career.
The kind of foundation that turns into senior roles and consulting contracts most people will never access.
↓
2 hours. Free. From a founding member of OpenAI.
You'll spend longer in meetings this week and learn nothing.
This compounds for the rest of your career.
People who watch it can build GPT from scratch by Friday.
People who skip it stay confused about why their prompts fail in production.
Save the video. Watch it this week. Build something with the knowledge by the weekend.
Follow @codewithimanshu for more high-signal AI content from the people actually building the future.