LLM explained in 60 seconds:
1. TOKENIZE
"I love coding" → ["I", " love", " cod", "ing"]
BPE splits text into digestible pieces.
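A rough sketch of how BPE-style merging could produce a split like "cod" + "ing". The merge table TOY_MERGES and the function bpe_encode are made up for illustration; real tokenizers learn tens of thousands of merges from data and usually work on bytes.

```python
def bpe_encode(word, merges):
    """Apply merge rules to one word, in priority order (simplified BPE)."""
    symbols = list(word)  # start from single characters
    for left, right in merges:
        merged = []
        i = 0
        while i < len(symbols):
            # Fuse an adjacent (left, right) pair into one symbol.
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical merge table, NOT taken from any real tokenizer.
TOY_MERGES = [("c", "o"), ("co", "d"), ("i", "n"), ("in", "g")]

tokens = bpe_encode("coding", TOY_MERGES)
print(tokens)
```

Words with no matching merges simply stay split into characters.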
2. EMBED
Each token → a vector of numbers (4096 in some models).
Similar words = similar vectors.
"King" - "Man" + "Woman" ≈ "Queen"
3. ATTEND
Every token looks at the tokens before it.
"its" in "The dog wagged its tail" → the model links "its" to "dog".
4. TRANSFORM
Dozens of stacked layers (96 in GPT-3).
Each layer refines every token's vector using context.
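One way to picture the stack: each layer computes an update and adds it back to its input (a residual connection). The make_layer helper below is a stand-in invented for this sketch; a real layer is attention plus a feed-forward network with normalization.

```python
def make_layer(scale):
    """Hypothetical toy layer: returns an update proportional to its input."""
    def layer(x):
        return [scale * v for v in x]
    return layer


def run_stack(x, layers):
    """Pass a vector through the layers, adding each update to the input."""
    for layer in layers:
        update = layer(x)
        x = [a + b for a, b in zip(x, update)]  # residual connection
    return x


layers = [make_layer(0.5), make_layer(0.5)]
print(run_stack([1.0, 2.0], layers))
```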
5. PREDICT
"The next word is probably..."
Softmax turns the model's scores into probabilities over the whole vocabulary.
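Softmax itself is a few lines. The four-word vocabulary and the scores in TOY_LOGITS are invented for this sketch; a real model scores tens of thousands of tokens at once.

```python
import math


def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical vocabulary and scores.
TOY_VOCAB = ["cat", "dog", "tail", "pizza"]
TOY_LOGITS = [1.0, 2.0, 4.0, 0.5]

probs = softmax(TOY_LOGITS)
best = TOY_VOCAB[probs.index(max(probs))]
for word, p in zip(TOY_VOCAB, probs):
    print(f"{word}: {p:.3f}")
```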
6. SAMPLE
Pick one. Add to response. Repeat.
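The pick-append-repeat loop, sketched with a tiny lookup table in place of a model. TOY_NEXT and generate are hypothetical; note that this toy only looks at the last token, while a real LLM conditions on the whole sequence so far.

```python
import random

# Hypothetical next-token probabilities, keyed by the previous token.
TOY_NEXT = {
    "<start>": {"I": 1.0},
    "I": {"love": 0.7, "like": 0.3},
    "love": {"coding": 0.9, "<end>": 0.1},
    "like": {"coding": 1.0},
    "coding": {"<end>": 1.0},
}


def generate(max_tokens=10, rng=None):
    """Sample one token at a time until <end> or the length limit."""
    rng = rng or random.Random()
    tokens = ["<start>"]
    for _ in range(max_tokens):
        dist = TOY_NEXT[tokens[-1]]
        # Pick one token at random, weighted by its probability.
        next_token = rng.choices(list(dist), weights=list(dist.values()))[0]
        if next_token == "<end>":
            break
        tokens.append(next_token)  # add to response, then repeat
    return tokens[1:]  # drop the <start> marker


print(" ".join(generate(rng=random.Random(0))))
```

Passing a seeded random.Random makes a run repeatable; without it, the same prompt can give different answers.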
Billions of parameters.
Trillions of operations.
One answer.
This is what happens when you hit Enter.