This is still the best* introduction to GPTs I have ever read.
jaykmody.com/blog/gpt-from-s…
Fills in all the gaps between "Attention Is All You Need" and coding every single line of a working GPT-2 implementation yourself. Even ends with explanations of fine-tuning for classification, summarization, and instruction tuning, plus PEFT. Bravo @jaykmody and all who take the time to write these things down in fully commented Python rather than cryptic LaTeX.
*best WRITTEN introduction. we all know who has the best video intro to GPTs…
Late to the party but "GPT in 60 Lines of NumPy" / picoGPT is nicely done: jaykmody.com/blog/gpt-from-s…
- good supporting links/pointers
- flexes some of the benefits of JAX: 1) trivial to port numpy -> jax.numpy, 2) get gradients, 3) batch with jax.vmap
- runs inference on GPT-2 checkpoints
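
For anyone curious what those three JAX points look like in practice, here is a minimal sketch. It is not code from the post: `toy_model`, `loss_fn`, and the parameter shapes are hypothetical stand-ins for a real GPT forward pass, chosen only to keep the example short.

```python
import jax
import jax.numpy as jnp  # 1) stands in for `import numpy as np` in most array code


def softmax(x):
    # Same body as a NumPy softmax, with np swapped for jnp.
    exp_x = jnp.exp(x - jnp.max(x, axis=-1, keepdims=True))
    return exp_x / jnp.sum(exp_x, axis=-1, keepdims=True)


def toy_model(params, x):
    # Hypothetical stand-in for a GPT forward pass: one linear layer + softmax.
    # x: [n_embd] -> probabilities: [n_vocab]
    return softmax(x @ params["w"] + params["b"])


def loss_fn(params, x, target):
    # Cross-entropy loss for a single example.
    probs = toy_model(params, x)
    return -jnp.log(probs[target])


key_w, key_x = jax.random.split(jax.random.PRNGKey(0))
params = {
    "w": 0.1 * jax.random.normal(key_w, (4, 3)),  # assumed n_embd=4, n_vocab=3
    "b": jnp.zeros(3),
}

# 2) Gradients: jax.grad differentiates with respect to the first argument (params).
grads = jax.grad(loss_fn)(params, jnp.ones(4), 1)

# 3) Batching: jax.vmap maps the single-example function over axis 0 of x,
#    while in_axes=None keeps params shared across the batch.
batch_x = jax.random.normal(key_x, (5, 4))
batch_probs = jax.vmap(toy_model, in_axes=(None, 0))(params, batch_x)
```

`grads` comes back as a dict with the same keys and shapes as `params`, and `batch_probs` has one probability row per input, without `toy_model` ever being written with a batch dimension in mind.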