The Dirty Little Secret of AI:
I wanted to see if I could train a full neural network on a real 1979 PDP-11. Spoiler alert: I did.
Allow me to explain transformers and attention, reduced to their most basic forms, all in 6K of program code...