Dropped a new landing page and announced Act 02: ML Systems. I plan on covering transformer architecture, flash attention, KV cache, speculative decoding, and more.
If you've ever wanted to actually understand how to run models fast on hardware, this is for you.