💥 This document suggests that next-generation Apple architecture chips will deliver at least 2× faster AI performance!
A bit late on this, if it wasn't obvious by now: next-gen will obviously be at least 2× faster. Also, a minor thing: fp8 × fp8 with an fp16 accumulator overflows. Hopefully it is not a clumsy 14-bit thingy, though. Anyway, there are more tricks for doing this efficiently.
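To make the overflow point concrete, here is a rough NumPy sketch. It assumes the E4M3 flavour of fp8 (largest finite value 448) and simulates the accumulator in software; NumPy has no native fp8 type, so the inputs are just fp8-representable values stored as float32, and the helper `accumulate` is made up for illustration. It says nothing about how any real chip does it. Note that a single worst-case product, 448 × 448 = 200704, is already past the fp16 maximum of 65504; the sketch shows that even modest values overflow once enough products are summed.

```python
import numpy as np

def accumulate(a, b, acc_dtype):
    """Dot product with exact products and a running sum rounded to acc_dtype."""
    acc = acc_dtype(0)
    with np.errstate(over="ignore"):  # silence the overflow warning; we want to see the inf
        for x, y in zip(a, b):
            # Products of two fp8 values fit exactly in float32.
            prod = np.float32(x) * np.float32(y)
            # Add, then round the running sum to the accumulator type.
            acc = acc_dtype(np.float32(acc) + prod)
    return acc

# 16.0 is representable in fp8 E4M3; 512 products of 16 * 16 sum to 131072.
a = np.full(512, 16.0, dtype=np.float32)
b = np.full(512, 16.0, dtype=np.float32)

acc16 = accumulate(a, b, np.float16)  # simulated fp16 accumulator
acc32 = accumulate(a, b, np.float32)  # fp32 accumulator for comparison
print(acc16, acc32)
```

The fp16 running sum goes to `inf` halfway through, while the fp32 one lands on the exact answer. Scaling the inputs or accumulating in a wider type are the usual ways around this.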