Hand-written RISC-V assembly code from @AlibabaGroup Cloud has been submitted to FFmpeg.
Up to 14 times faster than C.
It's great to see so many corporate contributors of hand-written assembly, a field historically dominated by volunteers!
M3 Ultra Mac Studio AI Cluster
Uses EXO 1.0 with RDMA over Thunderbolt 5 (80Gbps) for 2TB unified memory @ 3.2TB/s.
Runs Kimi K2 Thinking (1T parameters) at 28 tok/sec with MLX tensor parallel. Up to 1.8x TPS w/ 2 Macs, 3.2x w/ 4 Macs.
EXO 1.0 is now open-source (Apache 2.0).
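As a rough sanity check on the scaling figures quoted above, the sketch below converts each "up to" speedup into a fraction of ideal linear scaling. The helper name `parallel_efficiency` is made up for illustration and is not part of EXO or MLX; the inputs are only the numbers in the post, so the results are best-case bounds rather than measurements.

```python
# Speedup figures quoted in the post: number of Macs -> throughput multiple vs. one Mac.
# These are "up to" numbers, so the efficiencies below are upper bounds.
quoted_speedup = {2: 1.8, 4: 3.2}


def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 would be perfectly linear)."""
    return speedup / nodes


efficiency = {n: parallel_efficiency(s, n) for n, s in quoted_speedup.items()}

for n, e in efficiency.items():
    print(f"{n} Macs: {e:.0%} of linear scaling")
```

By this measure the quoted figures correspond to about 90% of linear scaling on 2 Macs and about 80% on 4.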
This demo runs AOT-compiled WebAssembly (Wasm) binaries of GNU Bash and BusyBox Linux applications directly in the browser!
You can see multiple processes running simultaneously in the browser.
Demo: yomaytk.github.io/elfconv-de…
Unwrapping TPUs Day 7! 🎄
Today we explore the complete Verilog design and testing of an inference chip based on the TPU v1 architecture.
It's been a great week learning theory & implementation w/ @Alan_Ma_!