NVIDIA just dropped benchmarks showing 4-bit inference loses less than 1 point vs BF16 on most tasks.
Accuracy per request isn't the metric to measure. Tasks completed per dollar is. And on that metric, 4-bit wins by a landslide.
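To make the metric concrete, here is a back-of-the-envelope Python sketch. Every number in it is a made-up placeholder, not a benchmark result, and `tasks_per_dollar` is a hypothetical helper; plug in your own success rates and per-request costs.

```python
def tasks_per_dollar(success_rate: float, cost_per_request: float) -> float:
    """Expected completed tasks bought with one dollar."""
    if cost_per_request <= 0:
        raise ValueError("cost_per_request must be positive")
    return success_rate / cost_per_request


if __name__ == "__main__":
    # Placeholder inputs, NOT measured values.
    bf16 = tasks_per_dollar(success_rate=0.80, cost_per_request=0.004)
    four_bit = tasks_per_dollar(success_rate=0.79, cost_per_request=0.002)
    print(f"BF16: {bf16:.0f} tasks/$, 4-bit: {four_bit:.0f} tasks/$")
```

A one-point drop in success rate barely moves the numerator, while a cheaper request moves the denominator a lot. That's the whole argument.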
Read the full blog 👇
7/ Layout algebra is formalized in Lean 4. 26 theorems, 0 sorry.
Properties extracted to RapidCheck tests.
The art/ directory has 23 SVG visualizations - we drew pictures until we understood.
💿 Open Source Release 💿
mdspan-cute: a zero-overhead bridge between C++23 std::mdspan and CUTLASS CuTe layouts.
One header. Swizzled memory. No bank conflicts.
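For anyone wondering why swizzling kills bank conflicts, here is a minimal Python sketch. It assumes 32 shared-memory banks of 4 bytes each and a 32x32 tile of 4-byte elements, and it uses a plain XOR swizzle (column XOR row) as a stand-in. The function names are hypothetical; this is not the mdspan-cute or CuTe API.

```python
NUM_BANKS = 32  # assumption: 32 banks, one 4-byte word each
TILE = 32       # assumption: 32x32 tile of 4-byte elements


def linear_offset(row: int, col: int) -> int:
    """Plain row-major offset, in elements."""
    return row * TILE + col


def swizzled_offset(row: int, col: int) -> int:
    """Row-major offset with the column XOR-ed by the row (illustrative swizzle)."""
    return row * TILE + (col ^ (row % TILE))


def banks_hit_by_column(offset_fn, col: int) -> set[int]:
    """Banks touched when 32 threads read one column, one row per thread."""
    return {offset_fn(row, col) % NUM_BANKS for row in range(TILE)}


if __name__ == "__main__":
    # Without the swizzle every thread lands in the same bank.
    print(len(banks_hit_by_column(linear_offset, 0)))
    # With the swizzle each thread lands in its own bank.
    print(len(banks_hit_by_column(swizzled_offset, 0)))
```

The XOR only permutes columns within a row, so every element still has a unique address. Nothing is lost, only rearranged.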
Read the blog and check out the repo (links in reply)
6/ On "bit augmentation":
Log/exp is a bijection. Information in = information out.
You can't create precision from a reversible transformation.
Thermodynamics doesn't allow it.
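A small Python sketch of the counting argument, under assumptions: the 16 positive values below are a hypothetical 4-bit codebook, not any real number format, and the helper names are made up for illustration. Sending the codebook through log and back through exp still leaves exactly 16 distinct values.

```python
import math

# Hypothetical 4-bit codebook: 16 distinct positive values (illustrative only).
CODEBOOK = [2.0 ** (k / 2.0) for k in range(16)]


def to_log_domain(values):
    """Map each value to its natural log (injective on positive reals)."""
    return [math.log(v) for v in values]


def from_log_domain(logs):
    """Invert the mapping with exp."""
    return [math.exp(x) for x in logs]


def bits_needed(values) -> float:
    """Bits required to index the distinct values in a list."""
    return math.log2(len(set(values)))


if __name__ == "__main__":
    logs = to_log_domain(CODEBOOK)
    back = from_log_domain(logs)
    # Same number of distinct symbols before and after the transform.
    print(bits_needed(CODEBOOK), bits_needed(logs))
    # The round trip recovers the original values up to float rounding.
    print(all(math.isclose(a, b) for a, b in zip(CODEBOOK, back)))
```

16 codes in, 16 codes out. The transform relabels the symbols; it doesn't add any.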
1/ Yesterday we announced nix2gpu -
a NixOS package for portable GPU containers.
Portable containers prevent inference vendor lock-in.
Here's why it's a big deal. #Nix #AIInfra
7/ Why it matters:
Makes distributed GPU compute easy and deterministic.
Philosophy: It's just Linux with libs - complexity is optional.
Open-source, MIT-licensed; production-tested on Fleek machines.