> be me, NVIDIA in 2006
> making GPUs so gamers can play World of Warcraft in HD
> Jensen Huang has a vision
> "what if we made them do... math?"
> entire boardroom thinks he's finally lost it
> create CUDA
> a software prison so elegant, academics will volunteer to be inmates
> gamers are confused. "Will this make Crysis run faster?"
> Jensen: "...no :)"
> fast forward to 2012
> AI researchers, hopped up on adderall and free pizza, discover they need a shitload of matrix multiplication to make cat pictures
> their Intel Xeons are crying, smoking, and filing for divorce
> they discover CUDA
> "holy shit this math thing you made is perfect"
> the entire field of AI becomes a subsidiary of a company known for making Fortnite run better
> we see this and our eyes turn into dollar signs
> decide the prison needs better amenities
> invent Tensor Cores
> "what if part of the chip did ONLY the AI math, but did it at ludicrous speed?"
> it's like adding a nitro booster to a Honda Civic specifically for going to the grocery store
> AMD's competing card is over there trying to do the same math with an abacus and hope
> researchers are so lazy they can't be trusted to write a for-loop
> just give them a big red "MAKE AI" button that runs our code
> create cuDNN
> bake it directly into TensorFlow and PyTorch
> ecosystem lock-in is complete. We own the land, the factory, and the souls of the workers.
> a bunch of hippies from a thing called "OpenAI" call
> they want to build AGI or something
> they need compute
> roll up to their office, which is probably a converted garage, with the world's first DGX-1
> it's like the monolith from 2001: A Space Odyssey
> I tell them it has 8 GPUs connected by NVLink, so they can gossip like schoolgirls at light speed
> they ask what it does
> "It does AI, idiots. Just plug it in."
> they sign the papers.
> But we're not done. The prison needs a better yard.
> See them trying to connect our beautiful DGX boxes with CAT-5 cables
> embarrassing.
> Acquire Mellanox for their InfiniBand tech
> Now the nervous system connecting the GPUs is also ours
> NVSwitch makes an entire rack of GPUs hold hands so tight they become one colossal mega-brain
> Meanwhile, in a dimly lit basement in Santa Clara... AMD
> They drop a new GPU. "The AMD Instinct MI300X! We have more theoretical FLOPs!"
> "Better benchmark performance."
> One brave researcher tries to use it.
> spends 6 weeks trying to install ROCm, their knock-off wish.com version of CUDA
> gets 400 errors, a kernel panic, and divorce papers from his wife
> he finally gets a model to run at half the promised speed
> The benchmark was for a matrix size of 1x1
> He throws the card in the trash and buys an NVIDIA H100 on the company card. The cost is irrelevant.
> Sanity is priceless.
> Jensen takes the stage. The leather jacket has its own gravitational pull. Unleashes the GB200 NVL72
> 72 GPUs and 36 Grace CPUs in a single NVLink domain with 130 TB/s of bandwidth
> the GPUs share thoughts before they even have them.
> 2025
> AMD comes out with their "rack scale" solution, the MI355X 128-GPU rack
> It's just 16 groups of 8 GPUs duct-taped together with Ethernet
> A scale-out clown car
> We announce the Rubin CPX GPU, enabling cheaper prefill during inference
> saving $$$$
> it's the ultimate vendor lock-in
> every other chip designer sees this and has to throw their architecture in the trash and go back to the drawing board
> All hail the leather-clad prophet.