inference @nvidia | eng @westernu @schulichleaders | building and learning for fun

Joined February 2015
Pinned Tweet
12 Nov 2025
hand-controlled boids
Xander Chin retweeted
This sentence from Carl Jung hits hard. “No matter how isolated you are and how lonely you feel, if you do your work truly and conscientiously, unknown friends will come and seek you.”
Xander Chin retweeted
May 25
Want to see distributed computing explained via Pong? Inspired by TinyTPU and the TinyTapeout workshop at FOSSi, I wrote a paper in under a week that pairs this demo with a proposed next-gen optical I/O chip architecture & a roadmap to prototype it. Read it on GitHub: github.com/llhtimlam/tt_um_l…
Xander Chin retweeted
A few months ago, I saw Karpathy build NanoChat in PyTorch, and it made me want to understand how these models work underneath the abstractions. So I decided to try building one myself, but in a different framework: JAX. Here’s how I did it: 🧵
Xander Chin retweeted
reinventing Groq's LPU with @michael_trbo: we got instruction-driven data movement working between SRAM memory blocks and MXM compute!!
Xander Chin retweeted
We implemented @karpathy 's MicroGPT fully on FPGA fabric. No GPU. No PyTorch. No CPU inference loop. Just a transformer burned into hardware, generating 50,000 tokens/sec. The model is small, but the idea is not: inference does not have to live only in software 👇
Xander Chin retweeted
anyone subletting a 1 bedroom apartment in Toronto this summer?
Xander Chin retweeted
Replying to @satvikgari
@satvikgari and I have been building our own version of Nvidia’s Blackwell GPU. We just designed a 4x4 systolic array in Verilog! Here’s a breakdown of how it works and what we learned building it.
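A 4x4 systolic array is a grid of processing elements (PEs) that hand operands to their neighbours every clock cycle. The Python sketch below is a rough cycle-level model, not the Verilog design from the thread: it assumes an output-stationary dataflow with square n x n inputs, where skewed rows of A flow right, skewed columns of B flow down, and each PE accumulates one element of C = A·B. `systolic_matmul` is a hypothetical name.

```python
def systolic_matmul(A, B):
    """Cycle-level model of an output-stationary n x n systolic array (hypothetical sketch)."""
    n = len(A)
    a_reg = [[0] * n for _ in range(n)]  # A operand currently held in PE(i, j)
    b_reg = [[0] * n for _ in range(n)]  # B operand currently held in PE(i, j)
    acc = [[0] * n for _ in range(n)]    # each PE accumulates one output element

    for t in range(3 * n - 2):  # enough cycles for the last operands to reach PE(n-1, n-1)
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:
                    # Left edge: row i of A is fed in skewed by i cycles (zeros outside the window).
                    k = t - i
                    new_a[i][j] = A[i][k] if 0 <= k < n else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]  # A moves one PE to the right
                if i == 0:
                    # Top edge: column j of B is fed in skewed by j cycles.
                    k = t - j
                    new_b[i][j] = B[k][j] if 0 <= k < n else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]  # B moves one PE down
        a_reg, b_reg = new_a, new_b
        # Every PE does one multiply-accumulate per cycle.
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[2, 0, 1, 0], [0, 1, 0, 3], [1, 1, 1, 1], [0, 2, 0, 1]]
print(systolic_matmul(A, B))
```

Because row i is delayed by i cycles and column j by j cycles, PE(i, j) sees A[i][k] and B[k][j] together at cycle i + j + k, so a 4x4 array needs 3*4 - 2 = 10 cycles to finish.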
Xander Chin retweeted
Apr 3
bit late to the recruiting cycle, but looking for a summer internship in ML/hardware/inference!! i've been working on CUDA kernel writing, FPGA acceleration and RTL. would love to find a team doing similar work this summer. dual US/Canada citizen, can relocate anywhere. DMs open :)
Xander Chin retweeted
wrote an article breaking down the math behind TurboQuant by @GoogleResearch. I walk through a toy example using concrete numbers to show every single operation that goes on under the hood. link below:
seriously impressive stuff. give this man a follow
I implemented @GoogleResearch's TurboQuant as a CUDA-native compression engine on Blackwell B200. 5x KV cache compression on Qwen 2.5-1.5B, near-lossless attention scores, generating live from compressed memory.
5 custom cuTile CUDA kernels ft:
- fused attention (with QJL corrections)
- online softmax
- on-chip cache decompression
- pipelined TMA loads
Try it out: devtechjr.github.io/turboqua…
s/o @blelbach and the cuTile team at @nvidia for lending me Blackwell GPU access :) cc @sundeep @GavinSherry
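Online softmax is one of the kernels listed above. As a hedged illustration of the general technique (not the cuTile kernels themselves), here is a single-pass Python sketch for one query: it keeps a running max and a running normalizer, and rescales the partial output whenever the max grows, which is how fused attention kernels avoid materializing the full score row. `online_softmax_weighted_sum` is a hypothetical name.

```python
import math


def online_softmax_weighted_sum(scores, values):
    """Return sum_i softmax(scores)_i * values[i] in one pass (hypothetical sketch).

    scores: list of floats, one per key.
    values: list of equal-length float vectors, one per key.
    """
    m = float("-inf")             # running max of the scores seen so far
    l = 0.0                       # running sum of exp(score - m)
    acc = [0.0] * len(values[0])  # running unnormalized weighted sum of values
    for s, v in zip(scores, values):
        m_new = max(m, s)
        scale = math.exp(m - m_new)  # rescales old stats to the new max (0.0 on the first step)
        p = math.exp(s - m_new)      # weight of the current key relative to the new max
        l = l * scale + p
        acc = [a * scale + p * x for a, x in zip(acc, v)]
        m = m_new
    return [a / l for a in acc]      # normalize once at the end


print(online_softmax_weighted_sum([1.0, 3.0, -2.0, 0.5], [[1, 0], [0, 1], [2, 2], [-1, 4]]))
```

Subtracting the running max keeps every `exp` argument at or below zero, so the sketch stays numerically stable without a separate pass to find the global max.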
Xander Chin retweeted
Recently @arjunharinath1 and I started building our own version of Nvidia's Blackwell GPU. We built the ALUs and a 4-lane SIMD core in Verilog. Here is a breakdown of how we did it.
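The 4-lane SIMD core mentioned above applies one instruction to several data lanes at once. As a minimal behavioural sketch (the opcode set, 8-bit lane width and wraparound arithmetic are assumptions, not details from the tweet), each lane can reuse the same scalar ALU. `alu` and `simd_execute` are hypothetical names.

```python
from typing import List

LANES = 4                 # assumed lane count, matching the "4-lane" core in the tweet
WIDTH = 8                 # assumed lane width in bits
MASK = (1 << WIDTH) - 1   # keeps results inside WIDTH bits (wraparound)


def alu(op: str, a: int, b: int) -> int:
    """Scalar ALU for one lane; the opcode names are hypothetical."""
    if op == "ADD":
        return (a + b) & MASK
    if op == "SUB":
        return (a - b) & MASK
    if op == "AND":
        return a & b
    if op == "OR":
        return a | b
    raise ValueError(f"unknown op {op}")


def simd_execute(op: str, a_lanes: List[int], b_lanes: List[int]) -> List[int]:
    """Broadcast one opcode to every lane, like a single SIMD instruction."""
    if len(a_lanes) != LANES or len(b_lanes) != LANES:
        raise ValueError(f"expected {LANES} lanes")
    return [alu(op, a, b) for a, b in zip(a_lanes, b_lanes)]


print(simd_execute("ADD", [1, 2, 3, 255], [1, 1, 1, 1]))
```

In hardware the four lanes would be four ALU instances sharing one decoded opcode; the list comprehension stands in for that parallelism.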
Xander Chin retweeted
so @sakshambatraa and I are working on re-inventing groq's LPU from scratch. this last week we implemented the VXM, the LPU's arithmetic unit. here's what we learned:
Xander Chin retweeted
for my next adventure, @michael_trbo and I will be working together to build a tinyLPU! for our first checkpoint, we reinvented the MXM: the language processing unit's matrix multiplication engine. here's how we did it
Xander Chin retweeted
i think one of the problems w the current scene in tech is that a lot of people are attached to what comes with it, such as money, fame, etc. i don’t disagree that we all want to make money, but at some point, is attachment doing justice to yourself? if you’re starting a project because you think it will help you land a certain job, you’re chaining yourself down to the rewards and not your duty. because let’s say you don’t get the job you wanted, will you still cherish the project and the time you put into it? you’re entitled to your duty, but not the rewards that come with it.
Xander Chin retweeted
pre-symposium shenanigans
Xander Chin retweeted
from getting inspired by symposium last year to presenting on stage this year! live demoing tiny-tpu was insane. this year has been a wild adventure, and it's only getting started
Xander Chin retweeted
tomorrow, see how 4 university students reverse-engineered Google's most advanced AI chip
Xander Chin retweeted
Mar 21
cheering (and lol’ing) for the pals from tiny-tpu @socraticainfo. this story never gets old 🙌 @XanderChin @suryasure05 @kennykgguo @evanliin
Xander Chin retweeted
We built Talos - a full CNN inference engine running directly on silicon. Every multiply, buffer, and data path lives as real digital logic on the FPGA. This is what deep learning looks like when the model becomes hardware👇