ricerzz

ricerzz @ricerrz

Apr 4

wonder what I can use these for 🔥

Alex Zhurkevich @cudagdb

Apr 3

Trtllmgen kernels are now open. Fastest prefill and decode kernels for our target workloads. We wrote these to win InferenceX, MLPerf, other benchmarks. Powering some of today’s top served models. Dive in, learn, use them, or level up your own. Enjoy. github.com/flashinfer-ai/fla…

ricerzz @ricerrz

Mar 27

started a small CUDA-first kernel library repo. the idea is basically: steal the parts of cuBLAS/cuDNN’s shape that make sense, keep it small, and see how far I can get without spending much on compute. the repo has the first scaffold up. GEMM first. github.com/rizerr2131/mini-k…

GitHub - rizerr2131/mini-kernel-lib
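To make "cuBLAS's shape" concrete: cuBLAS SGEMM computes C = alpha * op(A) * op(B) + beta * C on column-major matrices described by m, n, k and leading dimensions lda/ldb/ldc. The sketch below is a hypothetical pure-Python reference with that argument order; the name `sgemm_ref` is made up and is not taken from the repo. Python floats are 64-bit, so it mirrors the interface and indexing rather than FP32 numerics.

```python
from typing import List


def sgemm_ref(transa: bool, transb: bool, m: int, n: int, k: int,
              alpha: float, A: List[float], lda: int,
              B: List[float], ldb: int,
              beta: float, C: List[float], ldc: int) -> None:
    """Reference C = alpha * op(A) @ op(B) + beta * C, column-major, updated in place.

    Argument order mirrors cublasSgemm; hypothetical sketch, not mini-kernel-lib's API.
    """
    # Leading-dimension checks in the same spirit as cuBLAS's argument validation.
    if lda < max(1, k if transa else m):
        raise ValueError("lda too small")
    if ldb < max(1, n if transb else k):
        raise ValueError("ldb too small")
    if ldc < max(1, m):
        raise ValueError("ldc too small")

    def op_a(r: int, c: int) -> float:
        # op(A) is m x k; when transposed, A itself is stored as k x m.
        return A[c + r * lda] if transa else A[r + c * lda]

    def op_b(r: int, c: int) -> float:
        # op(B) is k x n; when transposed, B itself is stored as n x k.
        return B[c + r * ldb] if transb else B[r + c * ldb]

    for j in range(n):
        for i in range(m):
            acc = 0.0
            for p in range(k):
                acc += op_a(i, p) * op_b(p, j)
            idx = i + j * ldc  # column-major: (row i, col j) lives at i + j * ldc
            # beta == 0 means C is write-only, so stale NaNs in C are not propagated.
            C[idx] = alpha * acc if beta == 0.0 else alpha * acc + beta * C[idx]
```

Keeping alpha/beta and the leading dimensions in the signature is what lets one routine cover transposed operands, sub-matrix views, and accumulate-into-C without extra copies.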

ricerzz @ricerrz

Mar 28

mini-kernel-lib now has 2 FP32 GEMM ref paths, fused ReLU, 1-axis reductions, and direct NCHW conv2d forward, plus workspace checks, tests, benches, and opt-in GEMM autotune. still reference-only; next step is real GPU kernels and tighter dispatch.
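As a rough illustration of what reference paths for three of the ops in that list could look like, here is a hypothetical pure-Python sketch. The function names, the row-major / NCHW / KCRS layouts, and the N, C, H, W, K, R, S, P, Q dimension names (in the style of cuDNN's docs) are assumptions, not mini-kernel-lib's actual code; Python floats are 64-bit, so this shows loop structure and indexing, not FP32 rounding.

```python
from math import prod
from typing import List, Sequence, Tuple


def gemm_relu_ref(A: List[float], B: List[float], M: int, N: int, K: int,
                  fuse_relu: bool = True) -> List[float]:
    """Row-major C[M x N] = A[M x K] @ B[K x N], with an optional fused ReLU epilogue."""
    C = [0.0] * (M * N)
    for i in range(M):
        for j in range(N):
            acc = 0.0
            for p in range(K):
                acc += A[i * K + p] * B[p * N + j]
            # Fused epilogue: ReLU is applied to the accumulator before the single store,
            # instead of a second pass that re-reads C.
            C[i * N + j] = max(acc, 0.0) if fuse_relu else acc
    return C


def reduce_sum_axis(x: List[float], shape: Sequence[int],
                    axis: int) -> Tuple[List[float], Tuple[int, ...]]:
    """Sum a contiguous row-major tensor over one axis by viewing it as (outer, L, inner)."""
    shape = tuple(shape)
    outer, L, inner = prod(shape[:axis]), shape[axis], prod(shape[axis + 1:])
    out = [0.0] * (outer * inner)
    for o in range(outer):
        for i in range(inner):
            acc = 0.0
            for t in range(L):
                acc += x[(o * L + t) * inner + i]
            out[o * inner + i] = acc
    return out, shape[:axis] + shape[axis + 1:]


def conv2d_nchw_direct(x: List[float], w: List[float],
                       N: int, C: int, H: int, W: int,
                       K: int, R: int, S: int,
                       stride: int = 1, pad: int = 0) -> Tuple[List[float], Tuple[int, int, int, int]]:
    """Direct conv2d forward (no im2col buffer): x is NCHW, w is KCRS, y is NKPQ.

    Computes cross-correlation (no filter flip), as deep-learning frameworks usually do.
    """
    P = (H + 2 * pad - R) // stride + 1
    Q = (W + 2 * pad - S) // stride + 1
    y = [0.0] * (N * K * P * Q)
    for n in range(N):
        for k in range(K):
            for p in range(P):
                for q in range(Q):
                    acc = 0.0
                    for c in range(C):
                        for r in range(R):
                            h = p * stride - pad + r
                            if not 0 <= h < H:
                                continue  # zero padding: skip out-of-bounds rows
                            for s in range(S):
                                ww = q * stride - pad + s
                                if 0 <= ww < W:
                                    acc += (x[((n * C + c) * H + h) * W + ww]
                                            * w[((k * C + c) * R + r) * S + s])
                    y[((n * K + k) * P + p) * Q + q] = acc
    return y, (N, K, P, Q)
```

The (outer, L, inner) view is what lets one loop nest handle a reduction over any single axis of a contiguous tensor; a GPU version could map outer × inner to threads and reduce over L.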

ricerzz @ricerrz

Mar 28

Another pass on mini-kernel-lib: tightened CUDA/runtime fallback handling and pushed the current narrow CUDA slices further for FP32 GEMM, reduction, and conv. Still need a CUDA-capable machine for full end-to-end validation.
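A minimal sketch of the kind of fallback dispatch the post describes, assuming a hypothetical ordered list of backends, each with an availability probe, a support predicate, and a run function. On a machine without CUDA, the CUDA backend reports unavailable and the reference path runs. The `Backend` type, `dispatch`, and the two example backends are illustrative only, not mini-kernel-lib's API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Problem = Dict[str, Any]


@dataclass
class Backend:
    name: str
    available: Callable[[], bool]          # e.g. runtime loaded and a device found
    supports: Callable[[Problem], bool]    # dtype/shape/layout this backend actually covers
    run: Callable[[Problem], Any]


def dispatch(backends: List[Backend], problem: Problem,
             log: Optional[List[Tuple[str, str]]] = None) -> Tuple[str, Any]:
    """Try backends in priority order; fall back on unavailable, unsupported, or failing ones."""
    for be in backends:
        try:
            if not be.available():
                reason = "unavailable"
            elif not be.supports(problem):
                reason = "unsupported problem"
            else:
                return be.name, be.run(problem)
        except RuntimeError as exc:  # e.g. a launch/runtime error surfaced by the backend
            reason = f"error: {exc}"
        if log is not None:
            log.append((be.name, reason))
    raise RuntimeError(f"no backend could run {problem!r}")


# Hypothetical backends: a narrow CUDA slice (FP32 only) and an always-available reference path.
def _no_cuda() -> bool:
    return False  # stands in for a real runtime probe on a machine without a GPU


cuda = Backend("cuda", _no_cuda, lambda pb: pb.get("dtype") == "fp32", lambda pb: "gpu result")
reference = Backend("reference", lambda: True, lambda pb: True, lambda pb: "cpu result")

if __name__ == "__main__":
    trace: List[Tuple[str, str]] = []
    print(dispatch([cuda, reference], {"op": "gemm", "dtype": "fp32"}, trace))
    print(trace)
```

Treating "unsupported problem" the same way as "unavailable" lets narrow CUDA slices (say, FP32 only) coexist with a general reference path while GPU coverage grows, and the log records why each backend was skipped.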