Joined April 2025
wonder what I can use these for 🔥
Trtllmgen kernels are now open. Fastest prefill and decode kernels for our target workloads. We wrote these to win InferenceX, MLPerf, other benchmarks. Powering some of today’s top served models. Dive in, learn, use them, or level up your own. Enjoy. github.com/flashinfer-ai/fla…
started a small CUDA-first kernel library repo. the idea is basically: steal the parts of cuBLAS/cuDNN's shape that make sense, keep it small, and see how far I can get without spending much on compute. the repo has the first scaffold up. GEMM first. github.com/rizerr2131/mini-k…
> mini-kernel-lib now has 2 FP32 GEMM ref paths, fused ReLU, 1-axis reductions, and direct NCHW conv2d forward, plus workspace checks, tests, benches, and opt-in GEMM autotune.
> still reference-only; next step is real GPU kernels and tighter dispatch.
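A minimal sketch of what reference-path semantics can look like for two of the ops above: a naive GEMM with an optional fused ReLU epilogue, and a direct (no im2col) NCHW conv2d forward. It assumes row-major GEMM inputs and KCRS filters with stride 1, no padding, and no dilation. The names `gemm_ref` and `conv2d_nchw_direct` are hypothetical, not mini-kernel-lib's actual API. It is Python, so floats are float64 and this checks the math rather than FP32 rounding; the conv is cross-correlation, as in most DL frameworks.

```python
from typing import List

Matrix = List[List[float]]


def gemm_ref(a: Matrix, b: Matrix, fuse_relu: bool = False) -> Matrix:
    """Hypothetical naive reference GEMM: C = A @ B, optional fused ReLU.

    a is M x K and b is K x N, both row-major nested lists.
    """
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError(f"inner dims differ: A is {m}x{k}, B has {len(b)} rows")
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            # Fused epilogue: clamp before the single store, no second pass over C.
            c[i][j] = max(acc, 0.0) if fuse_relu else acc
    return c


def conv2d_nchw_direct(x: List[float], w: List[float],
                       n: int, c: int, h: int, wdt: int,
                       k: int, r: int, s: int) -> List[float]:
    """Hypothetical direct conv2d forward on flat NCHW buffers.

    x is [n, c, h, wdt], w is [k, c, r, s] (KCRS); stride 1, no padding.
    Returns a flat [n, k, h - r + 1, wdt - s + 1] buffer.
    """
    if len(x) != n * c * h * wdt or len(w) != k * c * r * s:
        raise ValueError("buffer sizes do not match the given shapes")
    if r > h or s > wdt:
        raise ValueError("filter larger than input")
    p_out, q_out = h - r + 1, wdt - s + 1
    y = [0.0] * (n * k * p_out * q_out)
    for ni in range(n):
        for ki in range(k):
            for p in range(p_out):
                for q in range(q_out):
                    acc = 0.0
                    # Accumulate over input channels and the filter window.
                    for ci in range(c):
                        for ri in range(r):
                            for si in range(s):
                                xi = ((ni * c + ci) * h + p + ri) * wdt + q + si
                                wi = ((ki * c + ci) * r + ri) * s + si
                                acc += x[xi] * w[wi]
                    y[((ni * k + ki) * p_out + p) * q_out + q] = acc
    return y


if __name__ == "__main__":
    print(gemm_ref([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
    # [[19.0, 22.0], [43.0, 50.0]]
```

Slow loops like these are the point of a reference path: they are easy to trust, so a GPU kernel's output can be compared against them in tests.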
> Another pass on mini-kernel-lib: tightened CUDA/runtime fallback handling and pushed the current narrow CUDA slices further for FP32 GEMM, reduction, and conv.
> Still need a CUDA-capable machine for full end-to-end validation.