For the last few weeks I've been learning NKI, AWS's kernel language for writing custom ops on Trainium and Inferentia. I just submitted my first kernel to the nki-samples repo: a decode-step attention kernel with GQA. A few things that clicked along the way:
> Decode attention is memory-bound, not compute-bound. Generating one token is tiny math (a single query), but it re-reads the entire growing KV cache every step. So the whole design is about touching K/V memory as few times as possible, not about FLOPs.
> Grouped-query attention is a direct win here. When several query heads share one KV head, you load that K/V tile once and let the whole group ride on it. On a memory-bound kernel that saves exactly the thing that costs you.
> Online softmax is what lets you stream the KV cache in tiles instead of holding every logit at once. You carry a running max, denominator, and accumulator across tiles and rescale them to the new max as each tile arrives. Same answer, bounded memory.
> The hardware model is genuinely different coming from CUDA: matmul results can only exit through PSUM (a tiny accumulator), so you immediately evacuate to SBUF to free PSUM for the next matmul and to let the softmax engines read the result.
> So far the kernel is validated on CPU against a NumPy reference, not yet on real Neuron hardware.
> Next up: the split-KV flash-decoding variant for long context.
If you work on Neuron, NKI, or inference kernels, I'd love feedback on the approach. #AWSNeuron #Trainium #Inferentia #MLSystems #NKI #KernelProgramming #Kernels #DecodeAttentionWithGQA
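The online-softmax and GQA points in the post above can be sketched in plain Python. This is not the submitted NKI kernel and uses no Neuron APIs; it is only a minimal CPU illustration, and the function names (`reference_attention`, `gqa_decode_attention`), the `tile` parameter, and the list-of-lists layout are assumptions made for the example.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reference_attention(q, keys, values):
    """Plain softmax attention for one query; holds every logit at once."""
    scale = 1.0 / math.sqrt(len(q))
    logits = [dot(q, k) * scale for k in keys]
    top = max(logits)
    weights = [math.exp(s - top) for s in logits]
    denom = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / denom
            for d in range(dim)]


def gqa_decode_attention(queries, keys, values, tile=4):
    """Online-softmax decode attention for one KV head shared by a query group.

    queries: one query vector per head in the group (hypothetical layout)
    keys, values: cached K/V vectors for the shared KV head
    """
    n_q = len(queries)
    dim = len(values[0])
    scale = 1.0 / math.sqrt(len(queries[0]))
    run_max = [-math.inf] * n_q           # running max per query head
    run_den = [0.0] * n_q                 # running softmax denominator
    acc = [[0.0] * dim for _ in range(n_q)]  # running weighted sum of V

    for start in range(0, len(keys), tile):
        # Each K/V tile is sliced once ...
        k_tile = keys[start:start + tile]
        v_tile = values[start:start + tile]
        # ... and reused by every query head in the group.
        for g, q in enumerate(queries):
            logits = [dot(q, k) * scale for k in k_tile]
            new_max = max(run_max[g], max(logits))
            # Rescale old state to the new max; exp(-inf) is 0.0 on the first tile.
            rescale = math.exp(run_max[g] - new_max)
            weights = [math.exp(s - new_max) for s in logits]
            run_den[g] = run_den[g] * rescale + sum(weights)
            acc[g] = [a * rescale + sum(w * v[d] for w, v in zip(weights, v_tile))
                      for d, a in enumerate(acc[g])]
            run_max[g] = new_max

    return [[a / run_den[g] for a in acc[g]] for g in range(n_q)]
```

Comparing `gqa_decode_attention` against `reference_attention` on random inputs, with a cache length that is not a multiple of `tile`, is a cheap way to check the rescaling logic before moving to hardware.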
I've ordered this book and I'm eagerly anticipating its arrival so I can delve into it and master its content. #Linux #KernelProgramming
Interested in Linux kernel development? This is an excellent book with a hands-on approach and challenge exercises 🖥️ The second edition was released recently. Check it out 👉 packt.link/YitS1
📢 Check out this video on eBPF! 🎥✨ Discover the secrets of kernel programming with eBPF 🐧💻 Unleash your coding skills and dive into observability, networking, load balancing 🚀🌐 #eBPF #KernelProgramming #Observability #Networking #LoadBalancing youtu.be/Fr508E-Plqw

Another attempt at writing a Windows kernel driver, this time with the help of @zodiacon's book, Windows Kernel Programming. It is a very detailed and easy-to-understand book. #0xdarkvortex #windows10 #kernelprogramming lnkd.in/fFdnxMu

Hello to the world of Windows Kernel Programming. 😅😅😅 Wrote my first Windows driver with the help of the MS tutorial. It was frustrating but fun. Thanks @NinjaParanoid for the motivation. #0xDarkVortex #darkvortex #kernelprogramming #windows10 #lowlevelstuff lnkd.in/eu5uKd2

Hacking the Linux kernel can be easy. Join us tomorrow for the first Kernel Peer Lab in Barcelona. Your level doesn't matter; if you are interested, just come and say hello! #kernelprogramming #opensourcesoftware #linuxkernel mtk.bcnfs.org/doku.php?id=ba…

11 Jul 2017
Building the XNU kernel on Mac OS X Sierra (10.12.X) - 0xcc.re/building-xnu-kernel-… #osx #kernelspace #programming #kernelprogramming
