Varun tej

@itsvaruntej

Jun 12

For the last few weeks I've been learning NKI, AWS's kernel language for writing custom ops on Trainium and Inferentia. I just submitted my first kernel to the nki-samples repo: a decode-step attention kernel with GQA. A few things that clicked along the way:

> Decode attention is memory-bound, not compute-bound. Generating one token is tiny math (a single query), but it re-reads the entire growing KV cache every step. So the whole design is about touching K/V memory as few times as possible, not about FLOPs.

> Grouped-query attention is a direct win here. When several query heads share one KV head, you load that K/V tile once and let the whole group ride on it. On a memory-bound kernel that saves exactly the thing that costs you.

> Online softmax is what lets you stream the KV cache in tiles instead of holding every logit at once. You carry a running max, denominator, and accumulator across tiles and rescale them as each new tile arrives. Same answer, bounded memory.

> The hardware model is genuinely different coming from CUDA: matmul results can only exit through PSUM (a tiny accumulator), so you immediately evacuate to SBUF to free it for the next matmul and to let the softmax engines read it.

It is validated on CPU against a NumPy reference, but not yet on real Neuron hardware.

> Next up: the split-KV flash-decoding variant for long context.

If you work on Neuron, NKI, or inference kernels, I'd love feedback on the approach. #AWSNeuron #Trainium #Inferentia #MLSystems #NKI #KernelProgramming #Kernels #DecodeAttentionWithGQA
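
The two ideas in the post above (one K/V tile load shared by a whole query-head group, and online softmax across tiles) can be sketched in plain NumPy. This is only an illustration, not the NKI kernel submitted to nki-samples: the function names, tensor layouts, and tile size are assumptions, and nothing here models PSUM or SBUF.

```python
import numpy as np

def decode_attention_reference(q, k, v):
    """Plain softmax attention for one decode step (hypothetical layout).
    q: (n_q_heads, d); k, v: (n_kv_heads, seq_len, d)."""
    n_q_heads, d = q.shape
    group = n_q_heads // k.shape[0]          # query heads per KV head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                      # query head h reads KV head h // group
        s = k[kv] @ q[h] / np.sqrt(d)        # all logits held at once: (seq_len,)
        p = np.exp(s - s.max())
        out[h] = (p / p.sum()) @ v[kv]
    return out

def decode_attention_tiled(q, k, v, tile=128):
    """Same result, but K/V are streamed in tiles with online softmax,
    and each tile is sliced once per KV head for the whole query group."""
    n_q_heads, d = q.shape
    n_kv_heads, seq_len, _ = k.shape
    group = n_q_heads // n_kv_heads
    scale = 1.0 / np.sqrt(d)
    out = np.empty_like(q)
    for kv in range(n_kv_heads):
        qg = q[kv * group:(kv + 1) * group]  # (group, d): heads sharing this KV head
        m = np.full(group, -np.inf)          # running max per query head
        l = np.zeros(group)                  # running softmax denominator
        acc = np.zeros((group, d))           # running weighted sum of V rows
        for start in range(0, seq_len, tile):
            k_t = k[kv, start:start + tile]  # one tile, reused by the whole group
            v_t = v[kv, start:start + tile]
            s = (qg @ k_t.T) * scale         # (group, tile_len)
            m_new = np.maximum(m, s.max(axis=1))
            alpha = np.exp(m - m_new)        # rescales old state to the new max
            p = np.exp(s - m_new[:, None])
            l = alpha * l + p.sum(axis=1)
            acc = alpha[:, None] * acc + p @ v_t
            m = m_new
        out[kv * group:(kv + 1) * group] = acc / l[:, None]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((8, 64))         # 8 query heads
    k = rng.standard_normal((2, 300, 64))    # 2 KV heads, 300 cached tokens
    v = rng.standard_normal((2, 300, 64))
    print(np.allclose(decode_attention_tiled(q, k, v), decode_attention_reference(q, k, v)))
```

The tiled version only ever holds one tile of logits per group, which is the "bounded memory" property; the sequence length of 300 is deliberately not a multiple of the tile size so the partial last tile is exercised.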

Ulrich Ntella @ulrichntella

4 Mar 2024

I've ordered this book and I'm eagerly anticipating its arrival so I can delve into it and master its content. #Linux #KernelProgramming

Linux Handbook

@LinuxHandbook

4 Mar 2024

Interested in Linux kernel development? This is an excellent book with a hands-on approach and challenge exercises 🖥️ The second edition was released recently. Check it out 👉 packt.link/YitS1

Share Learn @sharelearn_net

24 May 2023

📢 Check out this video on eBPF! 🎥✨ Discover the secrets of kernel programming with eBPF 🐧💻 Unleash your coding skills and dive into observability, networking, and load balancing 🚀🌐 #eBPF #KernelProgramming #Observability #Networking #LoadBalancing youtu.be/Fr508E-Plqw

Shashank Gosavi @shawshank730

5 Jul 2019

Another attempt at writing a Windows kernel driver. This time it is with the help of @zodiacon's book, Windows Kernel Programming. It is a very detailed and easy-to-understand book. #0xdarkvortex #windows10 #kernelprogramming lnkd.in/fFdnxMu

Shashank Gosavi @shawshank730

4 Jul 2019

Hello to the world of Windows Kernel Programming. 😅😅😅 Wrote my first Windows driver with the help of the MS tutorial. It was frustrating but fun. Thanks @NinjaParanoid for the motivation. #0xDarkVortex #darkvortex #kernelprogramming #windows10 #lowlevelstuff lnkd.in/eu5uKd2

Enric Balletbo Serra @eballetbo

22 Nov 2018

Hacking the Linux kernel can be easy. Join us tomorrow at the first Kernel Peer Lab in Barcelona. Your level doesn't matter; if you are interested, just come and say hello! #kernelprogramming #opensourcesoftware #linuxkernel mtk.bcnfs.org/doku.php?id=ba…

Mikal/Meeh @mikalv

11 Jul 2017

Building the XNU kernel on Mac OS X Sierra (10.12.X) - 0xcc.re/building-xnu-kernel-… #osx #kernelspace #programming #kernelprogramming

Angela Barreto @AngelaHRUK

7 Jul 2017

Browser/Systems Dev jobs @Bromium #chromiumdev #systemsprogramming #filesystem #kernelprogramming #microvm #hypervisor #windowsinternals

Maestro @i_maestr_o

24 May 2017

Linux Kernel Modules - Where Coders Rock !!! codersrockcorner.blogspot.in… #kernelprogramming #kernel #100DaysOfCode #301DaysOfCode #hacking