One exciting application: a universal optimization layer across DSLs. High-level DSLs (Triton, CUTLASS, TileLang, ThunderKittens) are powerful but opaque: they don’t reveal why one outperforms another. By working at the shared PTX layer, we can compare, learn from, and compose their best implementations into kernels that outperform them all. (4/5)