Standard Kernel Co.

Standard Kernel Co.

10 Photos and videos

Tweets

Standard Kernel Co.

@Standard_Kernel

Apr 27

We built a system combining program analysis and LLMs to transform and optimize PTX. By operating at this shared layer across DSLs (e.g. Triton, TileLang, ThunderKittens, CUTLASS), our system learns the best ideas from each and generates kernels that outperform all of them (1/5)

256

23,049

more replies

Standard Kernel Co.

Standard Kernel Co.

@Standard_Kernel

Apr 27

One exciting application: a universal optimization layer across DSLs. High-level DSLs (Triton, CUTLASS, TileLang, ThunderKittens) are powerful but opaque, as they don’t reveal why one outperforms another. By working at the shared PTX layer, we can compare, learn, and compose their best implementations into kernels that outperform them all. (4/5)

1,662

Standard Kernel Co.

Standard Kernel Co.

@Standard_Kernel

Apr 27

Full post at standardkernel.com/blog/reim… (5/5)

1,453

Jump Capital

Standard Kernel Co. retweeted

Jump Capital @jumpcapital

Mar 20

We have a utilization problem. GPUs are running <30% capacity. @Standard_Kernel (@anneouyang @ChrisRinard ) unlocks up to 4x performance. Why we invested: jumpcap.com/insights/why-we-…

2,215

Jump Capital

Standard Kernel Co. retweeted

Jump Capital @jumpcapital

Mar 16

AI progress increasingly depends on how efficiently workloads run on hardware. @Standard_Kernel is tackling this challenge at the kernel level, unlocking more performance from modern GPUs. We're proud to lead their seed with @generalcatalyst, @CoreWeave, @felicis, & @ericsson

1,654

Saaya Nath Pal

Standard Kernel Co. retweeted

Saaya Nath Pal @saayanath

Mar 11

It’s rare to find founders so perfectly and uniquely suited to solve a problem, let alone a problem of this magnitude and importance. Proud to lead @Standard_Kernel’s seed round.

Anne Ouyang

@anneouyang

Mar 11

Excited to share @Standard_Kernel's seed round and some reflections on what we’ve learned about kernel generation and what we believe is next. Grateful to our amazing team, supporters, and the broader community pushing this space forward.

3,468

Dylan Patel

Standard Kernel Co. retweeted

Dylan Patel

@dylan522p

Mar 11

Anne is killing it. Here's my quote from the press release Kernel generation is key for improving performance and efficiency of AI hardware. As fleet sizes for users of AI hardware get larger, and more hardware diversity is introduced, Standard Kernel becomes key to deployment.”

Anne Ouyang

@anneouyang

Mar 11

289

60,848

Anne Ouyang

Standard Kernel Co. retweeted

Anne Ouyang

@anneouyang

Mar 11

518

134,929

Standard Kernel Co.

Standard Kernel Co.

@Standard_Kernel

Feb 24

GPU timing is deceptively hard. Power limits, thermal state, clock behavior, caching, and measurement method all affect results in subtle ways. We explored sources of timing variation to obtain more reliable results for kernel benchmarking. (1/9)

136

33,504

more replies

Standard Kernel Co.

Standard Kernel Co.

@Standard_Kernel

Feb 24

Warm caches can make GPU kernels appear much faster, and sensitivity to cold vs. warm cache varies by workload. Be explicit about which condition you measure, as the appropriate choice depends on the behavior you want to evaluate. (6/9)

1,444

Standard Kernel Co.

Standard Kernel Co.

@Standard_Kernel

Feb 24

The choice of timing method (CUDA events, CUDA graphs, Nsight Compute, PyTorch profiler, etc.) can result in different measured GPU performance, and the effect depends on the workload. For microsecond-scale kernels, true execution time is often indistinguishable from measurement overhead and system variance. (7/9)

589

Anne Ouyang

Standard Kernel Co. retweeted

Anne Ouyang

@anneouyang

Feb 24

Standard Kernel Co.

@Standard_Kernel

Feb 24

149

19,563

Standard Kernel Co.

Standard Kernel Co.

@Standard_Kernel

Feb 24

Compilation flags can change GPU performance on the same kernel hardware. Sensitivity varies by workload, so uncontrolled flags can look like real algorithmic gains when they're just compiler effects. (8/9)

1,133

Standard Kernel Co.

Standard Kernel Co.

@Standard_Kernel

Feb 24

Getting identical hardware from cloud providers is not guaranteed. Across three cloud providers all listing “A100 80GB,” we received different variants with differing clock limits, power caps, and driver environments. When benchmarking an identical GEMM, the runtime distributions formed distinct clusters for each provider. (9/9)

1,151