Joined July 2024
You can't have portability AND peak performance. That's the assumption people make. They're right if you rely on portability by abstraction at runtime. But that assumes a cross-vendor system must be a "lowest-common-denominator" system.
Generality is free when your abstraction sits at the right level. Dive into Part 4 of my series, "Why hardware-agnostic isn't the same as lowest-common-denominator": scale-lang.com/posts/2026-06…

Have I not managed to convince you yet? Come hang out in our Discord to keep the debate going. It's where we talk compilers, GPU programming, and other nerdy things with our dev team: scale-lang.com/s/discord?utm…
Michael Søndergaard retweeted
𝗦𝗽𝗲𝗰𝘁𝗿𝗮𝗹 𝗖𝗼𝗺𝗽𝘂𝘁𝗲 𝗶𝘀 𝗻𝗼𝘄 𝗽𝗮𝗿𝘁 𝗼𝗳 𝘁𝗵𝗲 𝗡𝗩𝗜𝗗𝗜𝗔 𝗜𝗻𝗰𝗲𝗽𝘁𝗶𝗼𝗻 𝗽𝗿𝗼𝗴𝗿𝗮𝗺. 🟩 Inception is @nvidia's program for AI startups - a membership that gives access to technical resources, preferred pricing on NVIDIA hardware and software, and exposure to a global network of investors and partners. CUDA is the de-facto standard for AI developers, and we’re honored to play our part in growing the ecosystem.
Michael Søndergaard retweeted
From CUDA portability to real-world performance on @AMD GPUs, @SpectralMichael, CEO of @SpectralCom shares why open, portable AI infrastructure matters more than ever. Watch the short interview from Beyond Summit 2026 on our YouTube channel: youtube.com/watch?v=cwmiGYas…
Everyone says CUDA can't target TPUs. What they mean is nobody has written the compiler that raises CUDA code to something a systolic backend can consume. Those are very different sentences. Full post — Part 3 of why @SpectralCom exists: tinyurl.com/mtnxcjsw

Michael Søndergaard retweeted
𝗨𝗽 𝘁𝗼 𝟮𝟱.𝟳× 𝗳𝗮𝘀𝘁𝗲𝗿. Unmodified CUDA. AMD silicon. Thanks to the @tensorwave team for benchmarking SCALE on MI355X and publishing the numbers. Porting to AMD used to mean a rewrite. Now it means a recompile.
Spectral Compute (@SpectralCom) used TensorWave’s AMD-native infrastructure to benchmark CUDA portability and performance on @AMD Instinct™ MI355X GPUs. See how they did it - tensorwave.com/blog/spectral…
Obvious take: if agents can write native code for any GPU, who needs a portable CUDA toolchain? Just point the model at each GPU. I think that's exactly backwards — and wrote up why. tinyurl.com/5ac6y5jt

CUDA isn't a standard. No consortium, no ratified spec. It's whatever NVIDIA ships next. That's supposed to be a weakness. It's why CUDA wins. The cross-vendor layer can't be a consortium either. It has to be a company. That's what we're building at @SpectralCom. SCALE compiles unmodified CUDA on AMD and NVIDIA today. No PDF to argue about, just a toolchain. Full piece: bit.ly/3PBsYsU

Compiler Explorer (godbolt.org), via @CompileExplore: would you like to add scale-lang.com to godbolt?
Michael Søndergaard retweeted
AI compute is bottlenecked. When the hardware mix inevitably diversifies, how will production clusters actually operate? Our CEO @SpectralMichael is joining @AkashBajwa96's Gradient Descending roundtable this Wednesday to dig into the evolving hardware landscape.
Michael Søndergaard retweeted
What does the AI cluster of the future look like? Hint: it won't just be a wall of B200s. Our CEO Michael Søndergaard is joining @AkashBajwa96 for the next Gradient Descending roundtable to discuss ASICs, non-NVIDIA chips, and breaking the software lock-in. 🗓️ Feb 25 | 8:30 AM GMT 🎟️ Request invite: luma.com/f7s69jj2 #AI #HPC #DeepTech #AMD #NVIDIA #ASIC