Nerds taming the green dragon with SCALE, our framework for compiling CUDA codebases for AMD GPUs, with support for more accelerated platforms coming soon.

Joined September 2025
64 Photos and videos
A lot of people assume that going cross-vendor means settling for lowest-common-denominator performance. @SpectralMichael breaks down why that's wrong in his latest blog post:
Replying to @SpectralMichael
Generality is free when your abstraction sits at the right level. Dive into Part 4 of my series, "Why hardware-agnostic isn't the same as lowest-common-denominator": scale-lang.com/posts/2026-06โ€ฆ
1
20
๐Ÿ‡ธ๐Ÿ‡ฌ Spectral is at @superai_conf 2026 โ€” Marina Bay Sands, Singapore, 10โ€“11 June. Singapore is one of the fastest-moving AI hubs in the world, and the conversations here about compute capacity and GPU vendor optionality are exactly the ones we care about. If you're around, let's talk.
1
22
Spectral Compute is at ISC High Performance 2026 โ€” booth C50, Hamburg, June 22โ€“26. ๐Ÿ“… Book a slot with us: scale-lang.com/s/booth-c50-aโ€ฆ

1
2
30
SCALE compiles unmodified CUDA code natively for AMD and NVIDIA. The numbers: โšก Up to 33.8ร— faster than HIP on AMD MI300X โšก Up to 9% faster than nvcc on NVIDIA B300 One codebase. Any GPU. No rewrites. See you in Hamburg ๐Ÿ‘‹
1
49
That's a wrap at @computex_taipei! Great few days talking about the future of heterogeneous accelerated computing.
1
1
2
41
Attendees were especially interested in the performance boost and better-than-native developer experience they can unlock with scale-lang.com - while expanding their compute capacity without being locked to a single vendor.

1
1
24
COMPUTEX always delivers on serendipity: the unplanned hallway chats turn into the most valuable ones. Heading home with a full notebook and a longer list of people to follow up with. Until next time. ๐Ÿ‘‹
1
24
Cross-architecture from a single codebase is exactly why we built SCALE. Thrilled to see @AtlasInference getting this running! More performance optimizations for both @AMD and @nvidia are on the way. scale-lang.com

Atlas Inference is running Qwen3.6-27B on AMD Strix Halo ๐Ÿฅณ Using @SpectralCom's SCALE ROCm backend, our CUDA kernels compile and run on RDNAโš™๏ธ Cross-architecture inference from ONE codebase ๐Ÿ—ฃ๏ธ Thank you @AIatAMD for the gift ๐Ÿ™ POC โœ… excited to keep tuning performanceโšก๏ธ
1
1
5
270
๐—ฆ๐—ฝ๐—ฒ๐—ฐ๐˜๐—ฟ๐—ฎ๐—น ๐—–๐—ผ๐—บ๐—ฝ๐˜‚๐˜๐—ฒ ๐—ถ๐˜€ ๐—ป๐—ผ๐˜„ ๐—ฝ๐—ฎ๐—ฟ๐˜ ๐—ผ๐—ณ ๐˜๐—ต๐—ฒ ๐—ก๐—ฉ๐—œ๐——๐—œ๐—” ๐—œ๐—ป๐—ฐ๐—ฒ๐—ฝ๐˜๐—ถ๐—ผ๐—ป ๐—ฝ๐—ฟ๐—ผ๐—ด๐—ฟ๐—ฎ๐—บ. ๐ŸŸฉ Inception is @nvidia's program for AI startups - a membership that gives access to technical resources, preferred pricing on NVIDIA hardware and software, and exposure to a global network of investors and partners. CUDA is the de-facto standard for AI developers, and weโ€™re honored to play our part in growing the ecosystem.
1
3
7
109
New benchmark numbers are in, and SCALE keeps pulling ahead.
1
1
42
And on NVIDIA's B300 (CUDA 13), SCALE lands within a whisker of nvcc on its home turf: ๐—ผ๐—ป ๐—ฝ๐—ฎ๐—ฟ ๐—ผ๐—ป ๐—ฎ๐˜ƒ๐—ฒ๐—ฟ๐—ฎ๐—ด๐—ฒ, ๐˜‚๐—ฝ ๐˜๐—ผ ๐Ÿต% ๐—ณ๐—ฎ๐˜€๐˜๐—ฒ๐—ฟ on individual workloads. That's native CUDA tooling, matched and occasionally beaten, by a third-party compiler.
1
1
30
Hardware freedom shouldn't cost you performance. Increasingly, it buys you some. Benchmarks here: tinyurl.com/2j8vftsr

1
17
Spectral Compute retweeted
Everyone says CUDA can't target TPUs. What they mean is nobody has written the compiler that raises CUDA code to something a systolic backend can consume. Those are very different sentences. Full post โ€” Part 3 of why @SpectralCom exists: tinyurl.com/mtnxcjsw

2
4
502
Replying to @ChrisKitching17
@ChrisKitching17, our CTO and co-founder of Spectral Compute, recently presented at the ๐—ก๐—›๐—ฅ ๐—ฃ๐—ฒ๐—ฟ๐—ณ๐—Ÿ๐—ฎ๐—ฏ ๐—ฆ๐—ฒ๐—บ๐—ถ๐—ป๐—ฎ๐—ฟ hosted by NHR@FAU. The recording is now online: youtu.be/uSLD40GX5nM
1
1
3
58
The Q&A at the end is worth the watch on its own. Attendees asked about: โ†ณ Whether optimized deep learning kernels flow through the same pipeline โ†ณ Support for Intel GPUs and emerging hardware โ†ณ How SCALE compares to NVIDIA's own compiler when targeting NVIDIA โ†ณ Adding custom accelerators (RISC-V came up) and emulating missing functionality โ†ณ Code-level transformations to improve occupancy on AMD โ†ณ Plans for OpenACC, CUDA Fortran, and OpenMP
1
2
40
๐—จ๐—ฝ ๐˜๐—ผ ๐Ÿฎ๐Ÿฑ.๐Ÿณร— ๐—ณ๐—ฎ๐˜€๐˜๐—ฒ๐—ฟ. Unmodified CUDA. AMD silicon. Thanks to the @tensorwave team for benchmarking SCALE on MI355X and publishing the numbers. Port to AMD used to mean a rewrite. Now it means a recompile.
Spectral Compute (@SpectralCom) used TensorWaveโ€™s AMD-native infrastructure to benchmark CUDA portability and performance on @AMD Instinctโ„ข MI355X GPUs. See how they did it - tensorwave.com/blog/spectralโ€ฆ
2
7
160
Sort the issues of almost any popular open-source CUDA project by most commented, and you'll inevitably find the exact same unresolved thread: 'Any chance of AMD support?'"
1
1
6
92
For most maintainers, that thread stays open indefinitely. Porting a complex CUDA codebase is a part-time job, and volunteer contributors rarely have one to spare. ๐—ฆ๐—–๐—”๐—Ÿ๐—˜ ๐˜๐˜‚๐—ฟ๐—ป๐˜€ ๐˜๐—ต๐—ฎ๐˜ ๐—ถ๐˜€๐˜€๐˜‚๐—ฒ ๐—ถ๐—ป๐˜๐—ผ ๐—ฎ ๐—ฟ๐—ฒ๐—ฐ๐—ผ๐—บ๐—ฝ๐—ถ๐—น๐—ฒ.
1
2
26
Our compiler toolchain is free for research and evaluation. If you maintain a CUDA project and want to finally close that thread, come say hi: discord.com/invite/KNpgGbTc3โ€ฆ
2
26