C is Not a Low-Level Language
That sounds pretty scary, doesn’t it? C is the lowest-level language most of us know. If it isn’t “low-level” enough, then what is?
This is a really cool article because it invites us to take a close look at what “low-level language” means. If we think about it, a low-level language should be “close to the metal.” It should have as few abstractions as possible between the programmer and the computer.
Now, C feels like it’s close to the metal. After all, you’re manipulating memory directly with primitive operators, with none of the fancy abstractions you’d find in a more modern language. And that’s because C was close to the metal, 50 years ago. C was designed as a truly low-level language for DEC’s PDP-11, one of the most popular computers of the 1970s and 1980s. But computers have changed a lot since then!
The first fundamental change in how computers work is instruction-level parallelism. C assumes that the instructions in one thread of your program execute sequentially, one after another. But that hasn’t been true for years! Modern CPU cores are tremendously sophisticated parallel processors that inspect nearby instructions and try to run as many of them as possible at once, even speculatively executing instructions whose results may end up being thrown away. And most modern CPUs contain many such cores, so fast programs must also be truly parallel, with many threads, which is something C doesn’t make easy.
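To make instruction-level parallelism a little more concrete, here is a minimal sketch (the function names `sum_serial` and `sum_unrolled` are made up for illustration). Both functions add up an array, but the first forms one long chain of dependent additions, while the second keeps four independent running totals that a CPU is free to overlap. Whether the second version is actually faster depends on your compiler, flags, and hardware, so treat it as something to measure rather than a guarantee.

```c
#include <stddef.h>

/* One accumulator: every addition needs the result of the previous
   one, so the additions form a single serial dependency chain. */
double sum_serial(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Four accumulators: the four additions inside one iteration do not
   depend on each other, so the hardware may execute them in parallel. */
double sum_unrolled(const double *x, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; i++)   /* leftover elements when n is not a multiple of 4 */
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}
```

Notice that nothing in the C source says “these four additions may run at the same time.” You express that only indirectly, by removing dependencies and hoping the compiler and CPU take advantage of it. Also note that the two versions add the numbers in a different order, so with floating-point inputs they can round slightly differently.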
The other fundamental change in how computers work is hierarchical memory. C assumes that memory is one big array that you can index into anywhere. But modern architectures have multiple layers of cache between memory and the CPU, and accesses that jump around memory and miss the cache are slow. To make C run fast, you have to be aware of how the cache works and use it as much as possible, but C provides absolutely no help with this.
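A classic way to see the cache at work is to walk the same matrix in two different orders. The sketch below assumes a matrix stored row by row in one flat array; the function names are hypothetical. Both functions compute the same sum, and to C they look equally reasonable, but the first touches memory sequentially while the second jumps `cols` elements on every step.

```c
#include <stddef.h>

/* Inner loop walks adjacent addresses, so each cache line that gets
   fetched is fully used before the loop moves on to the next one. */
long sum_row_order(const int *m, size_t rows, size_t cols) {
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Same arithmetic, but the inner loop strides through memory cols
   elements at a time. On a large matrix, most accesses are likely to
   land on a cache line that is not currently in the cache. */
long sum_column_order(const int *m, size_t rows, size_t cols) {
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

On a matrix much larger than the cache, the column-order version is typically noticeably slower, although the exact gap depends on the machine. The language gives you no hint of the difference: both loops are just indexing into an array.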
So how is C so fast if it isn’t a low-level language? The answer is tremendously sophisticated compilers. A modern compiler like Clang, together with the LLVM infrastructure it builds on, runs to around 2 million lines of code, built by hundreds of developers over decades. All of that code is there to perform dozens of optimizations that fundamentally change how your C code executes so it can be fast on a modern processor, while keeping (roughly) the semantics you want. The resulting machine code is extremely fast, but almost unrecognizable compared to your original program. And if you want the fastest possible code, you have to deeply understand what these optimizations do and how your computer really works, because your computer is not a fast PDP-11, and C is not a low-level language.
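As a small illustration of how far the generated code can drift from what you wrote, consider the hypothetical pair of functions below. The first is a plain loop; the second is the closed-form formula for the same sum. Optimizing compilers are known to sometimes rewrite loops like the first into arithmetic resembling the second, with no loop left at all. Whether your compiler does this for this exact code is an assumption you should check yourself, for example by reading the assembly from `clang -O2 -S`.

```c
#include <stdint.h>

/* What you write: add up 0, 1, ..., n-1 one step at a time. */
uint64_t sum_loop(uint32_t n) {
    uint64_t total = 0;
    for (uint32_t i = 0; i < n; i++)
        total += i;
    return total;
}

/* What an optimizer may effectively turn it into: n * (n - 1) / 2.
   Widening to 64 bits first keeps the product from overflowing for
   any 32-bit n. */
uint64_t sum_closed_form(uint32_t n) {
    uint64_t m = n;
    return m * (m - 1) / 2;
}
```

The two functions return the same value for every `n`, which is exactly why a compiler is allowed to swap one for the other. The price is that the machine code no longer looks anything like the loop in your source.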