Computer architecture, semiconductors, systems programming. I care about what the hardware actually does.

Joined June 2024
you start GPU programming thinking you'll spend your time writing brilliant CUDA kernels and optimizing matrix multiplies. you end up spending 90% of your career staring at Nsight Compute reports trying to figure out why your L1 cache hit rate dropped by 4%.
Strace retweeted
Jun 12
we named it random access memory (RAM). then we built three levels of cache, prefetchers, data-oriented design, and an entire performance-engineering discipline whose whole purpose is making sure nobody accesses it randomly.
branch prediction, register renaming, speculative execution, store buffers, μop translation. a huge amount of modern CPU complexity exists to maintain a simple illusion: that your instructions run in order, one at a time, like the ISA says. you write x86. the CPU translates it into something else and gets on with its day.
Strace retweeted
Jun 13
C++23 std::expected might be one of the most practical additions to the language in years. It provides a structured way to handle errors without exceptions or output parameters. the failure case is part of the return type, so it's much harder for the caller to ignore by accident, and there's no exception-driven control flow involved.
the cost of a virtual call was never the vtable lookup. that's just a pointer load, and modern CPUs are very good at loads. what actually matters is the indirect branch that comes after it. the CPU wants to know where execution goes next so it can keep speculating ahead. when the branch predictor gets that target right, a virtual call is often surprisingly cheap. the trouble starts when the same call site sees many different targets. prediction becomes harder, mispredictions increase, and the CPU has to throw away speculative work and start again. that's why a monomorphic virtual call site can be almost free while a megamorphic one can become expensive. "virtual calls are slow" was always an incomplete explanation. the real question is how predictable the call target is.
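A rough sketch of the two cases, using a hypothetical Shape hierarchy. total_area contains a single virtual call site; feeding it only Squares makes that site monomorphic, while a pseudo-randomly mixed vector makes its target change unpredictably from one iteration to the next. This only sets up the experiment: how much slower the mixed case is (and whether the compiler devirtualizes anything) depends on your compiler and CPU, so no numbers are claimed here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <random>
#include <vector>

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

struct Rect : Shape {
    double w, h;
    Rect(double w_, double h_) : w(w_), h(h_) {}
    double area() const override { return w * h; }
};

struct Triangle : Shape {
    double base, height;
    Triangle(double b, double h) : base(b), height(h) {}
    double area() const override { return 0.5 * base * height; }
};

using Shapes = std::vector<std::unique_ptr<Shape>>;

// The call site under test: sh->area() is one indirect branch whose
// target depends on the dynamic type of each element.
double total_area(const Shapes& shapes) {
    double sum = 0.0;
    for (const auto& sh : shapes) sum += sh->area();
    return sum;
}

// Monomorphic: every element is a Square (area 1), so the indirect
// branch always goes to the same target.
Shapes make_monomorphic(std::size_t n) {
    Shapes v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) v.push_back(std::make_unique<Square>(1.0));
    return v;
}

// Mixed: each element's type is picked at random (areas 1, 2 or 3),
// so the branch target is hard to predict from iteration to iteration.
Shapes make_mixed(std::size_t n, unsigned seed) {
    Shapes v;
    v.reserve(n);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick(0, 2);
    for (std::size_t i = 0; i < n; ++i) {
        switch (pick(rng)) {
            case 0: v.push_back(std::make_unique<Square>(1.0)); break;         // area 1
            case 1: v.push_back(std::make_unique<Rect>(1.0, 2.0)); break;      // area 2
            default: v.push_back(std::make_unique<Triangle>(2.0, 3.0)); break; // area 3
        }
    }
    return v;
}
```

When you control the data, one common remedy is to group objects by type (or keep one container per type) so the same call site sees long runs of the same target.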
Jun 13
imagine telling Intel in 2001 that the thing most likely to kill Itanium wasn't a new architecture. It was x86 itself. Itanium wasn't a bad processor. It was a processor built for a future that never arrived. the compiler revolution it depended on never happened at the scale required, while out-of-order execution kept getting faster every generation. by the time the industry had its answer, x86-64 had already won.
Jun 12
Itanium (2001–2021). a processor built on one beautiful idea: the compiler would schedule everything statically at build time. it asked the compiler to predict the future. the compiler declined. out-of-order CPUs predicted it at runtime instead, and won. survived by x86-64, the architecture it was supposed to replace.
Jun 13
Roughly two-thirds of the Linux kernel source is device drivers. The operating system part is smaller than most people think. Linux's real advantage isn't elegance; it's decades of accumulated hardware support. Linux doesn't just run on a lot of hardware. In many ways, Linux is the hardware support. The OS comes along for the ride.
Jun 12
a simple benchmark showing memory hierarchy latency. the code performs the same dependent pointer chase at four working-set sizes. the only thing that changes is whether the data fits in L1, L2, L3, or spills to DRAM. on my machine: 1.5ns → 5.1ns → 35ns → 134ns per load.
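The post doesn't include its code, so this is a hypothetical sketch of the same idea, not the author's benchmark. make_chain builds a random single-cycle permutation with Sattolo's algorithm, so the chase visits every element and the prefetcher gets no useful pattern; ns_per_load then follows the chain with dependent loads, which is what turns time per step into an estimate of load latency.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Build a random single-cycle permutation with Sattolo's algorithm:
// next[i] is the index visited after i, and following it from any start
// visits every element exactly once before returning to the start.
std::vector<std::size_t> make_chain(std::size_t n, unsigned seed) {
    assert(n >= 2);
    std::vector<std::size_t> next(n);
    std::iota(next.begin(), next.end(), std::size_t{0});
    std::mt19937_64 rng(seed);
    for (std::size_t i = n - 1; i > 0; --i) {
        // Picking j strictly below i is what guarantees one big cycle.
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(next[i], next[pick(rng)]);
    }
    return next;
}

// Follow the chain. Each load's address comes from the previous load,
// so the loads can't overlap and time per step approximates latency.
double ns_per_load(const std::vector<std::size_t>& next, std::size_t steps) {
    assert(steps > 0);
    std::size_t i = 0;
    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t k = 0; k < steps; ++k) i = next[i];
    const auto t1 = std::chrono::steady_clock::now();
    // Consume the final index so the loop isn't optimized away.
    volatile std::size_t sink = i;
    (void)sink;
    const std::chrono::duration<double, std::nano> dt = t1 - t0;
    return dt.count() / static_cast<double>(steps);
}
```

To get the four-step curve, run it with working sets sized around each cache level on your own machine, e.g. make_chain(16 * 1024 / sizeof(std::size_t), 1) for something that should fit in L1, up to a few hundred MiB for DRAM. The breakpoints depend on your cache sizes, and at large sizes TLB misses also leak into the measurement.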
Jun 12
the cost of a virtual call was never the vtable lookup. a load is a load. the cost is the indirect branch the predictor can't confidently predict, and modern CPUs are speculating hundreds of instructions past it. guess wrong, and all that work gets flushed. a monomorphic call site is nearly free. a megamorphic one is a pipeline bonfire. "virtual is slow" was always missing the noun. the noun is branch prediction.
Jun 11
Stages of lock-free programming:
1. it works
2. it works on x86
3. you learn x86 was hiding your ordering bugs the whole time
4. the ARM port ships
5. you finally read what acquire and release actually promise
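A minimal message-passing sketch with std::atomic, the pattern stage 5 is about. The producer writes plain data, then publishes a flag with a release store; a consumer whose acquire load reads that flag as true synchronizes with the store, so it is guaranteed to see the data. run_message_passing is a hypothetical name for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Message passing between two threads. The payload is a plain int; only
// the flag is atomic. Returns the value the consumer observed.
int run_message_passing() {
    int payload = 0;
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread consumer([&] {
        // Acquire: once this load reads true, everything the producer
        // wrote before its release store is visible to this thread.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    std::thread producer([&] {
        payload = 42;                                  // 1. write the data
        ready.store(true, std::memory_order_release);  // 2. publish it
    });

    producer.join();
    consumer.join();
    assert(seen == 42);  // guaranteed by the acquire/release pairing
    return seen;
}
```

Switch both operations to memory_order_relaxed and the plain read of payload becomes a data race, which is undefined behavior. On x86 that broken version will usually still appear to work because the hardware keeps stores in order (TSO); on ARM the flag can become visible before the data.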
Jun 11
PIDs get recycled, pidfds don't. If you kill() a PID you saved twenty minutes ago, you might be signaling an entirely different process. a pidfd is a stable handle to that specific process even after PID reuse. one of those linux features that feels obvious the moment you learn it.
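A Linux-only sketch, assuming kernel 5.3 or newer (pidfd_open) and kernel headers that define SYS_pidfd_open and SYS_pidfd_send_signal; it goes through raw syscall() in case your libc has no wrappers. kill_child_via_pidfd is a hypothetical helper: it forks a child, takes a pidfd for it, and delivers SIGKILL through the fd instead of the numeric PID.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper, Linux-only. Returns 0 if the child was killed via
// its pidfd, or -1 with errno set on failure.
int kill_child_via_pidfd() {
#if defined(SYS_pidfd_open) && defined(SYS_pidfd_send_signal)
    const pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {  // child: sleep until a signal arrives
        pause();
        _exit(0);
    }

    // pidfd_open(pid, flags): an fd that refers to this specific process,
    // not to whatever process later ends up with the same PID.
    const int pidfd = static_cast<int>(syscall(SYS_pidfd_open, pid, 0));
    if (pidfd < 0) {
        const int saved = errno;
        kill(pid, SIGKILL);  // still safe: an unreaped child keeps its PID
        waitpid(pid, nullptr, 0);
        errno = saved;
        return -1;
    }

    // pidfd_send_signal(pidfd, sig, info, flags): signal through the handle.
    const long rc = syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, nullptr, 0);
    const int saved = errno;
    close(pidfd);
    if (rc != 0) kill(pid, SIGKILL);  // fallback so waitpid() below can't hang
    int status = 0;
    waitpid(pid, &status, 0);
    if (rc != 0) { errno = saved; return -1; }
    return (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) ? 0 : -1;
#else
    errno = ENOSYS;  // headers too old to know these syscalls
    return -1;
#endif
}
```

For your own children the PID can't be recycled until you reap them with waitpid(), so the real win is for processes you didn't fork: open the pidfd while you know the PID is still valid, then keep the fd instead of the number. A pidfd also becomes readable when the process exits, so it can go straight into poll() or epoll.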
Jun 11
i somehow missed that C23 added checked integer arithmetic (ckd_add, ckd_sub and ckd_mul in <stdckdint.h>). would have saved me from writing the same overflow helper for the 20th time.
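A sketch of the kind of overflow helper ckd_* replaces. Because these examples are C++, it uses the GCC/Clang builtins __builtin_mul_overflow and __builtin_add_overflow instead of <stdckdint.h>. They follow the same contract as ckd_mul/ckd_add (return true on overflow, store the wrapped result) but take the result pointer last instead of first. checked_alloc_size is a hypothetical helper for the classic count * size + header allocation bug.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical helper: bytes needed for `count` elements of `elem_size`
// bytes plus a `header`, or nullopt if any step overflows size_t.
// __builtin_*_overflow returns true when the exact result doesn't fit;
// in C23 the first check would read: if (ckd_mul(&bytes, count, elem_size))
std::optional<std::size_t> checked_alloc_size(std::size_t count,
                                              std::size_t elem_size,
                                              std::size_t header) {
    std::size_t bytes = 0;
    if (__builtin_mul_overflow(count, elem_size, &bytes)) return std::nullopt;
    if (__builtin_add_overflow(bytes, header, &bytes)) return std::nullopt;
    return bytes;
}
```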
Jun 11
Learn C
Jun 10
The first 5 programming languages you should learn: 1. C 2. C 3. C 4. C 5. C
Jun 11
the human eye has become the most overclocked component in the tech industry
it's funny how much people care about 4K when the majority of actual movie theaters are barely above 1080p
Jun 10
The longer a developer stays at a company, the more honest the comments become.
Jun 10
Countries ignored oil. They fell behind. Countries ignored the internet. They fell behind. Ignore AI? The consequences could be much worse.
NEW: Dario Amodei warns countries without powerful AI could be like medieval swordsmen facing World War II Marines.