Strace

@straceX

you start GPU programming thinking you'll spend your time writing brilliant CUDA kernels and optimizing matrix multiplies. you end up spending 90% of your career staring at Nsight Compute timelines trying to figure out why your L1 cache hit rate dropped by 4%.

Strace retweeted

Strace

@straceX

Jun 12

we named it random access memory (RAM). then we built three levels of cache, prefetchers, data-oriented design, and an entire performance-engineering discipline whose whole purpose is making sure nobody accesses it randomly.

Strace

@straceX

branch prediction, register renaming, speculative execution, store buffers, μop translation. a huge amount of modern CPU complexity exists to maintain a simple illusion: that your instructions run in order, one at a time, like the ISA says. you write x86. the CPU translates it into something else and gets on with its day.

Strace retweeted

Strace

@straceX

Jun 13

C++23 std::expected might be one of the most practical additions to the language in years. It provides a structured way to handle errors without exceptions or output parameters. the caller can't accidentally ignore the possibility of failure, and there's no exception-driven control flow involved.

Strace

@straceX

22h

the cost of a virtual call was never the vtable lookup. that's just a pointer load, and modern CPUs are very good at loads. what actually matters is the indirect branch that comes after it. the CPU wants to know where execution goes next so it can keep speculating ahead. when the branch predictor gets that target right, a virtual call is often surprisingly cheap.

the trouble starts when the same call site sees many different targets. prediction becomes harder, mispredictions increase, and the CPU has to throw away speculative work and start again. that's why a monomorphic virtual call site can be almost free while a megamorphic one can become expensive.

"virtual calls are slow" was always an incomplete explanation. the real question is how predictable the call target is.
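
A rough way to poke at this from C, where a table of function pointers stands in for a vtable. This is only a sketch: the names (`op_fn`, `fill_pattern`, `time_call_site`) are made up for illustration, the four tiny targets are arbitrary, and the timings it reports will vary by CPU and compiler. The idea is one indirect call site driven by a pattern that is either always the same target or an unpredictable mix of four.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef uint64_t (*op_fn)(uint64_t);

/* Four cheap, distinct call targets (arbitrary choices for the sketch). */
static uint64_t op_add(uint64_t x) { return x + 1; }
static uint64_t op_xor(uint64_t x) { return x ^ 0x9e3779b97f4a7c15ull; }
static uint64_t op_shl(uint64_t x) { return (x << 1) | 1; }
static uint64_t op_mul(uint64_t x) { return x * 3; }

/* Stand-in for a vtable: the call site picks one entry per call. */
static const op_fn targets[4] = { op_add, op_xor, op_shl, op_mul };

/* n_targets == 1 yields a monomorphic pattern (always op_add);
   n_targets == 4 yields a pseudo-random mix of all four targets. */
void fill_pattern(unsigned char *pattern, size_t n, unsigned n_targets, unsigned seed)
{
    assert(n_targets >= 1 && n_targets <= 4);
    srand(seed);
    for (size_t i = 0; i < n; i++)
        pattern[i] = (unsigned char)((unsigned)rand() % n_targets);
}

/* Runs n calls through a single indirect call site and returns the
   average nanoseconds per call. The result goes to *sink so the
   compiler cannot discard the loop. */
double time_call_site(const unsigned char *pattern, size_t n, uint64_t *sink)
{
    assert(n > 0);
    struct timespec t0, t1;
    uint64_t acc = 1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n; i++)
        acc = targets[pattern[i]](acc);   /* the one indirect branch */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sink = acc;
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
            (double)(t1.tv_nsec - t0.tv_nsec)) / (double)n;
}
```

To compare, fill one buffer with `n_targets = 1` and another with `n_targets = 4`, then time both with the same `n`. Any gap between the two is what the tweet is pointing at, though the loop also carries the pattern load and the work inside each target, so treat the numbers as indicative only.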

Strace

@straceX

Jun 13

imagine telling Intel in 2001 that the thing most likely to kill Itanium wasn't a new architecture. It was x86 itself. Itanium wasn't a bad processor. It was a processor built for a future that never arrived. the compiler revolution it depended on never happened at the scale required, while out-of-order execution kept getting faster every generation. by the time the industry had its answer, x86-64 had already won.

Strace

@straceX

Jun 12

Itanium (2001–2021). a processor built on one beautiful idea: the compiler would schedule everything statically at build time. it asked the compiler to predict the future. the compiler declined. out-of-order CPUs predicted it at runtime instead, and won. survived by x86-64, the architecture it was supposed to replace.

Strace

@straceX

Jun 13

Roughly two-thirds of the Linux kernel source tree is device drivers. The operating system part is smaller than most people think. Linux's real advantage isn't elegance; it's decades of accumulated hardware support. Linux doesn't just run on a lot of hardware. In many ways, Linux is the hardware support. The OS comes along for the ride.

Strace

@straceX

Jun 12

a simple benchmark showing memory hierarchy latency. the code performs the same dependent pointer chase at four working-set sizes. the only thing that changes is whether the data fits in L1, L2, L3, or spills to DRAM. on my machine: 1.5ns → 5.1ns → 35ns → 134ns per load.
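
The benchmark code itself did not survive in this copy of the post, so here is a minimal sketch of what a dependent pointer chase can look like in C. It is not the author's code: `build_chain`, `chase_ns_per_load`, and the small xorshift generator are hypothetical names chosen for illustration. The chain is a single random cycle (Sattolo's algorithm), so every load's address depends on the previous load and the hardware prefetcher gets little to work with.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Tiny xorshift64 generator so the shuffle does not depend on RAND_MAX. */
static uint64_t rng_state = 1;
static uint64_t xorshift64(void)
{
    uint64_t x = rng_state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return rng_state = x;
}

/* Builds one random cycle over n slots (Sattolo's algorithm):
   following next[] from slot 0 visits every slot exactly once
   before coming back to 0. Caller frees the result. */
size_t *build_chain(size_t n, uint64_t seed)
{
    assert(n >= 2);
    size_t *next = malloc(n * sizeof *next);
    if (!next)
        return NULL;
    for (size_t i = 0; i < n; i++)
        next[i] = i;

    rng_state = seed ? seed : 1;          /* xorshift state must be nonzero */
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64() % i);   /* j in [0, i) */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
    return next;
}

/* Follows the chain for `steps` dependent loads starting at slot 0 and
   returns the average nanoseconds per load. The final position goes to
   *sink so the compiler cannot remove the loop. */
double chase_ns_per_load(const size_t *next, size_t steps, size_t *sink)
{
    assert(steps > 0);
    struct timespec t0, t1;
    size_t p = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < steps; s++)
        p = next[p];                      /* address depends on the last load */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sink = p;
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
            (double)(t1.tv_nsec - t0.tv_nsec)) / (double)steps;
}
```

To reproduce the shape of the result, call `build_chain` with `n` chosen so that `n * sizeof(size_t)` lands inside L1, L2, L3, and well beyond L3 for your machine, then run many more steps than slots. One caveat of this simplified layout: eight `size_t` slots share a typical 64-byte cache line, so padding each slot to a full line would give cleaner per-level numbers.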

Strace

@straceX

Jun 12

the cost of a virtual call was never the vtable lookup. a load is a load. the cost is the indirect branch the predictor can't confidently predict while the CPU is speculating hundreds of instructions past it. guess wrong, and all that work gets flushed. a monomorphic call site is nearly free. a megamorphic one is a pipeline bonfire. "virtual is slow" was always missing the noun. the noun is branch prediction.

Strace

@straceX

Jun 11

Stages of lock-free programming:
1. it works
2. it works on x86
3. you learn x86 was hiding your ordering bugs the whole time
4. the ARM port ships
5. you finally read what acquire and release actually promise
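
For the last stage, here is a small C11 sketch of what acquire and release do promise, using the classic message-passing pattern. The names (`payload`, `ready`, `run_handoff`) are hypothetical and it uses POSIX threads for brevity. The release store publishes everything written before it; an acquire load that observes that store is guaranteed to see those earlier writes. With `memory_order_relaxed` on both sides that guarantee disappears, and a weakly ordered CPU such as ARM is allowed to show the consumer a stale `payload`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;        /* plain, non-atomic data */
static atomic_int ready;   /* flag that publishes the data */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                                            /* 1. write the data */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* 2. publish it */
    return NULL;
}

static void *consumer(void *out)
{
    assert(out != NULL);
    /* Spin until the flag is seen. The acquire load pairs with the
       release store, so the write to payload is visible afterwards. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    *(int *)out = payload;
    return NULL;
}

/* Runs one handoff and returns the value the consumer observed,
   or -1 if a thread could not be created. */
int run_handoff(void)
{
    pthread_t prod, cons;
    int seen = -1;

    payload = 0;
    atomic_store(&ready, 0);

    if (pthread_create(&cons, NULL, consumer, &seen) != 0)
        return -1;
    if (pthread_create(&prod, NULL, producer, NULL) != 0) {
        /* Unblock the consumer before giving up. */
        atomic_store_explicit(&ready, 1, memory_order_release);
        pthread_join(cons, NULL);
        return -1;
    }

    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return seen;
}
```

On x86 the relaxed version of this code tends to work anyway, because the hardware does not reorder stores with other stores or loads with other loads, which is how stage 2 hides the bug until stage 4. The compiler is still free to reorder relaxed accesses, so even on x86 the explicit orderings are what make the code correct.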

Strace

@straceX

Jun 11

PIDs get recycled, pidfds don't. If you kill() a PID you saved twenty minutes ago, you might be signaling an entirely different process. a pidfd is a stable handle to that specific process even after PID reuse. one of those Linux features that feels obvious the moment you learn it.
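
A minimal Linux-only sketch of the difference, assuming a kernel new enough to provide `pidfd_open(2)` (5.3) and `pidfd_send_signal(2)` (5.1). The wrapper names `pidfd_open_raw` and `pidfd_signal_raw` and the function `pidfd_demo` are hypothetical; the wrappers call the syscalls directly because older C libraries ship no wrappers for them. The demo forks a child, signals it through the pidfd, reaps it, and then checks that the pidfd reports the process as gone instead of reaching whatever might reuse the PID later.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Raw syscall wrappers (hypothetical names). */
static int pidfd_open_raw(pid_t pid)
{
    return (int)syscall(SYS_pidfd_open, pid, 0);
}

static int pidfd_signal_raw(int pidfd, int sig)
{
    return (int)syscall(SYS_pidfd_send_signal, pidfd, sig, NULL, 0);
}

/* Returns 0 if everything behaved as described, 1 if pidfds are not
   available in this environment, -1 on any other failure. */
int pidfd_demo(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            pause();                 /* child just waits for a signal */
    }

    int fd = pidfd_open_raw(child);
    if (fd < 0) {
        int unavailable = (errno == ENOSYS || errno == EPERM);
        kill(child, SIGKILL);        /* safe: child is not reaped yet */
        waitpid(child, NULL, 0);
        return unavailable ? 1 : -1;
    }

    /* Signal the process the handle refers to, not "whoever has this PID". */
    int sent = pidfd_signal_raw(fd, SIGTERM);
    if (sent != 0)
        kill(child, SIGKILL);

    int status = 0;
    pid_t reaped = waitpid(child, &status, 0);

    /* The child is reaped, so its PID may now be recycled. The pidfd still
       names the old process, and signalling through it fails with ESRCH. */
    errno = 0;
    int after = pidfd_signal_raw(fd, 0);
    int stale_ok = (after == -1 && errno == ESRCH);
    close(fd);

    if (sent == 0 && reaped == child && WIFSIGNALED(status) &&
        WTERMSIG(status) == SIGTERM && stale_ok)
        return 0;
    return -1;
}
```

The final check is the point of the exercise: with a saved PID, `kill()` after the reap could land on an unrelated process, while the pidfd can only ever refer to the original one.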

Strace

@straceX

Jun 11

i somehow missed that C23 added checked integer arithmetic. would have saved me from writing the same overflow helper for the 20th time.
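
For reference, the C23 additions live in `<stdckdint.h>` as the type-generic macros `ckd_add`, `ckd_sub`, and `ckd_mul`, each of which stores the wrapped result and returns `true` when the mathematical result did not fit. A small sketch follows; the helper names `checked_add_int` and `checked_array_bytes` are hypothetical, and the fallback that maps the macros onto the GCC/Clang `__builtin_*_overflow` builtins is an assumption for toolchains that do not ship the header yet, not part of the standard.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Use the C23 header when the toolchain has it. */
#if defined(__has_include)
#  if __has_include(<stdckdint.h>)
#    include <stdckdint.h>
#  endif
#endif

/* Assumed fallback for older GCC/Clang: same contract as the C23 macros
   (result stored through r, true returned on overflow). */
#ifndef ckd_add
#  define ckd_add(r, a, b) __builtin_add_overflow((a), (b), (r))
#  define ckd_mul(r, a, b) __builtin_mul_overflow((a), (b), (r))
#endif

/* Returns true and stores a + b in *out when the sum fits in an int. */
bool checked_add_int(int a, int b, int *out)
{
    assert(out != NULL);
    return !ckd_add(out, a, b);
}

/* Returns true and stores count * elem_size in *out when the product
   fits in a size_t, the usual check before an allocation. */
bool checked_array_bytes(size_t count, size_t elem_size, size_t *out)
{
    assert(out != NULL);
    return !ckd_mul(out, count, elem_size);
}
```

Note the inverted sense compared with many hand-written helpers: the standard macros return `true` on failure, so the wrappers above negate the result to read as "did it succeed".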

Strace

@straceX

Jun 11

Learn C

Strace

@straceX

Jun 10

The first 5 programming languages you should learn:
1. C
2. C
3. C
4. C
5. C

Strace

@straceX

Jun 11

the human eye has become the most overclocked component in the tech industry

LaurieWired

@lauriewired

Jun 11

it's funny how much people care about 4K when the majority of actual movie theaters are barely above 1080p

Strace

@straceX

Jun 10

The longer a developer stays at a company, the more honest the comments become.

Strace

@straceX

Jun 10

Countries ignored oil. They fell behind. Countries ignored the internet. They fell behind. Ignore AI? The consequences could be much worse.

Polymarket

@Polymarket

Jun 10

NEW: Dario Amodei warns countries without powerful AI could be like medieval swordsmen facing World War II Marines.
