"Avoid copying data to maximize performance."
This is standard advice. But in modern high-performance systems, premature optimization against copying can actually destroy your throughput.
Why? Because we often replace copies with Shared State Locks. And locks are the silent killer of modern CPUs. 🧵👇
When you use a Lock (Mutex), you aren't just waiting. You are triggering:
1. Cache line bouncing (CPU cores fighting for the lock variable)
2. Context switches (TLB flushes, saving registers)
3. Pipeline stalls
A memory copy (memcpy) is sequentially optimized by hardware and blazing fast. A lock causes complete chaos inside the CPU.
This is why modern architectures (like Rust's ownership model or Go's channels) prefer "sharing by communicating" over "communicating by sharing."
Stop fearing memcpy.
Start fearing .lock().
What’s your biggest "performance optimization" that backfired? Let's talk below.