May 25
How can backend servers evolve from “1 thread handles everything” to threadpools, Hystrix, bulkheads, circuit breakers, async IO and event loops…

Let’s build this story from scratch 🧵

First: what even is a PROCESS? A process is just a running instance of a program. A process contains: memory, files, sockets and threads. Think of it like an isolated execution container.

Now inside a process come THREADS. Thread = actual unit executing code. Without threads, your program literally does nothing.

In the beginning, let our server handle requests using ONE THREAD. Request comes → thread processes → response sent. Easy. Until the second user arrives.

Now the problem starts. If one request becomes slow, the entire server waits. Ex: DB query taking 5 sec. Meanwhile all other users are blocked too. One slow request freezes the whole server.

“Fine, we'll just add more threads.” So the THREAD PER REQUEST model is born. New request? Spawn a new thread. Now requests run concurrently. Feels amazing initially.

Traffic increases. 10 users. 100 users. 1000 users. The server is now creating thousands of threads. Then new problems appear.

Threads are NOT free. Every thread needs:
- stack memory
- scheduling
- CPU coordination
- kernel management

Creating and destroying threads repeatedly is expensive too. Too many threads = massive CONTEXT SWITCHING. CPU spends more time switching threads than doing work.

Also most backend threads aren’t computing. They’re WAITING. Waiting for:
- DB
- APIs
- disk
- network

Meaning: thousands of expensive threads sitting idle. We realized, “Creating threads per request is cursed.”

So now THREADPOOLS are introduced. Instead of infinite threads, create a fixed-size pool: for ex. 200 threads. Requests borrow a thread → finish work → return it. Much more efficient. System survives nicely now.

Until… traffic spike + slow downstream service. Imagine: thread pool size = 200. Normally a request takes 50ms. But the DB suddenly slows: 5 seconds. Now every request thread stays occupied much longer. Very quickly, all 200 threads get blocked. Now no free worker threads exist. New requests either wait in a queue or get rejected instantly... Then: timeouts, 5xx errors, angry users, on-call engineer crying at 3 AM. This is THREADPOOL EXHAUSTION.

Then we make things worse accidentally: RETRIES. Request failed? “Retry bro.” Now traffic multiplies during the outage. One slow DB becomes a full infrastructure apocalypse.

Another realization: “Why should ONE bad service kill the ENTIRE app?” Ex: Recommendation service is slow. Why should payments, auth and orders also die?

Netflix engineers solved this using HYSTRIX. Main idea: SEPARATE THREADPOOLS PER DEPENDENCY/COMMAND. Example:
- payments → 20 threads
- recommendations → 10 threads

Now failures stay isolated. This concept is called the BULKHEAD PATTERN. Inspired by ships. If one compartment floods, the entire ship shouldn’t sink. Same idea: contain failures locally.

But even separate pools can fill up. So another idea came: CIRCUIT BREAKER. If a service keeps failing, STOP calling it temporarily. Circuit breaker basically says: “bro dependency is cooked”. Instead of:
- waiting forever
- blocking threads
- retry storms

Requests fail FAST. After some time, the system allows a few test requests again. If the dependency recovered, the circuit closes. If not, it stays open. System survives.

Then we realized another thing: “Why waste one thread waiting for network calls?” This led to: ASYNC IO + EVENT LOOPS. Instead of 1 thread per connection, an event loop handles thousands of sockets efficiently. Core idea behind:
- Node.js
- Netty
- Nginx
- modern high-scale networking

So backend evolution basically became: 1 thread → many threads → thread explosion → threadpools → threadpool exhaustion → isolated pools → circuit breakers → async IO → event loops

Share and repost if you like this way of learning concepts 🫶 #systemdesign #threadpool #swe
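The circuit breaker described in the thread (fail fast while a dependency is unhealthy, then let a test request through after a cool-down) can be sketched roughly as below. This is a minimal, single-threaded illustration and not how Hystrix implements it; the names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold` and `reset_timeout` are assumptions made up for this sketch, and a real breaker would also need locking.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    # Hypothetical helper for illustration; not thread-safe.
    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a test request
        self.clock = clock                          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of blocking a thread on a sick dependency.
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            # Cool-down elapsed: let this call through as a test request (half-open).
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each dependency call, for example `breaker.call(fetch_recommendations, user_id)` with a hypothetical `fetch_recommendations`, and treat `CircuitOpenError` as the signal to serve a fallback.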
CppAfrica will be hosting our very own @JennifaChukwu for this month's meetup with a talk entitled "Fix one bottleneck, create another" on May 19, 3:00 p.m. UTC on our Discord server. This will be an exciting optimization story about the journey with threadpools #cplusplus #cpp
And by popular demand, we’ve polished the developer experience: ⚡ Servers support hot reload during development ⚡ Decorators now return callable functions (finally!) ⚡ Automatic threadpools for sync tools
Techniques I’d master if my Rust distributed system had to survive production:

1. Backpressure everywhere
If your system cannot say “no”, it will say “yes” and then die. Bounded channels, bounded queues, bounded concurrency. Always.

2. Timeouts as a contract
Every network call needs a deadline. Not “eventually”. In Rust that means propagating timeout budgets through your layers.

3. Cancellation that actually cancels
Do not rely on dropping a future and hoping it stops. Use cancellation tokens, select on shutdown signals, and make sure every task has an exit path.

4. Idempotency by default
Retries will happen. Networks lie. Clients spam. If your write path is not idempotent, you are in for a big big surprise!! (and i am talking possible monetary loss)

5. Retry budgets, not infinite retries
Retries amplify load and create a self DDOS. Use exponential backoff with jitter, cap retries, and use a global retry budget per request.

6. Load shedding over slow death
When overloaded, fail fast and cheaply. Return 429/503 early, drop non critical work, degrade gracefully. The worst case is a system that slowly becomes unusable for everyone.

7. Circuit breakers and bulkheads
One dependency going bad should not take the whole process. Separate threadpools, separate connection pools, separate queues per subsystem.

8. Connection pooling and reuse hygiene
If you open too many connections, you DoS yourself and the other side. Pools need limits, timeouts, and health checks.

9. Serialize less, copy less
Distributed systems are often “CPU bound by JSON”. Measure serialization time, switch formats if needed, avoid cloning large buffers, use bytes and zero copy where possible.

10. Observability that answers “why”
Logs are not observability. You need traces for request flow, metrics for saturation (queue depth, p99 latency, error rate), and structured logs for incident debugging.

11. Protect p99, not average
Average latency is a lie. In Rust you will fight tail latency from GC free world assumptions, allocator contention, lock contention, and slow IO. Track p50/p95/p99 always.

12. Stateful components need ownership boundaries
If multiple tasks mutate shared state, you will end up with locks everywhere. Prefer ownership patterns: one task owns the state, others send messages.

13. Failure mode testing
Chaos is not a buzzword. Kill nodes, delay packets, drop responses, corrupt data, restart mid write. If you have not tested it, it will happen in prod.

14. Versioning
Schema versioning, API versioning, and rollout versioning. Old and new will coexist longer than you think. Design for mixed clusters.

15. Make correctness cheap
Invariant checks, asserts in debug, property tests, and small model simulations of your protocol. Rust helps but Rust does not save you from bad distributed logic.

If you can do these, the hard stuff like consensus, replication, sharding becomes learnable. Most outages are not from algorithms, they are from missing budgets, missing backpressure, and missing ownership boundaries.
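Point 5 above (exponential backoff with jitter and a hard cap on retries) is language-independent, so here is a rough sketch in Python rather than Rust. The function names `backoff_delays` and `call_with_retries` and all default values are assumptions for illustration; it uses "full jitter" (a random delay inside an exponentially growing, capped window) and caps retries per call only, so it does not implement the global retry budget the post also recommends.

```python
import random
import time


def backoff_delays(max_retries=4, base=0.1, cap=2.0, rng=random.random):
    """Yield one sleep duration per allowed retry.

    Each delay is drawn uniformly from [0, window), where the window doubles
    on every attempt and never exceeds `cap` seconds.
    """
    for attempt in range(max_retries):
        window = min(cap, base * (2 ** attempt))
        yield rng() * window


def call_with_retries(fn, max_retries=4, base=0.1, cap=2.0, sleep=time.sleep):
    """Call fn(); on failure wait a jittered delay and retry, at most max_retries times."""
    delays = backoff_delays(max_retries, base, cap)
    while True:
        try:
            return fn()
        except Exception:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the last error to the caller
            sleep(delay)
```

In a real service the `except Exception` clause would be narrowed to errors that are actually safe to retry, which ties back to point 4: retrying is only harmless when the operation is idempotent.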
24 Dec 2025
Techniques I’d master if my LLM had to survive production: Bookmark this. 1. Cost per 1K tokens tracking 2. Quantization quality regression 3. Fallback model routing 4. Adaptive model selection 5. Canary inference 6. Timeout-aware decoding 7. Truncation-induced hallucinations 8. Retry amplification issues 9. Token budget enforcement 10. Eval-driven inference tuning 11. Confidence-based stopping 12. Partial-response recovery 13. Observability for inference 14. Silent failure detection
Design discussions in a company aren’t just senior folks talking in circles. They’re how teams prevent expensive mistakes before they get shipped.

Tbh it's pretty standard (we try to adhere to the best practices). But the real discussion that we go through is around business problems and how our system should be designed (again - following the standard practices) around those business requirements.

---

Early in my career I thought engineering = writing code tickets. Then I sat in real design reviews and realized: most of the job is deciding what not to build, what to simplify, and what to make failure-proof.

Here are two real design discussions (the kind you’ll see in good teams, more or less). One is related to Kafka consumers' horizontal scalability. The other is a classic hot-path system design.

---

Design Discussion #1: Kafka consumers, consumer groups, and horizontal scalability

The situation
A producer is dumping events fast. The consumer is falling behind. People say "just add more consumers". Sounds easy… until you learn how Kafka actually scales.

The questions that show up in a design review

1) What exactly are we scaling: throughput or isolation?
If you need more throughput for the same work, you scale within one consumer group. If you need multiple independent “downstream jobs” (search indexer, analytics, fraud, email), you scale via separate consumer groups. That one distinction changes everything.

2) Do we have enough partitions?
Kafka parallelism inside a consumer group is limited by partitions:
Max active consumers in a group ~ number of partitions
50 consumers on a topic with 10 partitions = 40 consumers idle
So the design discussion now becomes:
- How many partitions now?
- How many partitions in 6 months?
- How expensive is over-partitioning (broker load, open files, metadata)?

3) What ordering guarantees do we actually need?
Partitions give ordering per key. So we ask:
- Do we need “per-user ordering”?
- “per-order ordering”?
- Or do we not care?
Because this decides your partition key. And the partition key decides:
- load distribution (hot keys)
- ordering correctness
- failure recovery behavior

4) What happens when processing is slow?
This is where “real” engineering starts:
- Is the consumer doing DB writes one-by-one?
- Is it doing external API calls?
- Is it CPU heavy (parsing, compression, ML inference)?
Common fixes that were discussed:
- Batch writes (100–1000 records) instead of single inserts
- Use async pipelines with bounded concurrency
- Add local buffering but with hard caps (so you don’t OOM)
- Apply backpressure: stop polling when your internal queue is full

5) What’s our failure model: at-least-once, at-most-once, exactly-once-ish?
Kafka gives at-least-once by default. So design review asks:
- Are our writes idempotent?
- Do we dedupe by idempotency key?
- Can we tolerate duplicates?

This is all about scaling consumers.

---

Design Discussion #2: “We need faster APIs” (and it turns into a whole architecture review)

The situation
Latency is spiking. P95 is bad. Users complain. The first instinct is “optimize code”.

The design discussion usually starts with:
- Where is time going? Network? DB? Serialization? Downstream services?
- What is the SLO? P95 < 200ms? P99 < 500ms?
- Is the load steady or bursty?

The tradeoffs that get debated

1) Caching: what layer and what shape?
- Redis cache for read-heavy hot keys
- in-process cache for ultra-hot tiny objects (can lead to many problems if it's a pod-specific cache)
But then the real debate:
- What’s the TTL?
- How do we invalidate?
- Can we tolerate stale reads?
- Are we caching the right thing (response vs computed intermediate)?

2) DB load: optimize queries or change the data model?
Teams look at:
- missing indexes
- N+1 query patterns
- large JSON blobs in hot tables
- “select *” on large rows
Then the bigger move:
- introduce a read replica
- split hot/cold tables (a good solution in many cases)
- materialized views for read patterns
- precomputed aggregates

3) Async vs sync
If a request does 6 downstream calls, design review asks:
- which calls are mandatory to return a response?
- which can be async (outbox queue)?
- can we return partial results?
This is where you see patterns like:
- outbox worker
- queues for slow tasks
- “eventually consistent UI”
- sagas for multi-step workflows

4) Blast radius
A staff-level question: If this dependency dies, do we take down the whole API?
- Do we have timeouts?
- circuit breakers?
- fallbacks?
Design discussions often end with:
- strict timeouts everywhere
- bulkheads (separate threadpools / queues)
- graceful degradation
- error budgets and SLO-based alerting

---

What to do if you’ve never been part of these discussions?
Here’s how people break into them without being senior:

Start bringing a one-page doc:
Goal → current pain → options → tradeoffs → recommended choice → risks

Ask the 3 questions that instantly level you up:
1. “What breaks first at 10x traffic?”
2. “What’s our failure mode here?”
3. “What would wake up on-call at 3am?”

After the discussion, write the summary and circulate it. That alone makes you valuable because most teams lose decisions to memory.

---

Thanks for reading. I share such real-world insights; consider following if you find this interesting!
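The "dedupe by idempotency key" question from Design Discussion #1 can be illustrated with a small sketch. It assumes each event carries an `idempotency_key` field and uses an in-memory set as a stand-in for what would need to be a durable store in practice (ideally updated in the same transaction as the write); the class name `IdempotentHandler` and the event shape are hypothetical, and no Kafka client is involved.

```python
class IdempotentHandler:
    """Apply each event at most once, even if it is delivered more than once."""

    def __init__(self, apply):
        self.apply = apply   # the side-effecting write, e.g. a DB insert
        self.seen = set()    # in-memory stand-in for a durable dedupe store

    def handle(self, event):
        key = event["idempotency_key"]  # assumed field name
        if key in self.seen:
            return False                # duplicate delivery: skip the write
        self.apply(event)
        self.seen.add(key)              # record only after the write succeeded
        return True


# Usage: a redelivered event does not double-count.
totals = {}


def credit(event):
    totals[event["user"]] = totals.get(event["user"], 0) + event["amount"]


handler = IdempotentHandler(credit)
deliveries = [
    {"idempotency_key": "evt-1", "user": "a", "amount": 10},
    {"idempotency_key": "evt-2", "user": "a", "amount": 5},
    {"idempotency_key": "evt-1", "user": "a", "amount": 10},  # redelivery
]
applied = [handler.handle(e) for e in deliveries]
```

Because the key is recorded only after `apply` succeeds, a crash in between leads to a repeated write rather than a lost one, which is why the write itself should also be safe to repeat.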
20 Dec 2025
Replying to @0xlelouch_
Can you write a post about design discussion in a company ? Ik it's not always about code directly, I was never part of any discussion like that
Replying to @davepl1968
Nothing to worry about:
- Free-threaded is a separate alternative interpreter (it's a compilation flag); you have to install and run it explicitly.
- Most C libraries already release the GIL when called, and those using threads use them in C, isolated from Python.
- From the Python side, you'll only have to care about thread issues if you actually spawn threads (or rely on threadpools, which includes asyncio).
- Most current code actually doing parallelism (other than just IO) already relies on multiprocessing (process spawning) instead of threads to avoid the GIL (a separate process means a separate interpreter, with the obvious overhead of IPC).
Replying to @asmah2107
So the main motive should be to contain the blast radius: to do as little damage as possible. There are a few ways I think we can achieve this. Implement timeouts so we don’t wait forever for a service to respond. Follow the circuit breaker pattern: whenever something keeps failing, don’t call it. Break the circuit. Also make sure the threadpool used for calling these services doesn’t hoard all the resources, which then slows down your service. Add a cap or limit to the number of resources these threadpools can take up. This is called a Bulkhead. Also enable load shedding, where you reject extra incoming requests instead of letting them take up threads or resources.
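The bulkhead cap and load shedding mentioned in this reply can be combined in one small sketch: a semaphore limits how many calls to a dependency may be in flight, and callers that find no free slot are rejected immediately instead of queuing. The names `Bulkhead` and `BulkheadFullError` are assumptions for illustration, not a library API.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when the call is shed because the bulkhead has no free slot."""


class Bulkhead:
    def __init__(self, max_concurrent):
        # One semaphore slot per call allowed in flight at the same time.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Non-blocking acquire: shed the request rather than wait for a slot.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("no free slot; request shed")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()
```

Giving each dependency its own instance, for example `Bulkhead(20)` for payments and `Bulkhead(10)` for recommendations, keeps a slow dependency from occupying every worker in the service.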
24 Jul 2025

26 May 2025
Replying to @alessandrod
Not building anything that is NUMA aware yet, just re-reading this people.freebsd.org/~lstewart… This made me wonder, when would I need to worry about using libnuma, or being NUMA aware in the first place? The first things that come to mind are threadpools, allocators maybe?

Relevant to ThreadPools design above, I also came across @arpit_bhayani's YouTube video "Why thread pools even exist? and how to implement them?" - highly recommended. He has covered the cost of thread creation and destruction at will and how the thread pools can optimize this within the context of web server implementation in Go. youtu.be/NgYS6mIUYmA
17 Mar 2025
Replying to @mctweetsthis
I don't see a contradiction. You can cook something in python really quickly, and if you still want it to be somewhat fast, you just use this simple rule. I love using threadpools in python because they are so easy to use and you get instant speed up for a lot of stuff.
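As a rough illustration of the "easy to use" point, here is what a threadpool looks like with the standard library's `concurrent.futures.ThreadPoolExecutor`. The `fetch` function is a made-up stand-in that sleeps to simulate a blocking network or disk call; the speed-up applies to I/O-bound work like this, while CPU-bound pure-Python code is still limited by the GIL on the default interpreter.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(item_id):
    # Hypothetical blocking call; the sleep stands in for network or disk latency.
    time.sleep(0.05)
    return item_id * 2


def fetch_all(item_ids, max_workers=8):
    # pool.map runs fetch concurrently but returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, item_ids))


results = fetch_all(range(8))
```

With 8 workers the eight simulated calls overlap, so the whole batch takes roughly as long as one call instead of eight in sequence.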
24 Feb 2025
Replying to @RemiCadene
haven’t scaled anything with video, but for anyone doing big img training, I’ve arrived at a pattern that works really well for 10TB datasets that have to be stored on spinning drives. keep labels in HDF5 on nvme so you can do fast vectorized operations, generate batch orders ahead of time, proactive prefetch w/ separate I/O + preprocess (aug) threadpools, store preprocessed batches in memcache. easily generalizes to MPI with Lustre-type filesystem, put all the images in a giant images.h5, shard to get ~100gb per file handler. works really well and at least for my GPU-poor setup I am never waiting on data, even with all 100m stored on HDDs (albeit 10disk raidz3, so tremendous throughput, but random ops still bottleneck if you don’t proactively prefetch batches)
1 Feb 2025
Daily Progress: Halfway through reading DeepSeek; next up is writing a blog post on it. It will be my first blog, so I'll try to make it as readable as possible. Java: Built a single-threaded server in Java. Next: learning threadpools and starting to build a multithreaded server. Please like/repost
31 Jan 2025
Daily progress: started reading DeepSeek, completed a ticket booking app in Java, moving on to building a multithreaded server in Java
Reduce tail latencies with neighbour-friendly threadpools (Arxiv)
26 Dec 2024
> In both cases, it seems to simply come down to _some_ software process or hardware module needing to prevent the same coin from being spent twice. And that process -- whatever it is -- is constrained by the same physics. Why wouldn't it be? It’s not that the check isn’t required, it’s that making download and validation dependent upon it imposes linearity by block on them as well! This is a horribly inefficient approach. These three phases are decoupled in libbitcoin. They operate on independent threadpools. Confirmation checks are executed as soon as feasible, but block nothing else. Validation as well. Download as well. Order is the death of performance. It’s a giant critical section blocking everything else.
29 Nov 2024
Replying to @HSVSphere
C# comes out surprisingly well, also given that AFAIK its "async" implementation is mostly threadpools.
Replying to @OneHung
I’m 8.0 - and the speed improvements are quite noticeable. I use ThreadPools to the maximum and all code is asynchronous
5 Sep 2024
Replying to @AgileJebrim
Single node is nice but the reality is most big tech places have multiple downstream servers, then you start coming up with creative solutions like threadpools and containers/sharing cpu time