🧵 Day 13/30 —
#SystemDesign
Database Replication: How systems stay fast and available even when one DB fails
Many apps run smoothly with one database… until traffic grows or the database crashes. Suddenly reads become slow, maintenance becomes risky, and one machine becomes a single point of failure.
That’s why production systems use Database Replication.
Replication means copying data from one database server (Primary) to one or more secondary servers (Replicas). The primary usually handles writes, while replicas help with reads and failover.
Flow:
App Writes → Primary DB
Primary Syncs Data → Replicas
Read Requests → Replicas
This improves performance and reliability.
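The flow above can be modeled in a few lines. This is a toy, in-memory sketch that assumes log-based replication: the primary appends every write to an ordered change log, and each replica replays the entries it hasn't applied yet. The Primary and Replica classes are hypothetical and are not any real database's API.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Hypothetical primary: accepts writes and records them in an ordered change log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # ordered (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))
        return len(self.log)  # log position of this write


@dataclass
class Replica:
    """Hypothetical replica: copies data by replaying the primary's change log."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries replayed so far

    def sync_from(self, primary):
        # Replay only the entries this replica has not seen yet.
        for key, value in primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.log)

    def read(self, key):
        return self.data.get(key)


primary = Primary()
replicas = [Replica(), Replica()]

primary.write("user:1", "Alice")   # App writes -> Primary
for r in replicas:                  # Primary syncs data -> Replicas
    r.sync_from(primary)
print(replicas[0].read("user:1"))  # Read requests -> Replicas; prints Alice
```

Real databases ship a binary log or write-ahead log rather than Python tuples, but the idea is the same: replicas stay in sync by replaying the primary's changes in order.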
-----------------
Why Replication Matters
Without replication:
→ One DB handles everything
→ Read traffic overloads server
→ Downtime risk if DB fails
→ Hard maintenance windows
→ Backups affect performance
With replication:
→ Scale read traffic
→ Better availability
→ Disaster recovery options
→ Safer maintenance
→ Lower load on primary
It is one of the first steps in database scaling.
-------------------
Primary vs Replica
Primary Database
→ Accepts INSERT / UPDATE / DELETE
→ Source of truth
→ Sends changes to replicas
Replica Database
→ Copies primary data
→ Usually serves read queries
→ Can be promoted during failure
Many systems use 1 primary with multiple replicas.
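"Can be promoted during failure" usually means choosing the healthy replica that has applied the most of the primary's changes, so the least data is lost. Here is a minimal sketch. It assumes each replica reports its replication position as a single increasing number. ReplicaStatus and pick_promotion_candidate are hypothetical names, and the positions are sample values. Real failover tooling must also fence off the old primary, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class ReplicaStatus:
    name: str
    healthy: bool
    applied_position: int  # hypothetical: how far this replica has replayed the primary's log


def pick_promotion_candidate(replicas):
    """Return the healthy replica with the highest applied position, or None."""
    healthy = [r for r in replicas if r.healthy]
    if not healthy:
        return None
    return max(healthy, key=lambda r: r.applied_position)


replicas = [
    ReplicaStatus("replica-a", healthy=True, applied_position=1040),
    ReplicaStatus("replica-b", healthy=True, applied_position=1052),
    ReplicaStatus("replica-c", healthy=False, applied_position=1060),
]
new_primary = pick_promotion_candidate(replicas)
print(new_primary.name)  # replica-b (replica-c is further ahead but unhealthy)
```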
-------------------
Real Example
E-commerce platform:
→ Order placement writes to Primary
→ Product browsing reads from Replicas
→ Search suggestions may read from Replicas
→ Reports can run on Replicas
This keeps critical writes fast while distributing reads.
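One common way to get this split is a small routing layer in the application. The sketch below assumes the app tags each query as a read or a write and spreads reads across replicas round-robin. ReplicationRouter and the connection names are hypothetical placeholders for real connection pools.

```python
import itertools


class ReplicationRouter:
    """Hypothetical router: writes go to the primary, reads rotate across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; None means there are no replicas to use.
        self._replica_cycle = itertools.cycle(replicas) if replicas else None

    def route(self, is_write):
        # INSERT / UPDATE / DELETE must hit the source of truth.
        if is_write or self._replica_cycle is None:
            return self.primary
        return next(self._replica_cycle)


router = ReplicationRouter("primary-db", ["replica-1", "replica-2"])

print(router.route(is_write=True))   # place order     -> primary-db
print(router.route(is_write=False))  # browse products -> replica-1
print(router.route(is_write=False))  # search suggest  -> replica-2
print(router.route(is_write=False))  # sales report    -> replica-1
```

Falling back to the primary when no replicas are configured keeps the app working in a single-database setup.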
-------------------
Replication Types
1. Synchronous Replication
Primary waits until the replica confirms the write before acknowledging the client.
Pros:
→ Strong consistency
Cons:
→ Slower writes (every write waits for a round trip to the replica)
2. Asynchronous Replication
Primary acknowledges the write immediately; the replica applies it later.
Pros:
→ Faster writes
Cons:
→ Replica lag (stale reads), and writes not yet copied can be lost if the primary fails
Many large systems mix the two, for example one synchronous replica plus several asynchronous ones, to balance speed against consistency.
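The difference comes down to when the primary acknowledges the write. This toy simulation uses a single replica with a fixed network delay. Sync pays the round trip on every write. Async returns at once and leaves a window where the replica is stale. FakeReplica, write_sync, write_async and the 50 ms delay are illustrative assumptions, not measurements.

```python
import time


class FakeReplica:
    """Hypothetical replica reached over a network with a fixed delay."""

    def __init__(self, delay_s=0.05):
        self.delay_s = delay_s  # assumed round trip, for illustration only
        self.data = {}
        self.pending = []  # writes shipped asynchronously but not yet applied

    def apply(self, key, value):
        time.sleep(self.delay_s)  # simulate the network round trip
        self.data[key] = value

    def drain(self):
        # Later, the replica catches up on the async writes.
        for key, value in self.pending:
            self.apply(key, value)
        self.pending.clear()


def write_sync(primary, replica, key, value):
    primary[key] = value
    replica.apply(key, value)  # wait for the replica before acknowledging
    return "ack"


def write_async(primary, replica, key, value):
    primary[key] = value
    replica.pending.append((key, value))  # ship later; do not wait
    return "ack"


primary, replica = {}, FakeReplica()

start = time.perf_counter()
write_sync(primary, replica, "stock:42", 9)
sync_ms = (time.perf_counter() - start) * 1000

start = time.perf_counter()
write_async(primary, replica, "stock:42", 8)
async_ms = (time.perf_counter() - start) * 1000

print(f"sync ack after ~{sync_ms:.0f} ms, async ack after ~{async_ms:.0f} ms")
print(replica.data["stock:42"])  # 9: stale until the replica catches up
replica.drain()
print(replica.data["stock:42"])  # 8
```

If the primary crashed before drain() ran, the replica would never see the value 8. That is the data-loss risk of asynchronous replication.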
---------------------
Challenges Most Ignore
Replication helps a lot, but adds tradeoffs:
→ Replica lag (stale reads)
→ Failover complexity
→ Split brain risks
→ Write bottleneck still on primary
→ Monitoring needed
→ Backups still important
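One common mitigation for stale reads is "read-your-writes". The app remembers the log position of a user's last write. It then reads only from a replica that has applied at least that much, and falls back to the primary otherwise. Here is a minimal sketch. It assumes replicas expose their applied position as an integer. choose_read_target and the position values are hypothetical.

```python
def choose_read_target(last_write_pos, replica_positions, primary="primary"):
    """Pick a replica caught up to the caller's last write, else the primary.

    last_write_pos: log position recorded when this user last wrote (hypothetical).
    replica_positions: mapping of replica name -> applied log position.
    """
    caught_up = [name for name, pos in replica_positions.items() if pos >= last_write_pos]
    if caught_up:
        return min(caught_up)  # any caught-up replica works; min() just keeps it deterministic
    return primary  # every replica is lagging: read from the source of truth


positions = {"replica-1": 118, "replica-2": 125}

print(choose_read_target(120, positions))  # replica-2
print(choose_read_target(130, positions))  # primary
print(choose_read_target(100, positions))  # replica-1
```

The fallback protects correctness for that user, but it also sends more reads to the primary while replicas lag. This is why lag still needs monitoring.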
Replication improves read capacity and availability, but it doesn’t replace careful architecture decisions.