We often think of a database as a single giant file on a hard drive that gets updated whenever we save data. But for write-heavy powerhouses like Cassandra, RocksDB, or LevelDB, that approach is too slow.
Enter the LSM Tree (Log-Structured Merge Tree).
The secret sauce of an LSM Tree is that it doesn’t force a choice between memory and disk: it uses both in a relay race to maximize write speed.
Here is the lifecycle of a write in an LSM-based database:
1. The Landing Zone: Memory (MemTable) 🧠
When you write data, the only disk work on the write path is a sequential append to a commit log (a write-ahead log) used for crash recovery. The data itself goes straight into RAM.
This structure is called a MemTable.
It is mutable (changeable).
It organizes data in a sorted structure (often a Red-Black Tree or Skip List).
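Why? Writing to RAM is orders of magnitude faster than a random disk write, and keeping the keys sorted in memory means the data can later be written to disk in one in-order pass.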
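To make the write path above concrete, here is a minimal sketch in Python. It is a toy, not how Cassandra or RocksDB implement it: the `MemTable` class, its method names, and the tab-separated log format are all assumptions made for illustration, and a sorted list stands in for the skip list or red-black tree.

```python
import bisect
import os
import tempfile


class MemTable:
    """Toy MemTable: sorted keys in RAM plus an append-only commit log."""

    def __init__(self, log_path):
        self._keys = []      # sorted list of keys (stand-in for a skip list)
        self._values = {}    # key -> latest value
        self._log = open(log_path, "a", encoding="utf-8")

    def put(self, key, value):
        # 1. Sequential append to the commit log so the write survives a crash.
        self._log.write(f"{key}\t{value}\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        # 2. Update the in-memory sorted structure (mutable, so updates overwrite).
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def items(self):
        # Entries in key order, which is the order a flush would write them.
        return [(k, self._values[k]) for k in self._keys]

    def close(self):
        self._log.close()


log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
table = MemTable(log_path)
table.put("user:2", "bob")
table.put("user:1", "alice")
table.put("user:2", "bobby")   # in-memory update, but a third log append
print(table.items())           # [('user:1', 'alice'), ('user:2', 'bobby')]
table.close()
```

Note that the log only ever grows at the end, while the in-memory structure absorbs the out-of-order keys.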
2. The Freeze: Flushing to Disk (SSTable) ❄️
Eventually, the MemTable fills up. When it hits a certain threshold (e.g., 256MB), the database says, "Okay, let's save this."
The entire MemTable is flushed to the disk as a new file called an SSTable (Sorted String Table).
Crucial Detail: SSTables are Immutable. Once written, they can never be modified.
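If you update a record, you aren't overwriting the old one. The new version goes into the current MemTable with a newer timestamp and later lands in a new SSTable, and reads return the entry with the latest timestamp. Deletes work the same way: they write a marker (a tombstone) instead of erasing anything.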
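A rough sketch of the flush step in Python, under stated assumptions: the `flush_to_sstable` and `read_sstable` helpers, the JSON-lines file format, and the integer timestamps are all hypothetical choices for illustration. Real SSTables are binary files with indexes and other metadata.

```python
import json
import os
import tempfile


def flush_to_sstable(memtable, directory, seq):
    """Write a dict of key -> (timestamp, value) as one sorted, write-once file."""
    path = os.path.join(directory, f"sstable-{seq:06d}.jsonl")
    # Mode "x" fails if the file already exists, so a flush never overwrites.
    with open(path, "x", encoding="utf-8") as f:
        for key in sorted(memtable):
            ts, value = memtable[key]
            f.write(json.dumps({"key": key, "ts": ts, "value": value}) + "\n")
    return path


def read_sstable(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]


data_dir = tempfile.mkdtemp()

# The first MemTable fills up and is flushed.
first = flush_to_sstable({"user:2": (2, "bob"), "user:1": (1, "alice")}, data_dir, 1)
# An "update" to user:2 arrives later and ends up in a second SSTable.
second = flush_to_sstable({"user:2": (3, "bobby")}, data_dir, 2)

# Both versions of user:2 now exist on disk; the higher timestamp wins on read.
print(read_sstable(first))
print(read_sstable(second))
```

The first file is never reopened for writing; the stale `user:2` entry simply stays there until compaction removes it.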
3. The Cleanup: Compaction 🧹
Over time, you end up with hundreds of SSTables on your disk. This makes reads slow: to find a key, the database may have to check many files, newest first, before it finds the latest version.
To fix this, a background process called Compaction kicks in. It takes several smaller SSTables, merges them, discards old/deleted data, and writes out a new, larger sorted file.
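The merge at the heart of compaction can be sketched in a few lines of Python. This is a simplified illustration, not any specific database's strategy: the `compact` function, the `(key, timestamp, value)` tuple layout, and the use of `None` as a delete marker are assumptions. It also assumes every SSTable holding a given key is part of the merge, which is what makes it safe to drop delete markers.

```python
import heapq

TOMBSTONE = None  # assumed marker meaning "this key was deleted"


def compact(*sstables):
    """Merge sorted runs of (key, ts, value), keeping the newest live entry per key."""
    # Order by key, and within a key by newest timestamp first.
    # Each input is already sorted by key with one entry per key.
    merged = heapq.merge(*sstables, key=lambda e: (e[0], -e[1]))
    result = []
    last_key = object()  # sentinel that equals no real key
    for key, ts, value in merged:
        if key == last_key:
            continue  # older version of a key already handled
        last_key = key
        if value is not TOMBSTONE:
            result.append((key, ts, value))
    return result


old = [("a", 1, "apple"), ("b", 2, "banana"), ("c", 3, "cherry")]
new = [("b", 5, "blueberry"), ("c", 6, TOMBSTONE), ("d", 4, "date")]

print(compact(old, new))
# [('a', 1, 'apple'), ('b', 5, 'blueberry'), ('d', 4, 'date')]
```

Because every input is already sorted, the merge streams through the files in a single pass and writes its output sequentially, just like a flush.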
💡 Why this architecture wins
The genius of the LSM tree is Sequential I/O.
Traditional B-Tree-based databases often require "Random I/O" (jumping around the disk to update specific pages in place). LSM Trees turn random write requests into sequential batch writes. They treat the disk like a tape recorder: always appending, rarely seeking.
Summary:
- Write to RAM (Fast).
- Flush sorted chunks to Disk (Immutable).
- Merge chunks later to speed up reads and reclaim space.
Have you worked with LSM-based stores before? Do you prefer them over B-Tree based systems like Postgres/MySQL for your specific use cases? Let’s discuss in the comments! 👇
#SystemDesign #Database #Cassandra #Engineering #LSMTree #TechTips