The future of Kafka is unlimited storage.
Kafka engineers are building a solution to allow you to store unlimited amounts of data on a Kafka cluster.
How are they doing it?
KIP-405 - Tiered Storage. โจ
If youโre using Kafka to its full extent, youโre storing (or want to store) a lot of historical data in it.
But. Storage can be a bottleneck. A key limitation in Kafkaโs design is that it couples its storage with compute.
This can be a large burden for 8 reasons:
๐ด 1. slow broker failure recovery - has to catch up a lot of data, proportional to the time it was dead.
๐ด 2. disastrous broker disk failure scenarios - in such cases, the broker has to fetch everything (TBs) and can severely impact latencies. Such broker recovery scenarios can be 12,000% slower and can worsen produce latencies by up to 900%, as shown by tests below.
๐ด 3. competition for disk IOs - historical consumers can decrease write performance by up to 43% (as shown in tests below) because they cause extra disk strain by having to read from it.
๐ด 4. no elasticity - reassigning partitions that have a lot of data are slow to move. At a decent 100MB/s replication rate, 10TB of data moves at a whopping 27.7 hours! Thatโs more than one day!
๐ด 5. high cost - in the cloud, itโs more expensive to provision disk volumes that are attached to the instance.
๐ด 6. max storage limitation per partition - youโre limited by how much data you can store on a single partition based on the limit of physical disk sizes.
๐ด 7. burdensome to scale storage - if you decide to increase your retention settings across the cluster, you either need to add new brokers (scale horizontally) or do some complex disk swaps on them (scale vertically).
๐ด 8. cluster sizing - number of brokers / machine types are impacted by disk requirements, potentially ending up with a significantly larger cluster than you would need if disk size wasnโt a concern.
How do we solve all of this? ๐ฅ
Simple. Put the data in S3 โจ
That is what Tiered Storage is - it extends Kafkaโs storage beyond the local one by retaining the data in a pluggable external store (HDFS, S3, etc).
Pluggable is a key word here, as it will enable the open-source community to develop different implementations for different external stores in parallel.
Kafka will end up having two tiers of storage placement - a local one (hot) and a remote one (cold).
You will be able to enable this uniquely per topic, with varying local and remote retention settings.
This will be done transparently to any clients - they won't be able to tell when theyโre fetching from the remote store as the Kafka API remains the same and abstracts it away.
๐ก Wonโt this kill latency?
One should in theory expect slower reads from the remote log store.
Thankfully,
1. This isnโt a problem practically as historical workloads are usually not performance sensitive.
2. The latency-sensitive workloads usually read from the tail of the log (latest data), and are therefore not impacted by this feature.
๐ค Give me the numbers.
Performance tests were done with HDFS as the external system.
They focused on write latency and the impacts there.
The largest produce latency increase in the tests was 21ms โ 25ms of p99 produce latency in the steady state. (19% worse)
With different scenarios came different results.
Get this - when there are historical reads (out-of-sync consumers), the produce latency was actually improved!
This is because, without tiered storage, consumers reading old data compete for IOs on the disk for reading (normal consumers donโt - data is served from the page cache).
This reduces the IOs that writes can get and write disk latency increases.
The tests showed 42ms of p99 produce with tiered and 60ms of p99 produce WITHOUT tiered storage in this scenario. (43% increase) ๐
It gets better!
When rebuilding a broker with an empty disk, for just 12TB of data, recovery took almost 4 hours in their test without tiered storage, and only 2 minutes with tiered storage (a 120x improvement).
During this broker recovery, the p99 produce latency was 490ms WITHOUT tiered and 56ms with tiered - a 9x improvement! ๐ฅณ
Overall a very promising feature. ๐ฅ
So when is it coming?
Tiered storage is incredibly complex, and has been in development for a while. It juuust missed the 3.5 release and is currently slated for Early Access in 3.6.
So around Q3 2023? ๐ค
There are two notable limitations that Kafka will have with this first version:
๐ด - no compaction - compacted topics will not be supported
๐ด - no going back - once you enable it for a topic, disabling it back is not supported.
Future versions will surely address these limitations.