✍️ Wrote a new database post!
This one is about the different ways that distributed databases handle secondary indexes.
@FranckPachot got me thinking about this topic a few weeks back after I wrote a piece on DynamoDB secondary indexes for @RocksetCloud.
Basically, distributed databases want to shard your data onto different machines. They use a shard key / function to determine which machine holds a given record.
But what happens when you have secondary indexes and queries that don't use your shard key?
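To make the problem concrete, here's a minimal sketch in Python. Everything in it is made up for illustration (the `user_id` shard key, the `email` attribute, the four in-memory shards, the MD5-based hash); it isn't how any particular database does it. The point is just that a lookup by shard key touches one shard, while a lookup by any other attribute has to ask all of them.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size

# Each "shard" is just an in-memory dict of user_id -> record.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key: str) -> int:
    """Map a shard-key value to a shard number using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def put(record: dict) -> None:
    """Store the record on the shard chosen by its shard key (user_id)."""
    shards[shard_for(record["user_id"])][record["user_id"]] = record


def get_by_user_id(user_id: str):
    """Query on the shard key: exactly one shard is consulted."""
    return shards[shard_for(user_id)].get(user_id)


def find_by_email(email: str) -> list:
    """Query on a non-shard-key attribute: every shard must be checked."""
    return [
        record
        for shard in shards
        for record in shard.values()
        if record["email"] == email
    ]
```

The `find_by_email` scatter-gather is the thing each database in the flowchart handles differently.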
To see where a database falls, use the following flowchart:
First, does the database reshard items into new shards for secondary indexes?
- If yes, is resharding done synchronously during writes?
  - If yes, then you have Yugabyte, Spanner, TiDB, etc.
  - If no, then you have DynamoDB GSIs or Rockset
- If it's not resharded, are queries allowed to span multiple shards?
  - If yes, then you have MongoDB, Vitess, Cassandra, Elasticsearch
  - If no, then you have DynamoDB LSIs
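If you'd rather read the flowchart above as code, here's a rough Python version. The function and parameter names are mine, not from any database's docs, and the returned strings are just the example systems listed above.

```python
def classify(
    reshards_for_index: bool,
    sync_on_write: bool = False,
    cross_shard_queries: bool = False,
) -> str:
    """Walk the secondary-index flowchart and return the matching examples.

    sync_on_write only matters when the index is resharded;
    cross_shard_queries only matters when it is not.
    """
    if reshards_for_index:
        if sync_on_write:
            return "Yugabyte, Spanner, TiDB, etc."
        return "DynamoDB GSIs, Rockset"
    if cross_shard_queries:
        return "MongoDB, Vitess, Cassandra, Elasticsearch"
    return "DynamoDB LSIs"
```

For example, `classify(reshards_for_index=True, sync_on_write=False)` lands on the DynamoDB GSIs / Rockset branch.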
Walked through the benefits and drawbacks of each as well.
Let me know where I'm wrong!