SYSTEM DESIGN PHASE 1 - DAY 1
Databases:
Today we are learning about SQL databases and some methods we can use to make our sql datbase more reliable and prone to crashes.
Before Diving into All those methods first we need to understand the RDBMS.
It stands for Relational Database Management System and so why it is called Relational Even ? we know basically what is DBMS but why there is Relational?
So Relational defines that one table has a relation with another table for example :
Student Table has course id - > courses table , the course id is referenced to courses table which defines which course is the student is enrolled in this creates a relation between both of the tables.
As you can see course id has reference to courses table which makes it relational to another table .Thats why we call it Relational Database Management System.
So now we know What it is lets move ahead towards ACID properties.
ACID means Atomicity, Consistency, Integrity and Durability.
Lets Define these in a single statement.
Atomicity - Each transaction is all or nothing
Consistency - Any transaction will bring the database from one valid state to another
Isolation - Executing transactions concurrently has the same results as if the transactions were executed serially
Durability - Once a transaction has been committed, it will remain so.
There are many techniques to scale a relational database: master-slave replication, master-master replication, federation, sharding, denormalization, and SQL tuning.
Master Slave Replication - so as we know a single database which resolves read and write operation cannot handle a million requests right? so we create master slave replication technique.
In this technique we create replicas of the primary database which sync with the primary database as writes operations are done on them.
As you can see on the diagram this will give you a better clearity of what i meant above. Read operations are directly hopped into replicas which only supports read operations and write operations are forwared into primary which is the MASTER in this case.
Problem ? having more replicas means they all have to sync up with the master database if any write operation happnes which increases latency and perfomance issues. and also requires complex code logic to divide the operations.
Master-Master-Replication- So instead of creating more replicas why dont we create two master dbs which stays in sync with each other and if one fails there is another master db which stays up as a backup.
Problems?
-You'll need a load balancer or you'll need to make changes to your application logic to determine where to write.
-Most master-master systems are either loosely consistent (violating ACID) or have increased write latency due to synchronization.
-Conflict resolution comes more into play as more write nodes are added and as latency increases
Disadvantages of whole replication system-
-There is a potential for loss of data if the master fails before any newly written data can be replicated to other nodes.
-Writes are replayed to the read replicas. If there are a lot of writes, the read replicas can get bogged down with replaying writes and can't do as many reads.
-The more read slaves, the more you have to replicate, which leads to greater replication lag.
-On some systems, writing to the master can spawn multiple threads to write in parallel, whereas read replicas only support writing sequentially with a single thread.
-Replication adds more hardware and additional complexity.
Federation- Federation (or functional partitioning) splits up databases by function. For example, instead of a single, monolithic database, you could have three databases: forums, users, and products, resulting in less read and write traffic to each database and therefore less replication lag. Smaller databases result in more data that can fit in memory, which in turn results in more cache hits due to improved cache locality. With no single central master serializing writes you can write in parallel, increasing throughput.
Disadvantage of federation -
-Federation is not effective if your schema requires huge functions or tables.
-You'll need to update your application logic to determine which database to read and write.
-Joining data from two databases is more complex with a server link.
-Federation adds more hardware and additional complexity.
Sharding -Sharding distributes data across different databases such that each database can only manage a subset of the data. Taking a users database as an example, as the number of users increases, more shards are added to the cluster.
Disadvantages of sharding -
-You'll need to update your application logic to work with shards, which could result in complex SQL queries.
-Data distribution can become lopsided in a shard. For example, a set of power users on a shard could result in increased load to that shard compared to others.
-Rebalancing adds additional complexity. A sharding function based on consistent hashing can reduce the amount of transferred data.
-Joining data from multiple shards is more complex.
Sharding adds more hardware and additional complexity.
Denormalisation and sql tuning topics are super easy you can lookup on the internet , as the post become already so damn big you can look for these yourself that would be super easy to understand .
Thanks for reading :)
THE NEXT POST WILL BE ABOUT NOSQL DATABASE.