Yoo, Finally completing it
Most modern sharded database (like Apache ShardedSphere, PgDog) rely on distributed consensus to coordinate cross shard transaction but standard 2 phase commit has high latency under wide area replication ...
In standard 2PC we have : Preparing ---> Prepared ---> committing ---> committed
So I built a transaction coordinator which has a speculative mode ... we add three new shard level states : SPEC_EXECUTING ---> SPEC_COMMITTED --> SPEC_ROLLED_BACK (in simple words a control panel in between of sql aware router and different mysql and postgres shards)
the coordinator tracks a speculation log in etcd same CAS protected state transition as the transition log ... this is done so that two coordinator don't mug up each other .... The undo log in MySQL captures the pre image delta before speculative write and the rollback is stored in procedure soo if a shard crashes mid-speculation, it reads the undo log on recovery and restores state.
This project is my major step to help see why speculative consensus can cut distributed transaction latency
Tech stack : [[ golang ]] for the coordinator engine, [[mysql-8.0-instance]] shards (a, b, c), [[ etcd ]] for service discovery, [[ sysbench ]] so its obvious I cannot have 20k to 30k as a scale on my own at this stage I need to first simulate it ... so sysbench helps me with it... [[ prometheus & grafana ]] for latency histogram .. [[ innoDB ]] for row level locking ... [[ toxiproxy ]] for packet loss injection