I once had a VP at VMware tell me that “real” customers don’t actually care about multi-cluster, cross-cloud failover because of the cost, and that all the work we were doing in ClusterAPI for cross-cloud compatibility should be re-prioritized. This was right before the 2021 incident where AWS us-east-1 went down and basically crippled the entire internet for a day.
Didn’t hear much about it after that.
So, yeah, never assume a cloud region is the unit of stability / scalability.
The postmortem from Coinbase's 10-hour outage is out and... damn
They run global trading from a single region because of latency. OK, I understand.
BUT they have no automated failover prepared!
Are they praying the region never goes down?? Doesn't compute for me...