Beating the CAP Theorem Checklist
Your ( ) tweet ( ) blog post ( ) marketing material ( ) online comment advocates a way to beat the CAP theorem. Your idea will not work. Here is why it won't work:
- ( ) you are assuming that software/network/hardware failures will not happen
- ( ) you pushed the actual problem to another layer of the system
- ( ) your solution is equivalent to an existing one that doesn't beat CAP
- ( ) you're actually building an AP system
- ( ) you're actually building a CP system
- ( ) you are not, in fact, designing a distributed system
Specifically, your plan fails to account for:
- ( ) latency is a thing that exists
- ( ) high latency is indistinguishable from splits or unavailability
- ( ) network topology changes over time
- ( ) there might be more than one partition at the same time
- ( ) split nodes can vanish forever
- ( ) a split node cannot be distinguished from a crashed one by its peers
- ( ) clients are also part of the distributed system
- ( ) stable storage may become corrupt
- ( ) network failures will actually happen
- ( ) hardware failures will actually happen
- ( ) operator errors will actually happen
- ( ) deleted items will come back after synchronization with other nodes (see the sketch after this list)
- ( ) clocks drift across multiple parts of the system, forwards and backwards in time
- ( ) things can happen at the same time on different machines
- ( ) side effects cannot be rolled back the way transactions can
- ( ) failures can occur while in a critical part of your algorithm
- ( ) designing distributed systems is actually hard
- ( ) implementing them is harder still
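For instance, on the "deleted items will come back" point: if replicas synchronize by taking the union of whatever they hold, a delete applied on only one side is undone as soon as the other side hands the old copy back. Below is a minimal, purely illustrative Python sketch of that failure and of the usual tombstone fix; the keys, values, and function names are made up, and a real system would also need versioning to order concurrent writes and a policy for garbage-collecting old tombstones.

```python
# Sketch: why naive union-based sync resurrects deletes, and how
# tombstones (explicit deletion markers) avoid it. Names are illustrative.

def naive_sync(a: dict, b: dict) -> dict:
    """Merge two replicas by taking the union of their entries."""
    merged = dict(b)
    merged.update(a)
    return merged

# Replica A deletes "cart:42"; replica B never hears about it.
replica_a = {"user:1": "alice"}                        # after the delete
replica_b = {"user:1": "alice", "cart:42": "3 items"}  # stale copy

print(naive_sync(replica_a, replica_b))
# {'user:1': 'alice', 'cart:42': '3 items'}  <- the deleted item is back

# With tombstones, a delete is itself replicated state.
TOMBSTONE = object()

def delete(replica: dict, key: str) -> None:
    replica[key] = TOMBSTONE  # remember that the key was deleted

def tombstone_sync(a: dict, b: dict) -> dict:
    merged = dict(b)
    for key, value in a.items():
        # A tombstone wins over a live value; real systems also need
        # timestamps/versions to order concurrent writes.
        if value is TOMBSTONE or key not in merged:
            merged[key] = value
    return merged

replica_a = {"user:1": "alice"}
delete(replica_a, "cart:42")
merged = tombstone_sync(replica_a, replica_b)
print({k: v for k, v in merged.items() if v is not TOMBSTONE})
# {'user:1': 'alice'}  <- the delete survives synchronization
```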
And the following technical objections may apply:
- ( ) your solution requires a central authority that cannot be unavailable
- ( ) read-only mode is still unavailability for writes
- ( ) your quorum size cannot be changed over time (see the sketch after this list)
- ( ) your cluster size cannot be changed over time
- ( ) using 'infinite timeouts' is not an acceptable solution to lost messages
- ( ) your system accumulates data forever and assumes infinite storage
- ( ) re-synchronizing data will require more bandwidth than everything else put together
- ( ) acknowledging reception is not the same as confirming consumption of messages
- ( ) you don't even wait for messages to be written to disk
- ( ) you assume short periods of unavailability are insignificant
- ( ) you are basing your design on a paper or theory that has not yet been proven
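To put numbers on the quorum and cluster-size objections: a majority quorum only prevents split brain while every node agrees on what the cluster size is. The sketch below is just that arithmetic, with made-up numbers; changing membership safely takes a coordinated reconfiguration protocol (joint consensus or similar), which this sketch deliberately leaves out.

```python
# Sketch: majority-quorum arithmetic, and why silently changing the
# cluster size breaks it. Purely illustrative numbers.

def majority(cluster_size: int) -> int:
    """Smallest group that is more than half of the cluster."""
    return cluster_size // 2 + 1

def has_quorum(votes: int, cluster_size: int) -> bool:
    return votes >= majority(cluster_size)

# A 5-node cluster partitions into groups of 3 and 2.
print(has_quorum(3, 5))  # True  -> this side may elect a leader / commit
print(has_quorum(2, 5))  # False -> this side must stall or go read-only

# Two disjoint groups can never both hold a majority of the SAME size:
# majority(5) + majority(5) = 3 + 3 = 6 > 5 nodes available.

# Now suppose the 2-node side unilaterally decides "the cluster is
# really just us" and recomputes its quorum against size 2:
print(has_quorum(2, 2))  # True -> split brain: both sides accept writes
```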
Furthermore, this is what I think about you:
- ( ) nice try, but blatantly false advertising
- ( ) you are badly reinventing existing concepts and should do some research
- ( ) in particular, you should read the definition of the word 'theorem'
- ( ) also you should read the definition of 'distributed system'
- ( ) you have no idea what you are doing
- ( ) do you even know what a logical clock is? (see the sketch below)
- ( ) you shouldn't be in charge of people's data
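And since the logical-clock question above is not entirely rhetorical: a Lamport clock orders events by counting local steps and messages instead of trusting wall clocks, which drift and jump in both directions. The toy Python sketch below follows the textbook rules (increment on local events, take the maximum plus one on receive); it is not any particular system's API.

```python
# Toy Lamport clock: a counter that only moves forward with events and
# messages, so ordering never depends on (drifting) wall-clock time.

class LamportClock:
    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event: advance the counter."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Stamp an outgoing message with the current logical time."""
        return self.tick()

    def receive(self, message_time: int) -> int:
        """Merge the sender's timestamp: take the max, then advance."""
        self.time = max(self.time, message_time) + 1
        return self.time

# Node A's wall clock could be hours behind node B's; the logical
# ordering below still respects "send happens before receive".
a, b = LamportClock(), LamportClock()
a.tick()                 # A does local work       -> A = 1
stamp = a.send()         # A sends a message       -> A = 2
b.tick()                 # B does unrelated work   -> B = 1
print(b.receive(stamp))  # B receives A's message  -> B = max(1, 2) + 1 = 3
```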