CAP
1. Basics
Distributed Systems: The CAP theorem applies to systems that run on multiple computers connected over a network.
2. The Tradeoff
You can only guarantee two out of three properties:
- Consistency (C): Everyone sees the same data at the same time.
  - also see Eventual Consistency
- Availability (A): The system is always up and running (responds to users at all times), even with some failures.
- Partition Tolerance (P): The system keeps working even if network problems split it into parts.
Network problems are unavoidable in a distributed system, so partition tolerance is a must.
3. The Choice
You have to choose between consistency and availability when network problems happen:
- CP Systems: Prioritize consistency; they may become unavailable during a partition.
- AP Systems: Prioritize availability; they may show outdated data during a partition.
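
The difference between the two choices can be sketched in a few lines of Python. This is a toy model, not how any particular database works: the `Replica` class, the `mode` labels, and the `Unavailable` exception are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised by a CP replica that cannot reach its peers (hypothetical)."""


@dataclass
class Replica:
    mode: str                    # "CP" or "AP" (illustrative labels)
    partitioned: bool = False    # True while cut off from the other replicas
    data: dict = field(default_factory=dict)

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data.
            raise Unavailable("cannot confirm latest value during partition")
        # AP (or healthy network): answer from the local copy, which may be stale.
        return self.data.get(key)


cp = Replica(mode="CP", data={"x": 1})
ap = Replica(mode="AP", data={"x": 1})
cp.partitioned = ap.partitioned = True
```

During the simulated partition, `ap.read("x")` still returns the local (possibly outdated) value, while `cp.read("x")` raises `Unavailable`.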
4. Post-Partition Reconciliation
4.0.1. Post-Partition Reconciliation
- This refers to the processes that run after a network partition has been resolved, when data may have diverged across different nodes.
- The goal is to restore consistency across the system while maintaining high availability.
4.0.2. Key Concepts in Post-Partition Reconciliation
- Eventual Consistency: This model ensures that, given enough time, all updates will propagate through the system and all replicas will become consistent.
- Conflict Resolution: Strategies to handle conflicting updates (e.g., last-write-wins, version vectors, or custom merge strategies).
- Data Reconciliation Techniques:
  - Merging: Combining differing versions of the data.
  - Gossip Protocols: Nodes regularly exchange state information, allowing eventual convergence.
  - Quorum Reads/Writes: Requiring each read and write to be acknowledged by a quorum of the N replicas, typically sized so that R + W > N, so that every read overlaps with the most recent successful write.
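
As a rough illustration of the concepts above, the following Python sketch detects conflicting updates with version vectors, falls back to last-write-wins when two updates are concurrent, and checks the quorum overlap condition. All function names and the `(value, vector, timestamp)` tuple layout are assumptions made for this example, not the API of any real system.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two version vectors (node id -> update counter)."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"  # neither side saw the other's update: a conflict
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


def merge_vectors(a: dict, b: dict) -> dict:
    """Element-wise maximum: the merged vector covers both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def reconcile(a: tuple, b: tuple) -> tuple:
    """Pick a winner between two (value, vector, timestamp) versions."""
    order = compare(a[1], b[1])
    merged = merge_vectors(a[1], b[1])
    if order in ("a_newer", "equal"):
        return (a[0], merged, a[2])
    if order == "b_newer":
        return (b[0], merged, b[2])
    # Concurrent updates: last-write-wins by timestamp.
    # Note that this silently discards the losing write.
    winner = a if a[2] >= b[2] else b
    return (winner[0], merged, winner[2])


def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """True if every read quorum intersects every write quorum."""
    return r + w > n


# Two replicas accepted different writes while partitioned.
left = ("blue", {"n1": 2, "n2": 1}, 100.0)
right = ("green", {"n1": 1, "n2": 2}, 105.0)
result = reconcile(left, right)
```

Here the two vectors are concurrent, so the version with the later timestamp (`"green"`) wins and the merged vector becomes `{"n1": 2, "n2": 2}`. A custom merge strategy would replace the last-write-wins branch with application-specific logic that keeps both updates.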
5. Latency and Consistency
- see PACELC Theorem