Eventual Consistency
1. Gossip Protocols
- also known as epidemic protocols.
- communication mechanisms used in distributed systems where nodes share information with each other in a decentralized manner.
- mimic the way gossip spreads in social networks, where individuals share news with their friends, who then share it with their friends, and so on.
1.1. Working Mechanism
- Random Peer Selection: Each node periodically selects a random subset of its peers (other nodes it's connected to) and initiates communication.
- Information Exchange: The nodes exchange information about their state, including data, updates, or events they've observed.
- Propagation: The received information is then shared with other randomly selected peers, gradually disseminating it throughout the network (see the sketch below).
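The three steps above can be condensed into a short push-gossip sketch. This is a minimal, in-memory illustration under assumed names (`Node`, `gossip_round`, `FANOUT`), not a production protocol: state entries carry version numbers, and the higher version wins on merge.

```python
import random

FANOUT = 3  # how many random peers each node contacts per round (assumed value)

class Node:
    def __init__(self, name):
        self.name = name
        self.state = {}   # key -> (value, version)
        self.peers = []   # other Node objects this node knows about

    def merge(self, remote_state):
        # Information exchange: keep the entry with the higher version for each key.
        for key, (value, version) in remote_state.items():
            if key not in self.state or self.state[key][1] < version:
                self.state[key] = (value, version)

def gossip_round(nodes):
    # Random peer selection + propagation: each node pushes its state to a few peers.
    for node in nodes:
        for peer in random.sample(node.peers, min(FANOUT, len(node.peers))):
            peer.merge(node.state)
```

Calling `gossip_round` repeatedly lets an update introduced at one node reach all nodes with high probability after O(log N) rounds, which is where the "eventual" in eventual consistency comes from.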
1.2. Key Features
- Decentralized: No central coordinator or leader controls the communication.
- Scalable: Works well in large-scale systems with thousands of nodes.
- Robust: Tolerant to node failures and network partitions.
- Eventual Consistency: Information eventually reaches all nodes, but there's no guarantee on how long it will take.
1.3. Types of Gossip Protocols
- Push Gossip: A node actively pushes its information to randomly selected peers.
- Pull Gossip: A node requests information from randomly selected peers.
- Push-Pull Gossip: A combination of both, where nodes both push and pull information in the same exchange (the three styles are contrasted in the sketch below).
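The difference between the three styles is simply which direction state flows during a single contact. A hedged sketch, assuming plain dicts of version-stamped entries (`key -> (value, version)`) like the node state above:

```python
def merge(into, frm):
    # Last-writer-wins merge of version-stamped entries: key -> (value, version).
    for key, (value, version) in frm.items():
        if key not in into or into[key][1] < version:
            into[key] = (value, version)

def push(sender, receiver):
    merge(receiver, sender)     # sender transmits its state; learns nothing back

def pull(requester, peer):
    merge(requester, peer)      # requester only fetches the peer's state

def push_pull(a, b):
    merge(b, a)                 # both directions in one contact,
    merge(a, b)                 # so the pair converges in a single round trip
```

Push spreads a new update quickly while few nodes have it, pull is better at catching the last stragglers, and push-pull combines both at the cost of slightly larger exchanges.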
1.4. Use Cases
- Failure Detection: Nodes can gossip about their health status, allowing the system to detect failures quickly.
- Data Dissemination: Used to spread data updates or events across the network.
- Peer Sampling: Nodes can discover other nodes in the network by gossiping about their neighbors.
- Aggregate Computation: Nodes can compute aggregates (e.g., average, sum) by gossiping partial results (see the averaging sketch below).
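As a concrete example of aggregate computation, the sketch below uses pairwise gossip averaging: each contact replaces two nodes' values with their mean, and every value converges toward the global average without any central aggregator. The function name and round count are illustrative.

```python
import random

def gossip_average(values, rounds=200):
    # Pairwise randomized gossip averaging over a list of per-node values.
    values = list(values)
    n = len(values)
    for _ in range(rounds):
        i, j = random.sample(range(n), 2)      # random peer selection
        mean = (values[i] + values[j]) / 2.0   # exchange and combine partial results
        values[i] = values[j] = mean
    return values

# Every node ends up close to the true average (30) of the initial values.
print(gossip_average([10, 20, 30, 40, 50]))
```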
1.5. Advantages
- Scalability: Handles large networks with thousands of nodes efficiently.
- Fault Tolerance: Can withstand node failures and network partitions.
- Simplicity: Relatively simple to implement and understand.
- Low Coordination Overhead: Doesn't require a central coordinator, so there is no single bottleneck or single point of failure.
1.6. Disadvantages
- Eventual Consistency: Not suitable for applications requiring strong consistency.
- Latency: Propagation is probabilistic; it typically takes on the order of O(log N) gossip rounds for information to reach all N nodes, with no hard upper bound.
- Redundant Messages: Can result in redundant messages being sent due to the random nature of peer selection.
2. Hinted Handoff
- helps ensure that data updates eventually reach all replicas, even when some nodes are temporarily unavailable.
- is a key mechanism in Cassandra that bridges the gap between availability and eventual consistency.
- by temporarily storing data updates for unavailable replicas, it ensures that writes are not lost and that all replicas eventually converge to the same state.
- This makes Cassandra a robust and reliable choice for applications that prioritize availability and can tolerate eventual consistency.
2.1. How Hinted Handoff Works
- Write Request: When a write request is sent to a Cassandra node (the coordinator), it forwards the request to the replicas responsible for storing that data.
- Unavailable Replica: If one or more replicas are unavailable (e.g., due to network issues or maintenance), the coordinator cannot immediately write the data to them.
- Hint Creation: Instead of failing the write, the coordinator stores a "hint" locally. This hint contains the data that needs to be written and the address of the unavailable replica.
- Handoff: When the node holding the hint learns (via gossip) that the replica is back online, it sends the stored hints to that replica.
- Hint Replay: The replica applies the missed writes, eventually catching up with the rest of the cluster (a simplified sketch of this flow follows the list).
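A minimal sketch of the flow above, using invented names (`Coordinator`, `Replica`, `write`, `replay_hints`) and an in-memory hint list; Cassandra's real implementation persists hints to disk and delivers them per target node.

```python
import time

class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

class Coordinator:
    def __init__(self, replicas, hint_window=3 * 3600):
        self.replicas = replicas
        self.hint_window = hint_window   # seconds a hint is kept before being discarded
        self.hints = []                  # (target replica, key, value, created_at)

    def write(self, key, value):
        for replica in self.replicas:
            if replica.up:
                replica.apply(key, value)                              # normal write path
            else:
                self.hints.append((replica, key, value, time.time()))  # hint creation

    def replay_hints(self):
        # Called once gossip reports a previously-down replica as up again.
        remaining = []
        for target, key, value, created_at in self.hints:
            if time.time() - created_at > self.hint_window:
                continue                      # expired hint; repair must fix the gap later
            if target.up:
                target.apply(key, value)      # hint replay: deliver the missed write
            else:
                remaining.append((target, key, value, created_at))
        self.hints = remaining
```

A write issued while one replica is marked down lands on the live replicas immediately and is replayed to the recovered one once its `up` flag flips back and `replay_hints()` runs.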
2.2. Benefits
- Increased Write Availability: Even if some replicas are down, writes can still succeed as long as enough replicas are available to satisfy the requested consistency level (see the client-side example after this list).
- Eventual Consistency: Hinted handoff ensures that all replicas eventually receive the updates, maintaining data consistency over time.
- Reduced Client Retries: Clients don't need to retry writes on behalf of replicas that missed them, since the hints are replayed automatically once those replicas recover.
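The write-availability benefit is easiest to see from the client side. A hedged example using the DataStax Python driver; the contact point, keyspace, and table are placeholders, and it assumes a keyspace with replication factor 3.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])           # placeholder contact point
session = cluster.connect("my_keyspace")   # placeholder keyspace (assume RF = 3)

# With QUORUM, this write succeeds as long as 2 of the 3 replicas respond;
# the coordinator stores hints for any replica it could not reach, so the
# client never has to retry on behalf of the down node.
insert = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(insert, (42, "ada"))
```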
2.3. Key Considerations
- Hint Lifetime: Hints are not stored indefinitely. They are kept for a configurable window (max_hint_window_in_ms in Cassandra, three hours by default); if the replica stays down longer, the remaining hints are discarded and anti-entropy repair has to make up the difference.
- Hint Storage: Hints are typically stored on disk, which can impact disk usage if a node is down for an extended period.
- Handoff Overhead: Replaying hints can add some overhead to the system, but this is usually a minor cost compared to the benefits of improved availability and consistency.
Tags::cs: