Hashing
- Hashmap: A Data Structure that is stores values indexable with hashed keys.
- Hash Function: the corresponding mapping mechanism yielding values for a hashed key.
1. Consistent Hashing
- technique used in distributed systems to distribute data across multiple nodes (servers) efficiently.
- Unlike traditional hashing, where a change in the number of nodes can disrupt the entire data distribution, consistent hashing minimizes the impact of adding or removing nodes.
- resources :
2. Working mechanism
- The Hash Ring: Imagine a circular ring (the "hash ring"). Both data items and nodes are assigned positions on this ring using a hash function.
- Data Placement: Each data item is mapped to the first node encountered clockwise on the ring from its hash position. This node is responsible for storing and serving that data item.
- Node Addition: When a new node is added, its position on the ring is determined by hashing its identifier. Only the data items that were previously mapped to the nearest clockwise node now get remapped to the new node.
- Node Removal: When a node is removed, the data items it was responsible for are remapped to the next available node in the clockwise direction.
3. Benefits
- Minimizes Remapping: When nodes are added or removed, only a small fraction of the data needs to be remapped, reducing disruption to the system.
- Balanced Load Distribution: Data is distributed evenly across nodes, preventing any single node from becoming a bottleneck.
- Scalability: Easily scales horizontally by adding more nodes to the ring.
4. Use Cases
- Distributed Caching: Used in systems like Memcached and Redis to distribute cache data across multiple servers.
- Load Balancing: Used to distribute requests across multiple web servers or application servers.
- Distributed Hash Tables (DHTs): Used in peer-to-peer systems to locate data stored across a network of nodes.
5. Key Considerations
- Hash Function Choice: Choosing a good hash function is crucial for ensuring even data distribution.
- Virtual Nodes: To further improve load balancing, each physical node can be represented by multiple "virtual nodes" on the hash ring.
- Data Replication: Consistent hashing can be combined with replication strategies to ensure data availability even if nodes fail.
Tags::cs:programming:data: