Distributed Hash Table (DHT)
A Distributed Hash Table (DHT), is a decentralized data structure that provides a way to store and retrieve key-value pairs in a distributed network. Data in the DHT can be accessed by nodes in the network without having to rely on a central authority, thus they are commonly used in peer-to-peer networks.
Features:
- Decentralization: No single point of failure
- Scalability: Can handle large number of nodes and data entries
- Fault Tolerance: System continues to function when some nodes fail
- Efficiency: Can find entries in logarithmic time relative to number of nodes
Implementation
Structure:
- Each node in the network is assigned an ID using a hash function
- This unique identifier is used to determine the node's position in the DHT
- DHT uses key-value pairs. The key is a hash used to find the value in the DHT
- Routing Table: Each node maintains a routing table about the network.
- Helps nodes efficiently locate other nodes nd the data they store.
- Includes a list of known nodes and their IDs
- Includes info about the distance to these nodes
Data storage:
- A node wants to store a key-value pair
- It hashes the key to find the node responsible for it
- The data is then sent to that node for storage.
Lookup Process:
- A node wants to retrieve a value from a key
- It hashes the key to find the responsible node
- Routing table is used to navigate the network
- Queries nodes along the way until the target node is reached
Nodes Joining:
- A new node joins the network
- It finds existing nodes and updates its routing table
- Existing nodes are found via a bootstrapping mechanism
- It then claims a portion of the keyspace (range of keys it will manage)
Bootstrapping mechanisms:
- Hardcoded Bootstrap Nodes, i.e. nodes that are known to be always online
- DNS Seeders, i.e. domain names that resolve to IPs of known nodes
- Peer Discovery Protocols, i.e. allowing nodes to advertise their presence
- Previous Connections: memory of previous connection can be used to find a node
- Public Nodes: Nodes that are advertised on forums, websites, etc.
Nodes Leaving:
- When a node leaves, its data must be redistributed
- It transfers data to its successor or other nodes
- If the node leaves without doing this, it's a failed node
Replication and Fault Tolerance:
- Data replication is often implemented
- Copies of data are stored on multiple nodes.
Consistency and Maintenance:
- Mechanisms needed to maintain consistency and handle changes in the network
- Can involve periodic updates to routing tables and data rebalancing
Examples and Use Cases
DHTs are used in
- peer-to-peer file sharing
- decentralized storage systems
- blockchain networks
Popular DHT implementations include
- Kademlia
- Chord
- Pastry