andersch.dev

<2024-10-26 Sat>

Distributed Hash Table (DHT)

A Distributed Hash Table (DHT), is a decentralized data structure that provides a way to store and retrieve key-value pairs in a distributed network. Data in the DHT can be accessed by nodes in the network without having to rely on a central authority, thus they are commonly used in peer-to-peer networks.

Features:

Implementation

Structure:

  • Each node in the network is assigned an ID using a hash function
  • This unique identifier is used to determine the node's position in the DHT
  • DHT uses key-value pairs. The key is a hash used to find the value in the DHT
  • Routing Table: Each node maintains a routing table about the network.
    • Helps nodes efficiently locate other nodes nd the data they store.
    • Includes a list of known nodes and their IDs
    • Includes info about the distance to these nodes

Data storage:

  • A node wants to store a key-value pair
  • It hashes the key to find the node responsible for it
  • The data is then sent to that node for storage.

Lookup Process:

  • A node wants to retrieve a value from a key
  • It hashes the key to find the responsible node
  • Routing table is used to navigate the network
  • Queries nodes along the way until the target node is reached

Nodes Joining:

  • A new node joins the network
  • It finds existing nodes and updates its routing table
  • Existing nodes are found via a bootstrapping mechanism
  • It then claims a portion of the keyspace (range of keys it will manage)

Bootstrapping mechanisms:

  • Hardcoded Bootstrap Nodes, i.e. nodes that are known to be always online
  • DNS Seeders, i.e. domain names that resolve to IPs of known nodes
  • Peer Discovery Protocols, i.e. allowing nodes to advertise their presence
  • Previous Connections: memory of previous connection can be used to find a node
  • Public Nodes: Nodes that are advertised on forums, websites, etc.

Nodes Leaving:

  • When a node leaves, its data must be redistributed
  • It transfers data to its successor or other nodes
  • If the node leaves without doing this, it's a failed node

Replication and Fault Tolerance:

  • Data replication is often implemented
  • Copies of data are stored on multiple nodes.

Consistency and Maintenance:

  • Mechanisms needed to maintain consistency and handle changes in the network
  • Can involve periodic updates to routing tables and data rebalancing

Examples and Use Cases

DHTs are used in

  • peer-to-peer file sharing
  • decentralized storage systems
  • blockchain networks

Popular DHT implementations include

  • Kademlia
  • Chord
  • Pastry