Node Types and Responsibilities
The Gnutella2 network is an ad-hoc, self-organising collection of interconnected nodes cooperating to enable productive distributed activities.
Not all nodes participating in the system are equal: there are two primary node types, "hubs" and "leaves". The goal is to maximise the number of leaves and minimise the number of hubs; however, because each hub has finite resources, there is an upper bound on the viable ratio of leaves to hubs. This ratio is known as the "leaf density".
Leaf nodes are the most common node type on the network. They have no special responsibilities and do not form a working part of the network infrastructure. Nodes with limited resources must operate as leaf nodes: this includes nodes with limited bandwidth, CPU or RAM, low or unpredictable expected uptime, or no ability to accept inbound TCP or UDP.
Hub nodes, on the other hand, form an important and active part of the network infrastructure, organising surrounding nodes and filtering and directing traffic over several media types. Hub nodes devote substantial resources to the network, and as a result their capacity to participate in higher level network functions is limited. Only the most capable nodes are selected to act as hubs, based upon the criteria in the following section.
Hub Selection Criteria
Hubs are selected based on the following internal criteria:
- Suitable operating system (able to support > 100 sockets)
- Suitably high CPU and RAM available
- Long uptime (many hours, at least two), possibly considering historical uptime
- Adequate bandwidth, primarily inbound bandwidth
- Ability to accept inbound TCP and UDP
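The internal criteria above can be read as a simple all-or-nothing check. The following is a minimal sketch only: the NodeProfile fields, the function name and the bandwidth threshold are hypothetical names and values chosen for illustration, and only the socket and uptime limits are taken from the list above.

```python
from dataclasses import dataclass


@dataclass
class NodeProfile:
    """Hypothetical summary of a node's local resources."""
    max_sockets: int        # sockets the operating system can support
    cpu_ok: bool            # "suitably high" CPU, judged by the client
    ram_ok: bool            # "suitably high" RAM, judged by the client
    uptime_hours: float     # current (or historical) uptime
    inbound_kbps: float     # measured inbound bandwidth
    accepts_tcp: bool       # can accept inbound TCP
    accepts_udp: bool       # can accept inbound UDP


MIN_SOCKETS = 100          # from the text: "> 100 sockets"
MIN_UPTIME_HOURS = 2.0     # from the text: "at least two"
MIN_INBOUND_KBPS = 256.0   # assumption: the text gives no figure


def is_hub_capable(node: NodeProfile) -> bool:
    """Return True only if every internal criterion is satisfied."""
    return (
        node.max_sockets > MIN_SOCKETS
        and node.cpu_ok
        and node.ram_ok
        and node.uptime_hours >= MIN_UPTIME_HOURS
        and node.inbound_kbps >= MIN_INBOUND_KBPS
        and node.accepts_tcp
        and node.accepts_udp
    )
```

A node that fails any single test, for example one behind a firewall that blocks inbound UDP, stays a leaf regardless of how strong its other resources are.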
In addition to these internal factors, hubs must also consider the network's need for additional hubs. Without a central point of authority the need for additional hubs cannot be determined with absolute certainty; however it can be approximated by examining the state of nearby nodes and specifically the state of hubs in the local hub cluster.
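One possible way to approximate this need is to measure how full the hubs in the local cluster are. The sketch below is an illustration under stated assumptions, not the method used by any particular client: the function names, the (leaf count, leaf capacity) input format and the 0.9 threshold are all hypothetical.

```python
def cluster_load(hubs):
    """Fraction of leaf slots in use across the known hubs.

    `hubs` is a list of (leaf_count, leaf_capacity) pairs, one per hub
    in the local hub cluster (a hypothetical input format).
    """
    total_capacity = sum(capacity for _, capacity in hubs)
    if total_capacity == 0:
        # No known hub capacity at all: treat the cluster as saturated.
        return 1.0
    total_leaves = sum(count for count, _ in hubs)
    return total_leaves / total_capacity


def needs_more_hubs(hubs, threshold=0.9):
    """Estimate whether the local cluster would benefit from another hub.

    The 0.9 threshold is an assumption chosen for illustration.
    """
    return cluster_load(hubs) >= threshold
```

Because each node sees only its own cluster, the result is a local estimate rather than a network-wide measurement.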
Nodes operating as Gnutella2 hubs have a set of responsibilities to meet. Hubs are highly interconnected, forming a "hub network" or "hub web", with each hub maintaining a connection to 5-30 other "neighbouring" hubs. The number of hub interconnections must scale up with the overall size of the network.
Each hub also accepts connections from a large collection of leaf nodes, typically 200-300 depending on available resources. Leaf nodes are considered to be the "edge" of the network. In practice each leaf connects to two hubs simultaneously; however, from the point of view of the hubs, each leaf is considered a dead end.
The group of hubs within the hub network spanning the local hub and its neighbours is termed the "hub cluster", and is an important grouping. The hubs in a cluster maintain constant communication with each other, sharing information about network load and statistics, and exchanging cache entries and filtering tables. The hub cluster is also the smallest searchable unit of the network as far as a search client is concerned.
Hub responsibilities include:
- Maintaining up to date information about other hubs in the cluster, and their neighbouring hubs, providing updates to neighbours
- Maintaining a node routing table mapping node GUIDs to shortest route local TCP connections and UDP endpoints
- Maintaining query hash tables for each connection, including both leaves and hubs so that queries can be executed intelligently
- Maintaining a superset query hash table including local content and every connected leaf's (not hub's) supplied tables, to supply to neighbouring hubs
- Monitoring the status of local connections and deciding whether to downgrade to leaf mode
- Keeping distributed discovery services such as GWebCaches updated
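The superset query hash table described in the list above can be sketched as a bitwise union of the local table and the tables supplied by connected leaves. This is a simplified model under stated assumptions: each table is represented as a bytes object of equal length in which a set bit means "a keyword hashing to this slot may be present", and the function name is hypothetical. The actual wire format and bit polarity used by Gnutella2 may differ.

```python
def build_superset_table(local_table, leaf_tables):
    """Combine the hub's own table with every connected leaf's table.

    Tables received from neighbouring hubs are deliberately not passed
    in, matching the "leaf's (not hub's)" rule in the list above.
    """
    superset = bytearray(local_table)
    for table in leaf_tables:
        if len(table) != len(superset):
            raise ValueError("all tables must have the same size")
        for i, byte in enumerate(table):
            # Union: a slot is marked if any contributor marks it.
            superset[i] |= byte
    return bytes(superset)
```

A neighbouring hub holding this superset can skip forwarding a query whose keyword slots are not all marked, since neither the hub nor any of its leaves can match it.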