Gnutella2 - User contributions [en]

Query Hash Tables

2005-03-28T11:55:55Z

Kath: /* Table Exchange */

== Introduction ==

Building an efficient network topology is not viable without means to restrict the flow
of information appropriately. In the Gnutella2 network architecture, neighbouring
nodes exchange compressed hash filter tables describing the content that can be
reached by communicating with them. These tables are updated dynamically as
necessary.

The concept of filtering hash tables in file sharing was pioneered by Limegroup for
the LimeWire Gnutella1 application.

== Table Properties ==

Query hash tables or QHTs provide enough information to know with certainty that a
particular node (and possibly its descendants) will not be able to provide any
matching objects for a given query. Conversely, the QHT may reveal that a node or
its descendants may be able to provide matching objects.

This property means that queries can be discarded confidently when a transmission
is known to be unnecessary, without giving the filtering or forwarding node any
actual information about the searchable content. Neighbours know what their
neighbours do not have, but cannot say for sure what they do have. QHTs are also
very efficient in terms of exchange, maintenance and lookup cost.

== Table Content ==

A QHT is a table of 2^N bits, where each bit represents a unique word-hash value.
For example, a table with N = 20 has 2^20 = 1048576 possible word-hash values. If
stored uncompressed, this table would be 128 KB in size.

In an empty table, every word-hash value bit will be "1", which represents "empty".
To populate the table with searchable content, an application must:

* Locate every plain-text word and URN which could be searched for and produce a match/hit
* Hash the word with a simple hash function to produce a word-hash value which is 0 <= value < 2^N.
* Set the appropriate bit in the table to zero, representing "full".
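
As a minimal sketch of the starting point for these steps, the helper below allocates an empty table. The name MakeEmptyTable and the use of std::vector are illustrative assumptions, not part of the specification.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: allocate an empty QHT with 2^nBits entries.
// Every bit starts as 1 ("empty"), so every byte starts as 0xFF.
// Assumes 3 <= nBits < 32 so the table is a whole number of bytes.
std::vector<std::uint8_t> MakeEmptyTable(int nBits)
{
    const std::size_t nBytes = (std::size_t(1) << nBits) / 8;
    return std::vector<std::uint8_t>(nBytes, 0xFF);
}
```

For N = 20 this yields 131072 bytes, which is the 128 KB figure quoted above.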

== Word Hashing ==

Words are strings of one or more alphanumeric characters which are not all numeric.
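
The word definition above could be applied roughly as follows. This is a sketch only: the name ExtractWords is hypothetical, and it assumes single-byte text where std::isalnum decides what counts as alphanumeric.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical tokenizer: split text into runs of alphanumeric characters
// and drop any run that consists of digits only.
std::vector<std::string> ExtractWords(const std::string& text)
{
    std::vector<std::string> words;
    std::string current;
    bool bHasNonDigit = false;

    // Emit the current run if it qualifies as a word, then reset.
    auto flush = [&]() {
        if (!current.empty() && bHasNonDigit) words.push_back(current);
        current.clear();
        bHasNonDigit = false;
    };

    for (unsigned char ch : text)
    {
        if (std::isalnum(ch))
        {
            current.push_back(static_cast<char>(ch));
            if (!std::isdigit(ch)) bHasNonDigit = true;
        }
        else
        {
            flush();
        }
    }
    flush();
    return words;
}
```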

To convert a word into a hash value, the following case-insensitive algorithm is used:

<pre>
<nowiki>
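// HashWord( string_ptr, char_count, table_bit_count );
// Folds the lower-cased characters into a 32-bit number, four bytes at a time.
DWORD CQueryHashTable::HashWord(LPCTSTR psz, int nLength, int nBits)
{
    DWORD nNumber = 0;
    int nByte = 0;
    for ( ; nLength > 0 ; nLength--, psz++ )
    {
        // Keep only the low 8 bits of the lower-cased character
        int nValue = tolower( *psz ) & 0xFF;
        // Move it into byte position 0, 1, 2 or 3, cycling per character
        nValue = nValue << ( nByte * 8 );
        nByte = ( nByte + 1 ) & 3;
        nNumber = nNumber ^ nValue;
    }
    return HashNumber( nNumber, nBits );
}

// Reduces a 32-bit number to a table index in the range 0 <= value < 2^nBits.
DWORD CQueryHashTable::HashNumber(DWORD nNumber, int nBits)
{
    WORD64 nProduct = (WORD64)nNumber * (WORD64)0x4F1BBCDC;
    // Keep the low 32 bits of the product, then take their top nBits
    WORD64 nHash = ( nProduct << 32 ) >> ( 32 + ( 32 - nBits ) );
    return (DWORD)nHash;
}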
</nowiki>
</pre>
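
For readers without the Windows type definitions, the same arithmetic can be sketched with standard fixed-width types. This is an illustrative port, not the reference code: it assumes the word is held as single-byte characters in a std::string and that 1 <= nBits <= 32.

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Multiply, keep the low 32 bits of the product, then take their top nBits.
// Assumes 1 <= nBits <= 32.
std::uint32_t HashNumber(std::uint32_t nNumber, int nBits)
{
    const std::uint64_t nProduct = std::uint64_t(nNumber) * 0x4F1BBCDCull;
    const std::uint32_t nLow = static_cast<std::uint32_t>(nProduct);
    return nLow >> (32 - nBits);
}

// XOR each lower-cased character into byte 0, 1, 2, 3, 0, 1, ... of a
// 32-bit number, then reduce that number to a table index.
std::uint32_t HashWord(const std::string& word, int nBits)
{
    std::uint32_t nNumber = 0;
    int nByte = 0;
    for (unsigned char ch : word)
    {
        const std::uint32_t nValue =
            static_cast<std::uint32_t>(std::tolower(ch)) & 0xFF;
        nNumber ^= nValue << (nByte * 8);
        nByte = (nByte + 1) & 3;
    }
    return HashNumber(nNumber, nBits);
}
```

Because only the top nBits of the same 32-bit value are kept, the index for a smaller table is always a prefix of the index for a larger one.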

== URNs - A Special Case ==

URNs are treated as a special case: rather than dividing them up into word tokens,
they are hashed as a complete fixed-length string. For example:

<pre>
<nowiki>
urn:sha1:WIXYJFVJMIWNMUWPRPBGUTODIV52RMJA
</nowiki>
</pre>

Bitprint URNs are actually composite values which include both a SHA1 and
TigerTree root value. Rather than adding the whole bitprint to the table, each of the
constituent URNs is added separately. This allows SHA1-only querying and TigerTree-only
querying. A root TigerTree URN takes the form:

<pre>
<nowiki>
urn:tree:tiger/:CN25MLNU3XNN7IHKZMNOA63XG6SKDJ2W7Z3HONA
</nowiki>
</pre>

Other URNs should be expressed in their most natural form before being fed to the
word hash function.
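
Splitting a bitprint into its constituent URNs might look like the sketch below. The function name SplitBitprint is hypothetical, and the input layout "urn:bitprint:" followed by the SHA1 value, a dot and the TigerTree root is an assumption, since this page does not show a bitprint URN.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: turn "urn:bitprint:<sha1>.<tigertree>" into the two
// URNs that would be hashed separately. Returns an empty list if the input
// does not have the assumed layout.
std::vector<std::string> SplitBitprint(const std::string& urn)
{
    const std::string prefix = "urn:bitprint:";
    std::vector<std::string> parts;
    if (urn.compare(0, prefix.size(), prefix) != 0) return parts;

    const std::size_t nDot = urn.find('.', prefix.size());
    if (nDot == std::string::npos) return parts;

    parts.push_back("urn:sha1:" +
                    urn.substr(prefix.size(), nDot - prefix.size()));
    parts.push_back("urn:tree:tiger/:" + urn.substr(nDot + 1));
    return parts;
}
```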

== Word Prefix Extensions ==

For words consisting of at least five characters, it is often useful to be able to match
substrings within the word. Adding every possible substring of each word would
increase the density of the QHT too much. However, a simple and effective
compromise is available:

For words with 5 or more characters:

* Hash and add the whole word
* Hash and add the whole word minus the last character
* Hash and add the whole word minus the last two characters

This allows searching on prefixes of the word. For example, a query for "match" will
now match "matches".

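The prefix rule for words of five or more characters can be sketched as a helper that lists the strings to hash for one word. The name PrefixVariants is an illustrative assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: return the strings to hash for a single word.
// Words shorter than 5 characters contribute only themselves.
std::vector<std::string> PrefixVariants(const std::string& word)
{
    std::vector<std::string> variants{word};
    if (word.size() >= 5)
    {
        variants.push_back(word.substr(0, word.size() - 1)); // minus last char
        variants.push_back(word.substr(0, word.size() - 2)); // minus last two
    }
    return variants;
}
```

For "matches" this produces "matches", "matche" and "match"; each would then be passed to the word hash function and its bit set to zero.
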
== Table Exchange ==

Nodes must keep their neighbouring hubs up-to-date with their latest local query
hash table at all times. Rather than sending the whole table whenever it changes,
nodes may opt to send a "table patch", which includes only the difference between
the old and new table.

The /[[QHT]] packet is used to supply a query hash table update to a neighbouring hub.
Its format is compatible with the Gnutella1 "query routing protocol", except that
Gnutella2 requires a 1-bit per value table, while Gnutella1 requires a 4 or 8 bit per
value table. Gnutella2 supports patch compression using the deflate algorithm.
Compression should not be applied if the TCP link itself is compressed.
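
One way to picture a table patch is as the bytewise XOR of the old and new tables, so that unchanged regions become runs of zero bytes that compress well. The sketch below is an assumption for illustration only: it does not reproduce the /QHT packet layout or the deflate step, and MakePatch and ApplyPatch are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: bytewise XOR of two equally sized tables.
// Returns an empty patch if the sizes differ.
std::vector<std::uint8_t> MakePatch(const std::vector<std::uint8_t>& oldTable,
                                    const std::vector<std::uint8_t>& newTable)
{
    if (oldTable.size() != newTable.size()) return {};
    std::vector<std::uint8_t> patch(newTable.size());
    for (std::size_t i = 0; i < patch.size(); ++i)
        patch[i] = oldTable[i] ^ newTable[i];
    return patch;
}

// Hypothetical helper: XOR the patch into the receiver's copy of the table.
// Does nothing if the sizes differ.
void ApplyPatch(std::vector<std::uint8_t>& table,
                const std::vector<std::uint8_t>& patch)
{
    if (table.size() != patch.size()) return;
    for (std::size_t i = 0; i < table.size(); ++i)
        table[i] ^= patch[i];
}
```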

== Table Access ==

A table of 2^N bits can be stored in an array of 2^N/8 bytes. To resolve a
hash-value into a byte and bit number, use the following equations:

<pre>
<nowiki>
int nByte = ( nHashValue >> 3 );
int nBit = ( nHashValue & 7 );
</nowiki>
</pre>

The least significant bit is numbered 0; the most significant bit is numbered 7. To set
a bit as empty (setting it to 1):

<pre>
<nowiki>
table_ptr[ nByte ] |= ( 1 << nBit );
</nowiki>
</pre>

To set a bit as full (setting it to zero):

<pre>
<nowiki>
table_ptr[ nByte ] &= ~( 1 << nBit );
</nowiki>
</pre>
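
The same byte and bit arithmetic also gives the lookup used when filtering. A minimal sketch, assuming the table is held in a std::vector and using the hypothetical names IsFull and MarkFull:

```cpp
#include <cstdint>
#include <vector>

// Returns true if the bit for nHashValue is 0, i.e. marked "full".
bool IsFull(const std::vector<std::uint8_t>& table, std::uint32_t nHashValue)
{
    return (table[nHashValue >> 3] & (1u << (nHashValue & 7))) == 0;
}

// Clears the bit for nHashValue, marking it "full".
void MarkFull(std::vector<std::uint8_t>& table, std::uint32_t nHashValue)
{
    table[nHashValue >> 3] &=
        static_cast<std::uint8_t>(~(1u << (nHashValue & 7)));
}
```

Neither helper checks bounds, so the caller must ensure that nHashValue is below the number of bits in the table.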

== The Aggregate or Superset Table ==

Nodes operating in hub mode must maintain an aggregate or superset query hash
table, consisting of their own searchable content combined with the QHTs supplied
by all connected leaves. This aggregate table is supplied to neighbouring hubs,
allowing them to completely filter traffic for the local hub and its leaves.

This has two important implications:
* When a change is detected in either the local content or a connected leaf node's QHT, the aggregate table must be rebuilt and patches dispatched to neighbouring hubs. This will happen often, so an appropriate minimum time between updates should be used. One minute is effective.
* An aggregate table representing 400 leaves will be much denser than a table representing one node. This means that all tables must be large enough that the aggregate table remains productively sparse.

To create an aggregate table, start with an empty table of fixed size containing all
1 bits. For each contributing table, copy any 0 (full) bits to the aggregate table. This
is effectively an AND operation. If a source table is smaller than the aggregate table,
a single 0 bit in the source equates to several 0 bits in the aggregate. If the
source table is larger than the aggregate table, several source bits map to a single
aggregate bit, which becomes 0 if any of them is 0, with some loss of accuracy.
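
A sketch of this merge for tables of different sizes follows. The index mapping is an assumption derived from the hash function, which keeps the top bits of a 32-bit value: an index in a smaller table is taken to be the leading bits of the corresponding index in a larger one. MergeInto is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: fold one contributing table into the aggregate.
// Both tables use 1 = empty, 0 = full. Assumes 3 <= bits < 32 for both.
void MergeInto(std::vector<std::uint8_t>& aggregate, int nAggBits,
               const std::vector<std::uint8_t>& source, int nSrcBits)
{
    auto markFull = [&aggregate](std::uint32_t n) {
        aggregate[n >> 3] &= static_cast<std::uint8_t>(~(1u << (n & 7)));
    };

    const std::uint32_t nSrcCount = 1u << nSrcBits;
    for (std::uint32_t i = 0; i < nSrcCount; ++i)
    {
        const bool bFull = (source[i >> 3] & (1u << (i & 7))) == 0;
        if (!bFull) continue;

        if (nSrcBits <= nAggBits)
        {
            // Smaller (or equal) source: one bit covers 2^nShift aggregate bits.
            const int nShift = nAggBits - nSrcBits;
            const std::uint32_t nFirst = i << nShift;
            const std::uint32_t nCount = 1u << nShift;
            for (std::uint32_t j = 0; j < nCount; ++j) markFull(nFirst + j);
        }
        else
        {
            // Larger source: several source bits collapse onto one aggregate bit.
            markFull(i >> (nSrcBits - nAggBits));
        }
    }
}
```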

It is of great importance that all QHTs in the system be sufficiently large to allow an
aggregate table to remain suitably sparse. Ideally each leaf node should provide a
table less than 1% dense.
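
Density here can be read as the fraction of bits that are 0 (full). A minimal sketch for measuring it, with the hypothetical name Density:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the fraction of table bits that are 0 ("full"), from 0.0 to 1.0.
double Density(const std::vector<std::uint8_t>& table)
{
    if (table.empty()) return 0.0;
    std::size_t nFull = 0;
    for (std::uint8_t byte : table)
        for (int nBit = 0; nBit < 8; ++nBit)
            if ((byte & (1u << nBit)) == 0) ++nFull;
    return static_cast<double>(nFull) /
           static_cast<double>(table.size() * 8);
}
```

A leaf could compare the result against 0.01 to check the 1% guideline before sending its table.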

== Query Filtering ==

Before transmitting a query packet to a connection that has provided a query hash
table, match the words and URNs in the query against the QHT.

* If any of the lookups based on URNs found a hit, send the query packet
* If at least two thirds of the lookups based on words found a hit, send the query packet
* Otherwise, drop the packet

It is important to apply the "two thirds" rule only for words. URNs must provide an
automatic match.
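
The forwarding decision can be sketched as a small predicate over the lookup counts. The name ShouldForward is hypothetical, and dropping a query that has no URN hit and no word lookups at all is an assumption, since the rule above does not cover that case.

```cpp
// Hypothetical helper: decide whether to forward a query to a connection.
// nUrnHits:     number of URN lookups that found a full bit
// nWordHits:    number of word lookups that found a full bit
// nWordLookups: total number of word lookups performed
bool ShouldForward(int nUrnHits, int nWordHits, int nWordLookups)
{
    if (nUrnHits > 0) return true;          // any URN hit is an automatic match
    if (nWordLookups <= 0) return false;    // assumption: nothing to match on
    // "At least two thirds" in integer arithmetic: hits / lookups >= 2 / 3.
    return nWordHits * 3 >= nWordLookups * 2;
}
```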

Consider all text content in the query, including generic search text and metadata
search text if it is present. When dealing with simple query language that involves
quoted phrases and exclusions, apply the following rules:

* Tokenize quoted phrases into words, ignoring the phrase at this level
* Ignore excluded words - these do not count as table hits or misses
* Remember to apply exclusion to every word in an excluded phrase

See the section on the simple query language for more information.

Query Hash Tables

2005-03-28T11:54:40Z

Kath: /* Word Prefix Extensions */

Query Hash Tables

2005-03-28T11:52:43Z

Kath: /* Table Properties */

Query Hash Tables

2005-03-28T11:51:25Z

Kath: /* Introduction */

Known Hub Cache and Hub Cluster Cache

2005-03-27T17:32:40Z

Kath: /* The Hub Cluster Cache */

== Introduction ==

Each Gnutella2 node must maintain a non-exhaustive cache of known hubs at the
global level, and an exhaustive cache of hubs within the neighbouring hub cluster(s).

== The Known Hub Cache ==

The known hub list is used to provide connection hints to the local connection
manager, and other nodes which connect permanently or transiently. The most
recent portion of it is exchanged with neighbours regularly.

It is also used when executing a query on the network, which involves iteratively
contacting hubs representing each hub cluster and simultaneously recording the
hubs which have been accounted for and adding those newly discovered.

The known hub cache should be highly efficient, addressable by node address and
timestamp, and sorted by the last seen timestamp of each hub record. Adding fresh
hubs should push the oldest hubs from the bottom of the cache.

It is suggested that each cached hub entry store:

* Node address
* Last seen time
* Query key, if available
* [[Datatypes|GUID]], if available
* Vendor information, if desired and available
* Throttling timing for last query key request, last query, etc

Hubs whose last seen timestamp is too old should be removed from the cache, and
should certainly never be sent to other nodes.
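
A minimal Python sketch of such a cache follows. The class and field names, the capacity and the maximum age are all assumptions chosen for illustration. For brevity it sorts on demand and scans for the oldest entry on eviction, whereas a production cache would keep a proper time-ordered index.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HubEntry:
    address: str                      # node address, e.g. "host:port"
    last_seen: float                  # timestamp in seconds
    query_key: Optional[int] = None
    guid: Optional[bytes] = None
    vendor: Optional[str] = None
    last_key_request: float = 0.0     # throttling timers
    last_query: float = 0.0


class KnownHubCache:
    def __init__(self, capacity: int = 1000, max_age: float = 3600.0):
        self.capacity = capacity
        self.max_age = max_age
        self._hubs: dict[str, HubEntry] = {}

    def add(self, address: str, seen_at: float, **fields) -> None:
        """Insert or refresh a hub, evicting the oldest entries if full."""
        entry = self._hubs.get(address)
        if entry is None:
            entry = self._hubs[address] = HubEntry(address, seen_at)
        else:
            entry.last_seen = max(entry.last_seen, seen_at)
        for name, value in fields.items():
            setattr(entry, name, value)
        while len(self._hubs) > self.capacity:
            oldest = min(self._hubs.values(), key=lambda e: e.last_seen)
            del self._hubs[oldest.address]

    def expire(self, now: float) -> None:
        """Drop hubs whose last seen timestamp is too old."""
        stale = [a for a, e in self._hubs.items() if now - e.last_seen > self.max_age]
        for address in stale:
            del self._hubs[address]

    def recent(self, now: float, count: int) -> list[HubEntry]:
        """Most recently seen hubs, suitable for sending to other nodes."""
        self.expire(now)
        ordered = sorted(self._hubs.values(), key=lambda e: e.last_seen, reverse=True)
        return ordered[:count]


cache = KnownHubCache(capacity=2)
cache.add("192.0.2.1:6346", seen_at=1000.0)
cache.add("192.0.2.2:6346", seen_at=2000.0, vendor="EXMP")
cache.add("192.0.2.3:6346", seen_at=3000.0)   # pushes out the oldest hub
print([hub.address for hub in cache.recent(now=3100.0, count=10)])
```

Calling `expire` inside `recent` ensures that stale hubs are never handed to other nodes, even if no separate clean-up pass has run.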

== The Hub Cluster Cache ==

The hub cluster cache is used to maintain an up-to-date list of neighbouring hubs,
and the neighbouring hubs of neighbouring hubs. Thinking in terms of hops, it is an
exhaustive list of every hub which is 1 or 2 hops away from the local node (be the
local node a hub or a leaf). The cluster cache is really only important when operating
in hub mode; however, it can be maintained in leaf mode for informational purposes.

The hub cluster cache is used by a hub when responding to a keyed remote query
request, in the generation of a query acknowledgement packet (/[[QA]]). The /[[QA]]
packet contains a list of neighbouring hubs and a selection of second degree
neighbours.

The cluster cache is updated using information from /[[LNI]] and /[[KHL]] packets.

Gnutella2 Standard

2005-03-27T17:21:03Z

Kath: /* Common Gnutella2 Standard (All Applications) */

== What is the Gnutella2 Standard? ==

The Gnutella2 Standard is a set of requirements for building applications which
operate with the [[Gnutella2 Network]] in different capacities. For example, the
Gnutella2 Standard for File Sharing specifies a set of features and behaviours which
must be available in any Gnutella2-connected file-sharing product offered to the
public.

== Why is a Standard Needed? ==

As an open, general purpose platform, Gnutella2 networks must be able to operate
with a diverse family of different implementing applications. Every effort has been
made to limit the ill-effects a non-compliant application can cause (deliberately or
accidentally); however, when it comes to critical features such as common URN
schemes and character encodings, minimum standards help to ensure a favourable
baseline user experience.

== How are Standards Enforced? ==

The open and transparent nature of the Gnutella2 architecture makes technical
enforcement difficult, so a more viable (and hopefully, more productive) social
scheme has instead been adopted. Only applications meeting the appropriate
Gnutella2 Standard may be marked as "Gnutella2-compliant". Websites containing
information about Gnutella2 (such as gnutella2.com) are encouraged to list only
compliant applications, and application developers are encouraged to deny
communications with known non-compliant applications. Applications which do not
comply with the standard, or are still in the development process, should never be
made available to the public; however, private testing is always encouraged.

== How are Applications Tested? ==

Ultimately, it is the responsibility of the developer to ensure their own application
complies with the relevant standards, both with respect to Gnutella2 and any other
functionality they may be including. However, as an inter-dependent community,
developers of Gnutella2-compliant applications are encouraged to take an interest in
other Gnutella2 applications, and where possible, examine them for compliance.
Similarly, new developers are strongly encouraged to seek assistance from other
developers in verifying their work. This need not compromise competitive advantage
- if the application is sensitive, the important compliance testing phase can be
performed in the days prior to release.

== What Standards are Available? ==

At the current time, only one Gnutella2 standard has been published: the Gnutella2
standard for File Sharing Applications. Additional standards for other
application classes will be published in the future as required.

Developers of new application classes are operating in somewhat untried territory,
and should review the existing published standards for best practices which can be
borrowed. In particular, the basic components of the Gnutella2 network architecture
should always be implemented in full.

== Common Gnutella2 Standard (All Applications) ==

All applications making use of Gnutella2 technology for any application class MUST
IMPLEMENT the following core features:

* Bidirectional TCP stream connections (stream compression OPTIONAL)
* Bidirectional reliable UDP protocol (Gnutella2 reliability layer and stateless compression REQUIRED)
* HTTP-style link negotiation, exchanging at least the required headers
* Gnutella2 protocol support, graceful handling of unknown trees
* Localised, UTF-8 and UNICODE decode REQUIRED, encoding to each optional
* Operation in LEAF mode, additional node states OPTIONAL
* Basic link handshaking and maintenance functionality (PI/PO/LNI/KHL)
* Global node addressing scheme and routing maintenance, addressing children (TO)
* Reverse (PUSH) connection response (connecting out)
* HTTP/1.1 client and server for peer to peer transactions

----

== Gnutella2 Standard for File Sharing ==

Applications making use of Gnutella2 technology for file sharing MUST IMPLEMENT
the following features:

* All of the COMMON features listed in the previous section
* Operation in LEAF mode, additional node states OPTIONAL
* Some form of bandwidth management scheme, to keep network and transfer bandwidth below 95% of the user's link capacity - be it manually configured or some automatic scheme (very important to avoid flooding local connection)
* [[SHA1]] and TIGER ROOT [[URN]]s for all shared objects
* XML metadata, using existing schemas where appropriate (manual entry and peer acquired at minimum, automatic local collection highly recommended, service lookup optional)
* Universal 1-bit query hash filter, at least 2^20 length, intelligent density management scheme (superset combination required if supporting hub mode)
* Gnutella2 object search mechanism, all client responsibilities and if supporting hub mode, server responsibilities too
* Local search processing, including simple query language (Boolean operations, quoted search terms, numeric range searches, interest flagging (I), local rule-based metadata searching)
* Extensible hit format (URN/DN/MD/URL are REQUIRED, all other extensions OPTIONAL)
* HTTP/1.1 based upload system, URN based requesting, partial content requests, active queuing, partial file uploading, timestamp protected alternate source cache and exchange
* [[TigerTree]] volume calculation on shared files, caching on downloads, exchange via [[DIME]]. Local corruption detection OPTIONAL but recommended.

Gnutella2 Standard

2005-03-27T17:16:52Z

Kath: /* How are Applications Tested? */

== What is the Gnutella2 Standard? ==

The Gnutella2 Standard is a set of requirements for building applications which
operate with the [[Gnutella2 Network]] in different capacities. For example, the
Gnutella2 Standard for File Sharing specifies a set of features and behaviours which
must be available in any Gnutella2-connected file-sharing product offered to the
public.

== Why is a Standard Needed? ==

As an open, general purpose platform, Gnutella2 networks must be able to operate
with a diverse family of different implementing applications. Every effort has been
made to limit the ill-effects a non-compliant application can cause (deliberately or
accidentally); however, when it comes to critical features such as common URN
schemes and character encodings, minimum standards help to ensure a favourable
baseline user experience.

== How are Standards Enforced? ==

The open and transparent nature of the Gnutella2 architecture makes technical
enforcement difficult, so a more viable (and hopefully, more productive) social
scheme has instead been adopted. Only applications meeting the appropriate
Gnutella2 Standard may be marked as "Gnutella2-compliant". Websites containing
information about Gnutella2 (such as gnutella2.com) are encouraged to list only
compliant applications, and application developers are encouraged to deny
communications with known non-compliant applications. Applications which do not
comply with the standard, or are still in the development process, should never be
made available to the public; however, private testing is always encouraged.

== How are Applications Tested? ==

Ultimately, it is the responsibility of the developer to ensure their own application
complies with the relevant standards, both with respect to Gnutella2 and any other
functionality they may be including. However, as an inter-dependent community,
developers of Gnutella2-compliant applications are encouraged to take an interest in
other Gnutella2 applications, and where possible, examine them for compliance.
Similarly, new developers are strongly encouraged to seek assistance from other
developers in verifying their work. This need not compromise competitive advantage
- if the application is sensitive, the important compliance testing phase can be
performed in the days prior to release.

== What Standards are Available? ==

At the current time, only one Gnutella2 standard has been published: the Gnutella2
standard for File Sharing Applications. Additional standards for other
application classes will be published in the future as required.

Developers of new application classes are operating in somewhat untried territory,
and should review the existing published standards for best practices which can be
borrowed. In particular, the basic components of the Gnutella2 network architecture
should always be implemented in full.

== Common Gnutella2 Standard (All Applications) ==

All applications making use of Gnutella2 technology for any application class MUST
IMPLEMENT the following core features:

* Bidirectional TCP stream connections (stream compression OPTIONAL)
* Bidirectional reliable UDP protocol (Gnutella2 reliability layer and stateless compression REQUIRED)
* HTTP-style link negotiation, exchanging at least the required headers
* Gnutella2 protocol support, graceful handling of unknown trees
* Localised, UTF-8 and UNICODE decode REQUIRED, encoding to each optional
* Operation in LEAF mode, additional node states OPTIONAL
* Basic link handshaking and maintenance functionality (PI/PO/LNI/KHL)
* Global node addressing scheme and routing maintenance, addressing children (TO)
* Reverse (PUSH) connection response (connecting out)
* HTTP/1.1 client and server for peer to peer transactions

----

== Gnutella2 Standard for File Sharing ==

Applications making use of Gnutella2 technology for file sharing MUST IMPLEMENT
the following features:

* All of the COMMON features listed in the previous section
* Operation in LEAF mode, additional node states OPTIONAL
* Some form of bandwidth management scheme to keep network and transfer bandwidth below 95% of the user's link capacity - be it manually configured or some automatic scheme (very important to avoid flooding local connection)
* [[SHA1]] and TIGER ROOT [[URN]]s for all shared objects
* XML metadata using existing schemas where appropriate (manual entry and peer acquired at minimum, automatic local collection highly recommended, service lookup optional)
* Universal 1-bit query hash filter, at least 2^20 length, intelligent density management scheme (superset combination required if supporting hub mode)
* Gnutella2 object search mechanism, all client responsibilities and if supporting hub mode, server responsibilities too
* Local search processing including simple query language (Boolean operations, quoted search terms, numeric range searches, interest flagging (I), local rule-based metadata searching)
* Extensible hit format (URN/DN/MD/URL are REQUIRED, all other extensions OPTIONAL)
* HTTP/1.1 based upload system, URN based requesting, partial content requests, active queuing, partial file uploading, timestamp protected alternate source cache and exchange
* [[TigerTree]] volume calculation on shared files, caching on downloads, exchange via [[DIME]]. Local corruption detection OPTIONAL but recommended.

Gnutella2 Standard

2005-03-27T17:15:52Z

Kath: /* How are Standards Enforced? */

== What is the Gnutella2 Standard? ==

The Gnutella2 Standard is a set of requirements for building applications which
operate with the [[Gnutella2 Network]] in different capacities. For example, the
Gnutella2 Standard for File Sharing specifies a set of features and behaviours which
must be available in any Gnutella2-connected file-sharing product offered to the
public.

== Why is a Standard Needed? ==

As an open, general purpose platform, Gnutella2 networks must be able to operate
with a diverse family of different implementing applications. Every effort has been
made to limit the ill-effects a non-compliant application can cause (deliberately or
accidentally); however, when it comes to critical features such as common URN
schemes and character encodings, minimum standards help to ensure a favourable
baseline user experience.

== How are Standards Enforced? ==

The open and transparent nature of the Gnutella2 architecture makes technical
enforcement difficult, so a more viable (and hopefully, more productive) social
scheme has instead been adopted. Only applications meeting the appropriate
Gnutella2 Standard may be marked as "Gnutella2-compliant". Websites containing
information about Gnutella2 (such as gnutella2.com) are encouraged to list only
compliant applications, and application developers are encouraged to deny
communications with known non-compliant applications. Applications which do not
comply with the standard, or are still in the development process, should never be
made available to the public; however, private testing is always encouraged.

== How are Applications Tested? ==

Ultimately it is the responsibility of the developer to ensure their own application
complies with the relevant standards, both with respect to Gnutella2 and any other
functionality they may be including. However as an inter-dependent community,
developers of Gnutella2-compliant applications are encouraged to take an interest in
other Gnutella2 applications, and where possible, examine them for compliance.
Similarly, new developers are strongly encouraged to seek assistance from other
developers in verifying their work. This need not compromise competitive advantage
- if the application is sensitive, the important compliance testing phase can be
performed in the days prior to release.

== What Standards are Available? ==

At the current time, only one Gnutella2 standard has been published: the Gnutella2
standard for File Sharing Applications. Additional standards for other
application classes will be published in the future as required.

Developers of new application classes are operating in somewhat untried territory,
and should review the existing published standards for best practices which can be
borrowed. In particular, the basic components of the Gnutella2 network architecture
should always be implemented in full.

== Common Gnutella2 Standard (All Applications) ==

All applications making use of Gnutella2 technology for any application class MUST
IMPLEMENT the following core features:

* Bidirectional TCP stream connections (stream compression OPTIONAL)
* Bidirectional reliable UDP protocol (Gnutella2 reliability layer and stateless compression REQUIRED)
* HTTP-style link negotiation, exchanging at least the required headers
* Gnutella2 protocol support, graceful handling of unknown trees
* Localised, UTF-8 and UNICODE decode REQUIRED, encoding to each optional
* Operation in LEAF mode, additional node states OPTIONAL
* Basic link handshaking and maintenance functionality (PI/PO/LNI/KHL)
* Global node addressing scheme and routing maintenance, addressing children (TO)
* Reverse (PUSH) connection response (connecting out)
* HTTP/1.1 client and server for peer to peer transactions

----

== Gnutella2 Standard for File Sharing ==

Applications making use of Gnutella2 technology for file sharing MUST IMPLEMENT
the following features:

* All of the COMMON features listed in the previous section
* Operation in LEAF mode, additional node states OPTIONAL
* Some form of bandwidth management scheme to keep network and transfer bandwidth below 95% of the user's link capacity - be it manually configured or some automatic scheme (very important to avoid flooding local connection)
* [[SHA1]] and TIGER ROOT [[URN]]s for all shared objects
* XML metadata using existing schemas where appropriate (manual entry and peer acquired at minimum, automatic local collection highly recommended, service lookup optional)
* Universal 1-bit query hash filter, at least 2^20 length, intelligent density management scheme (superset combination required if supporting hub mode)
* Gnutella2 object search mechanism, all client responsibilities and if supporting hub mode, server responsibilities too
* Local search processing including simple query language (Boolean operations, quoted search terms, numeric range searches, interest flagging (I), local rule-based metadata searching)
* Extensible hit format (URN/DN/MD/URL are REQUIRED, all other extensions OPTIONAL)
* HTTP/1.1 based upload system, URN based requesting, partial content requests, active queuing, partial file uploading, timestamp protected alternate source cache and exchange
* [[TigerTree]] volume calculation on shared files, caching on downloads, exchange via [[DIME]]. Local corruption detection OPTIONAL but recommended.

Gnutella2 Standard

2005-03-27T17:14:42Z

Kath: /* Why is a Standard Needed? */

== What is the Gnutella2 Standard? ==

The Gnutella2 Standard is a set of requirements for building applications which
operate with the [[Gnutella2 Network]] in different capacities. For example, the
Gnutella2 Standard for File Sharing specifies a set of features and behaviours which
must be available in any Gnutella2-connected file-sharing product offered to the
public.

== Why is a Standard Needed? ==

As an open, general purpose platform, Gnutella2 networks must be able to operate
with a diverse family of different implementing applications. Every effort has been
made to limit the ill-effects a non-compliant application can cause (deliberately or
accidentally); however, when it comes to critical features such as common URN
schemes and character encodings, minimum standards help to ensure a favourable
baseline user experience.

== How are Standards Enforced? ==

The open and transparent nature of the Gnutella2 architecture makes technical
enforcement difficult, so a more viable (and hopefully, more productive) social
scheme has instead been adopted. Only applications meeting the appropriate
Gnutella2 Standard may be marked as "Gnutella2-compliant". Websites containing
information about Gnutella2 (such as gnutella2.com) are encouraged to list only
compliant applications, and application developers are encouraged to deny
communications with known non-compliant applications. Applications which do not
comply with the standard or are still in the development process should never be
made available to the public; however, private testing is always encouraged.

== How are Applications Tested? ==

Ultimately it is the responsibility of the developer to ensure their own application
complies with the relevant standards, both with respect to Gnutella2 and any other
functionality they may be including. However as an inter-dependent community,
developers of Gnutella2-compliant applications are encouraged to take an interest in
other Gnutella2 applications, and where possible, examine them for compliance.
Similarly, new developers are strongly encouraged to seek assistance from other
developers in verifying their work. This need not compromise competitive advantage
- if the application is sensitive, the important compliance testing phase can be
performed in the days prior to release.

== What Standards are Available? ==

At the current time, only one Gnutella2 standard has been published: the Gnutella2
standard for File Sharing Applications. Additional standards for other
application classes will be published in the future as required.

Developers of new application classes are operating in somewhat untried territory,
and should review the existing published standards for best practices which can be
borrowed. In particular, the basic components of the Gnutella2 network architecture
should always be implemented in full.

== Common Gnutella2 Standard (All Applications) ==

All applications making use of Gnutella2 technology for any application class MUST
IMPLEMENT the following core features:

* Bidirectional TCP stream connections (stream compression OPTIONAL)
* Bidirectional reliable UDP protocol (Gnutella2 reliability layer and stateless compression REQUIRED)
* HTTP-style link negotiation, exchanging at least the required headers
* Gnutella2 protocol support, graceful handling of unknown trees
* Localised, UTF-8 and UNICODE decode REQUIRED, encoding to each optional
* Operation in LEAF mode, additional node states OPTIONAL
* Basic link handshaking and maintenance functionality (PI/PO/LNI/KHL)
* Global node addressing scheme and routing maintenance, addressing children (TO)
* Reverse (PUSH) connection response (connecting out)
* HTTP/1.1 client and server for peer to peer transactions

----

== Gnutella2 Standard for File Sharing ==

Applications making use of Gnutella2 technology for file sharing MUST IMPLEMENT
the following features:

* All of the COMMON features listed in the previous section
* Operation in LEAF mode, additional node states OPTIONAL
* Some form of bandwidth management scheme to keep network and transfer bandwidth below 95% of the user's link capacity - be it manually configured or some automatic scheme (very important to avoid flooding local connection)
* [[SHA1]] and TIGER ROOT [[URN]]s for all shared objects
* XML metadata using existing schemas where appropriate (manual entry and peer acquired at minimum, automatic local collection highly recommended, service lookup optional)
* Universal 1-bit query hash filter, at least 2^20 length, intelligent density management scheme (superset combination required if supporting hub mode)
* Gnutella2 object search mechanism, all client responsibilities and if supporting hub mode, server responsibilities too
* Local search processing including simple query language (Boolean operations, quoted search terms, numeric range searches, interest flagging (I), local rule-based metadata searching)
* Extensible hit format (URN/DN/MD/URL are REQUIRED, all other extensions OPTIONAL)
* HTTP/1.1 based upload system, URN based requesting, partial content requests, active queuing, partial file uploading, timestamp protected alternate source cache and exchange
* [[TigerTree]] volume calculation on shared files, caching on downloads, exchange via [[DIME]]. Local corruption detection OPTIONAL but recommended.

Node Route Cache and Addressed Packet Forwarding

2005-03-27T17:09:24Z

Kath: /* Push Handshaking */

== Introduction ==

Each Gnutella2 node must maintain a dynamic node route cache to map node GUIDs
to appropriate destinations. The route cache is consulted when a packet needs to be
dispatched to, or toward, a GUID-addressed node. GUID-addressing is used over
network-addressing in a number of situations.

== Data ==

Each entry in a node route cache will have:

* The GUID of the target node
* The local TCP connection providing the best route to the node, or
* A UDP endpoint for the node or the best route to the node

== Performance ==

A route cache needs to be addressable by GUID, and must implement a refresh
mechanism to store routes for an appropriate amount of time based upon route hits.
Many schemes exist to engineer efficient lookup systems, such as hash tables,
two-table exchanges and balanced trees.

The route cache is similar to the several GUID mapping caches involved in old
Gnutella applications, with two key differences:

* Each entry may map to a local address or a UDP endpoint
* There is only one unified route cache for all purposes

== Applications ==

GUIDs are used to identify virtual entities within the network, such as nodes (node
GUIDs), searches (search GUIDs) and other transactional requests. The GUIDs
associated with these entities can then be committed to the route cache and mapped
to reflect the easiest route back to the owner of the object.

All nodes should be aware of their directly connected neighbours, and treat these
node GUIDs as a special case that need never expire.
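
The route cache described in the sections above could be sketched in Python along these lines. The class name, the tuple representation of a route and the ten-minute lifetime are assumptions made for illustration; a plain dictionary plays the role of the hash table.

```python
from typing import Optional


class RouteCache:
    """Single unified GUID -> route map with hit-based refresh."""

    def __init__(self, ttl: float = 600.0):
        self.ttl = ttl
        # guid -> [route, expiry time or None for "never expires"]
        self._routes: dict[bytes, list] = {}

    def add(self, guid: bytes, route: tuple, now: float,
            permanent: bool = False) -> None:
        """Store a route; directly connected neighbours can be permanent."""
        self._routes[guid] = [route, None if permanent else now + self.ttl]

    def lookup(self, guid: bytes, now: float) -> Optional[tuple]:
        """Return the route for a GUID, extending its lifetime on a hit."""
        entry = self._routes.get(guid)
        if entry is None:
            return None
        route, expires = entry
        if expires is not None:
            if now > expires:
                del self._routes[guid]      # route went stale
                return None
            entry[1] = now + self.ttl       # refresh on route hit
        return route


routes = RouteCache(ttl=10.0)
neighbour = bytes(16)                  # placeholder 16-byte GUIDs
remote = bytes(range(16))
routes.add(neighbour, ("tcp", "connection-1"), now=0.0, permanent=True)
routes.add(remote, ("udp", ("192.0.2.7", 6346)), now=0.0)
print(routes.lookup(remote, now=5.0))      # hit, lifetime extended
print(routes.lookup(remote, now=40.0))     # expired in the meantime
print(routes.lookup(neighbour, now=40.0))  # neighbours never expire
```

Each entry holds either a local TCP connection or a UDP endpoint, mirroring the two route kinds listed under Data.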

== Addressed Packet Forwarding ==

Any Gnutella2 packet may be addressed to a particular destination node by GUID.
Upon receiving an addressed packet, it should be immediately forwarded either to
the destination, or toward it. Addressed packets should not be interpreted locally,
unless the destination address matches the local GUID.

Loops are avoided by placing restrictions upon forwarding:

* If received from a leaf via TCP, a packet may be forwarded anywhere
* If received from a hub, a packet may only be forwarded to a leaf
* If received via UDP, a packet may not be forwarded via UDP

Note that these restrictions apply only to generically addressed packets. Some packet
types have specific forwarding rules which override these. These rules allow any
node to be reached in two hops.
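
One reading of the generic forwarding restrictions is sketched below in Python. The function name and the string labels for node type and transport are hypothetical, and packet types with their own forwarding rules are not modelled.

```python
def may_forward(received_from: str, received_via: str,
                send_to: str, send_via: str) -> bool:
    """Apply the generic loop-avoidance rules for addressed packets.

    received_from / send_to are "hub" or "leaf";
    received_via / send_via are "tcp" or "udp".
    """
    # A packet that arrived via UDP may not leave via UDP.
    if received_via == "udp" and send_via == "udp":
        return False
    # A packet that arrived from a hub may only go on to a leaf.
    if received_from == "hub" and send_to != "leaf":
        return False
    # Otherwise (for example, from a leaf via TCP) it may go anywhere.
    return True


print(may_forward("leaf", "tcp", "hub", "udp"))
print(may_forward("hub", "tcp", "hub", "tcp"))
print(may_forward("hub", "tcp", "leaf", "tcp"))
```

The first and third calls are permitted, while the second is refused because a packet received from a hub would otherwise travel on to another hub.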

Packets are addressed by including a special child packet as the first child of the root
packet. The child packet is named "TO", so its full name would be "/?/TO" where ? is
the root packet name. The address packet's payload consists of a 16 byte GUID, and
it has no children defined at the current time.

== Reverse Connection (Push) Requests ==

Addressed packet forwarding can be used to deliver any valid Gnutella2
communications to leaf nodes that are unable to accept TCP or UDP traffic directly.
The packet is simply sent to one of the hubs to which the target leaf is attached, or a
hub in the same hub cluster, and the forwarding rules will allow the packet to reach
its destination.

This mechanism is used to request that a "firewalled" leaf initiate a "reverse
connection" or "call-back" to an elected address. The root packet type "/[[PUSH]]" is
used.

== Push Handshaking ==

The Gnutella2 push handshake is very simple - it simply identifies the connection as
originating from a push, and provides the GUID of the initiating node:

<pre>
<nowiki>
PUSH guid:GUIDinHEXguidINhexGUIDinHEXguidI (\r\n)
(\r\n)
</nowiki>
</pre>

Note that unlike the Gnutella1 case, no purpose-specific information is provided. The
pushed connection can now be used for any purpose a normally established TCP link
could adopt, including, but not limited to:

* Gnutella2 network connection
* Data transfer
* Personal communications
* Etc

Gnutella2 implementations are, however, strongly advised to provide backward
support for the Gnutella1 push handshake, even if not supporting Gnutella1 directly.
This is because applications supporting both protocols may be unable to determine in
advance whether the host they are pushing to is Gnutella1 or Gnutella2.
The legacy-style push handshake looks like:

<pre>
<nowiki>
GIV 0:GUIDinHEXguidINhexGUIDinHEXguidI/ (\n\n)
</nowiki>
</pre>

The leading zero and trailing slash have task-specific meanings in Gnutella1;
however, Gnutella2 applications can safely ignore them and consider only the GUID.
Be sure, though, to allow for variable-length handshakes.
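
A receiving node might recognise both handshake styles along the following lines. This is a sketch under stated assumptions: the GUID is taken to be 32 hexadecimal characters (matching the placeholder shown), the Gnutella1 prefix before the colon is taken to be a decimal number, and the function name `parse_push_handshake` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Gnutella2 style: "PUSH guid:<32 hex chars>" followed by the line ending.
G2_PUSH = re.compile(r"PUSH guid:([0-9A-Fa-f]{32})\s*\Z")
# Legacy style: "GIV <number>:<32 hex chars>/" with optional trailing data,
# which is ignored so that variable-length handshakes are accepted.
G1_GIV = re.compile(r"GIV \d+:([0-9A-Fa-f]{32})/")


def parse_push_handshake(first_line: str) -> Optional[Tuple[str, bytes]]:
    """Return (style, guid) for a push handshake line, or None otherwise."""
    match = G2_PUSH.match(first_line)
    if match:
        return "g2", bytes.fromhex(match.group(1))
    match = G1_GIV.match(first_line)
    if match:
        return "g1", bytes.fromhex(match.group(1))
    return None


guid_hex = "0123456789ABCDEF0123456789abcdef"
print(parse_push_handshake("PUSH guid:" + guid_hex + "\r\n"))
print(parse_push_handshake("GIV 0:" + guid_hex + "/\n\n"))
print(parse_push_handshake("GET / HTTP/1.1\r\n"))
```

Only the GUID is extracted in both cases; the caller can then look it up to decide what the pushed connection is for.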

Node Route Cache and Addressed Packet Forwarding

2005-03-27T17:06:18Z

Kath: /* Addressed Packet Forwarding */

== Introduction ==

Each Gnutella2 node must maintain a dynamic node route cache to map node GUIDs
to appropriate destinations. The route cache is consulted when a packet needs to be
dispatched to, or toward, a GUID-addressed node. GUID-addressing is used over
network-addressing in a number of situations.

== Data ==

Each entry in a node route cache will have:

* The GUID of the target node
* The local TCP connection providing the best route to the node, or
* A UDP endpoint for the node or the best route to the node

== Performance ==

A route cache needs to be addressable by GUID, and must implement a refresh
mechanism to store routes for an appropriate amount of time based upon route hits.
Many schemes exist to engineer efficient lookup systems, such as hash tables,
two-table exchanges and balanced trees.

The route cache is similar to the several GUID mapping caches involved in old
Gnutella applications, with two key differences:

* Each entry may map to a local address or a UDP endpoint
* There is only one unified route cache for all purposes

== Applications ==

GUIDs are used to identify virtual entities within the network, such as nodes (node
GUIDs), searches (search GUIDs) and other transactional requests. The GUIDs
associated with these entities can then be committed to the route cache and mapped
to reflect the easiest route back to the owner of the object.

All nodes should be aware of their directly connected neighbours, and treat these
node GUIDs as a special case that need never expire.

== Addressed Packet Forwarding ==

Any Gnutella2 packet may be addressed to a particular destination node by GUID.
Upon receiving an addressed packet, it should be immediately forwarded either to
the destination, or toward it. Addressed packets should not be interpreted locally,
unless the destination address matches the local GUID.

Loops are avoided by placing restrictions upon forwarding:

* If received from a leaf via TCP, a packet may be forwarded anywhere
* If received from a hub, a packet may only be forwarded to a leaf
* If received via UDP, a packet may not be forwarded via UDP

Note that these restrictions apply only to generically addressed packets. Some packet
types have specific forwarding rules which override these. These rules allow any
node to be reached in two hops.

Packets are addressed by including a special child packet as the first child of the root
packet. The child packet is named "TO", so its full name would be "/?/TO" where ? is
the root packet name. The address packet's payload consists of a 16 byte GUID, and
it has no children defined at the current time.

== Reverse Connection (Push) Requests ==

Addressed packet forwarding can be used to deliver any valid Gnutella2
communications to leaf nodes that are unable to accept TCP or UDP traffic directly.
The packet is simply sent to one of the hubs to which the target leaf is attached, or a
hub in the same hub cluster, and the forwarding rules will allow the packet to reach
its destination.

This mechanism is used to request that a "firewalled" leaf initiate a "reverse
connection" or "call-back" to an elected address. The root packet type "/[[PUSH]]" is
used.

== Push Handshaking ==

The Gnutella2 push handshake is very simple - it simply identifies the connection as
originating from a push, and provides the GUID of the initiating node:

<pre>
<nowiki>
PUSH guid:GUIDinHEXguidINhexGUIDinHEXguidI (\r\n)
(\r\n)
</nowiki>
</pre>

Note that unlike the Gnutella1 case, no purpose-specific information is provided. The
pushed connection can now be used for any purpose a normally established TCP link
could adopt, including but not limited to:

* Gnutella2 network connection
* Data transfer
* Personal communications
* Etc

Gnutella2 implementations are, however, strongly advised to provide backward
support for the Gnutella1 push handshake, even if not supporting Gnutella1 directly.
This is because applications supporting both protocols may be unable to determine in
advance whether the host they are pushing to is Gnutella1 or Gnutella2.
The legacy-style push handshake looks like:

<pre>
<nowiki>
GIV 0:GUIDinHEXguidINhexGUIDinHEXguidI/ (\n\n)
</nowiki>
</pre>

The leading zero and trailing slash have task-specific meanings in Gnutella1;
however, Gnutella2 applications can safely ignore them and consider only the GUID.
Be sure, though, to allow for variable-length handshakes.

Node Route Cache and Addressed Packet Forwarding

2005-03-27T17:03:11Z

Kath: /* Introduction */

== Introduction ==

Each Gnutella2 node must maintain a dynamic node route cache to map node GUIDs
to appropriate destinations. The route cache is consulted when a packet needs to be
dispatched to, or toward, a GUID-addressed node. GUID addressing is used instead of
network addressing in a number of situations.

== Data ==

Each entry in a node route cache will have:

* The GUID of the target node
* The local TCP connection providing the best route to the node, or
* A UDP endpoint for the node or the best route to the node

== Performance ==

A route cache needs to be addressable by GUID, and must implement a refresh
mechanism to store routes for an appropriate amount of time based upon route hits.
Many schemes exist to engineer efficient lookup systems, such as hash tables, two-table
exchanges and balanced trees.
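As a minimal sketch of the refresh-on-hit behaviour described above, the following C fragment keeps routes in a small fixed array and extends an entry's lifetime whenever it is looked up. The names <code>RouteEntry</code>, <code>RouteCache</code>, <code>route_lookup</code> and <code>route_add_tcp</code>, the slot count and the lifetime are all assumptions made for illustration; a real cache would likely replace the linear scan with one of the lookup schemes mentioned above.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

#define ROUTE_SLOTS 64   /* hypothetical capacity */
#define ROUTE_TTL   600  /* hypothetical lifetime in seconds */

typedef struct {
    uint8_t  guid[16];   /* GUID of the target node */
    int      tcp_link;   /* local TCP connection id, or -1 for a UDP route */
    uint8_t  udp_ip[4];  /* UDP endpoint, meaningful when tcp_link is -1 */
    uint16_t udp_port;
    time_t   expires;    /* 0 marks an unused slot */
} RouteEntry;

typedef struct {
    RouteEntry slot[ROUTE_SLOTS];
} RouteCache;

/* Find a live route by GUID. A hit pushes the expiry time forward. */
RouteEntry *route_lookup(RouteCache *c, const uint8_t guid[16], time_t now)
{
    for (int i = 0; i < ROUTE_SLOTS; i++) {
        RouteEntry *e = &c->slot[i];
        if (e->expires > now && memcmp(e->guid, guid, 16) == 0) {
            e->expires = now + ROUTE_TTL;
            return e;
        }
    }
    return NULL;
}

/* Store or update a TCP route. Returns 1 on success, 0 if the cache is full. */
int route_add_tcp(RouteCache *c, const uint8_t guid[16], int tcp_link, time_t now)
{
    RouteEntry *e = route_lookup(c, guid, now);
    if (e != NULL) {
        e->tcp_link = tcp_link;
        return 1;
    }
    for (int i = 0; i < ROUTE_SLOTS; i++) {
        e = &c->slot[i];
        if (e->expires <= now) {          /* unused or expired slot */
            memcpy(e->guid, guid, 16);
            e->tcp_link = tcp_link;
            e->expires = now + ROUTE_TTL;
            return 1;
        }
    }
    return 0;
}
```

The cache must be zero-initialised before use so that every slot starts out as unused.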

The route cache is similar to the several GUID mapping caches involved in old
Gnutella applications, with two key differences:

* Each entry may map to a local address or a UDP endpoint
* There is only one unified route cache for all purposes

== Applications ==

GUIDs are used to identify virtual entities within the network, such as nodes (node
GUIDs), searches (search GUIDs) and other transactional requests. The GUIDs
associated with these entities can then be committed to the route cache and mapped
to reflect the easiest route back to the owner of the object.

All nodes should be aware of their directly connected neighbours, and treat these
node GUIDs as a special case that need never expire.

== Addressed Packet Forwarding ==

Any Gnutella2 packet may be addressed to a particular destination node by GUID.
A node that receives an addressed packet should immediately forward it either to
the destination, or toward it. Addressed packets should not be interpreted locally
unless the destination address matches the local GUID.

Loops are avoided by placing restrictions upon forwarding:

* If received from a leaf via TCP, a packet may be forwarded anywhere
* If received from a hub, a packet may only be forwarded to a leaf
* If received via UDP, a packet may not be forwarded via UDP

Note that these restrictions apply only to generically addressed packets. Some packet
types have specific forwarding rules which override these. These rules allow any
node to be reached in two hops.
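The three generic restrictions can be sketched as a single predicate. This is one possible reading of the rules, not a definitive implementation: the types <code>PeerKind</code>, <code>Transport</code> and <code>Hop</code> and the function <code>may_forward</code> are hypothetical names, and packet types with their own forwarding rules are not covered.

```c
#include <stdbool.h>

typedef enum { PEER_LEAF, PEER_HUB } PeerKind;
typedef enum { VIA_TCP, VIA_UDP } Transport;

/* One end of a forwarding step: what kind of node it is and how it is reached. */
typedef struct {
    PeerKind  kind;
    Transport via;
} Hop;

/* Returns true if a generically addressed packet received from `from`
   may be forwarded to `to` under the three restrictions listed above. */
bool may_forward(Hop from, Hop to)
{
    /* Received from a leaf via TCP: may be forwarded anywhere. */
    if (from.kind == PEER_LEAF && from.via == VIA_TCP)
        return true;

    /* Received from a hub: may only be forwarded to a leaf. */
    if (from.kind == PEER_HUB && to.kind != PEER_LEAF)
        return false;

    /* Received via UDP: may not be forwarded via UDP. */
    if (from.via == VIA_UDP && to.via == VIA_UDP)
        return false;

    return true;
}
```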

Packets are addressed by including a special child packet as the first child of the root
packet. The child packet is named "TO", so its full name would be "/?/TO" where ? is
the root packet name. The address packet's payload consists of a 16 byte GUID, and
it has no children defined at the current time.
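As an illustration, the address child can be serialized with the packet framing described in the Packet Structure page: a control byte, a one-byte length, the two-byte name "TO" and the 16 byte GUID, for 20 bytes in total. The function name <code>encode_to_child</code> is hypothetical, and the sketch assumes the child carries no flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes the "TO" address child into out and returns the number of bytes
   written (always 20). */
size_t encode_to_child(uint8_t out[20], const uint8_t guid[16])
{
    /* Control byte: len_len = 1 (bits 7-6), name_len - 1 = 1 (bits 5-3), no flags. */
    out[0] = (uint8_t)((1u << 6) | ((2u - 1u) << 3));
    out[1] = 16;                 /* length: 16 bytes of payload, no children */
    out[2] = 'T';
    out[3] = 'O';
    memcpy(out + 4, guid, 16);   /* payload: the destination GUID */
    return 20;
}
```

The enclosing root packet must set its compound flag and count these 20 bytes in its own length, with the address child placed before any other child.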

== Reverse Connection (Push) Requests ==

Addressed packet forwarding can be used to deliver any valid Gnutella2
communications to leaf nodes that are unable to accept TCP or UDP traffic directly.
The packet is simply sent to one of the hubs to which the target leaf is attached, or a
hub in the same hub cluster, and the forwarding rules will allow the packet to reach
its destination.

This mechanism is used to request that a "firewalled" leaf initiate a "reverse
connection" or "call-back" to an elected address. The root packet type "/[[PUSH]]" is
used.

== Push Handshaking ==

The Gnutella2 push handshake is simple. It identifies the connection as
originating from a push, and provides the GUID of the initiating node:

<pre>
<nowiki>
PUSH guid:GUIDinHEXguidINhexGUIDinHEXguidI (\r\n)
(\r\n)
</nowiki>
</pre>

Note that unlike the Gnutella1 case, no purpose-specific information is provided. The
pushed connection can now be used for any purpose a normally established TCP link
could adopt, including but not limited to:

* Gnutella2 network connection
* Data transfer
* Personal communications
* Etc

Gnutella2 implementations are however strongly advised to provide backward
support for the Gnutella1 push handshake, even if not supporting Gnutella1 directly.
This is because applications supporting both protocols may be unable to determine in
advance whether the host they are pushing to is Gnutella1 or Gnutella2.
The legacy-style push handshake looks like:

<pre>
<nowiki>
GIV 0:GUIDinHEXguidINhexGUIDinHEXguidI/ (\n\n)
</nowiki>
</pre>

The leading zero and trailing slash have task-specific meanings in Gnutella1;
however Gnutella2 applications can safely ignore them and consider only the GUID.
Be sure to allow for variable length handshakes, however.
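A receiver that accepts both handshake forms might parse the first line along the following lines. This is a sketch under stated assumptions: the GUID is taken to be 32 hexadecimal digits, as the placeholders above suggest, the line is passed without its line ending, and the function names are hypothetical. For the legacy form, the text between "GIV " and the colon and everything after the GUID are ignored, which allows for variable length handshakes.

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Value of one hex digit, or -1 if the character is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decodes 32 hex digits into 16 bytes. Returns 1 on success, 0 otherwise.
   Stops at the first invalid character, so short strings are never overread. */
static int parse_hex_guid(const char *s, uint8_t guid[16])
{
    for (int i = 0; i < 16; i++) {
        int hi = hex_value(s[2 * i]);
        if (hi < 0) return 0;
        int lo = hex_value(s[2 * i + 1]);
        if (lo < 0) return 0;
        guid[i] = (uint8_t)(hi * 16 + lo);
    }
    return 1;
}

/* Parses the first handshake line of either form and extracts the GUID.
   Returns 1 on success, 0 if the line is not a recognised push handshake. */
int parse_push_line(const char *line, uint8_t guid[16])
{
    if (strncmp(line, "PUSH guid:", 10) == 0)
        return parse_hex_guid(line + 10, guid);

    if (strncmp(line, "GIV ", 4) == 0) {
        const char *colon = strchr(line + 4, ':');
        return colon != NULL && parse_hex_guid(colon + 1, guid);
    }
    return 0;
}
```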

Datatypes

2005-03-27T17:01:16Z

Kath: /* Strings */

== Introduction ==

The format of a packet payload is defined by the packet type and can consist of any
binary data; however, there are a number of conventions in place for serializing
common datatypes.

== Multi-Byte Integers ==

Multi-byte integers are serialized in the byte order of the topmost packet. Little-endian
is the default byte order; however, big-endian byte order can be selected by setting
the big-endian flag in the control byte of the packet.
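A decoder can read integers one byte at a time so that the result does not depend on the byte order of the host. The helpers below are a minimal sketch; the names <code>read_u16</code> and <code>read_u32</code> are hypothetical, and <code>big_endian</code> is assumed to carry the byte order of the topmost packet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reads a 16-bit integer from p in the given byte order. */
uint16_t read_u16(const uint8_t *p, bool big_endian)
{
    return big_endian ? (uint16_t)((p[0] << 8) | p[1])
                      : (uint16_t)((p[1] << 8) | p[0]);
}

/* Reads a 32-bit integer from p in the given byte order. */
uint32_t read_u32(const uint8_t *p, bool big_endian)
{
    uint32_t b0 = p[0], b1 = p[1], b2 = p[2], b3 = p[3];
    return big_endian ? (b0 << 24) | (b1 << 16) | (b2 << 8) | b3
                      : (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
}
```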

== Network/Node Addresses ==

A network or node address consists of a physical address and a port number, and is
of variable length, depending on the address family.
In IPv4, a network/node address is six bytes long: 4 bytes for an IP address and 2
bytes for a port number as follows:

<pre>
<nowiki>
typedef struct
{
BYTE ip[4];
SHORT port;
} IPV4_ENDPOINT;
</nowiki>
</pre>

Note that this is considered an array of 4 8-bit integers (bytes), followed by a 16-bit
integer (short). Byte order does not affect bytes, but it will affect the 16-bit port
number.
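The following sketch decodes a six byte IPv4 endpoint from a payload without relying on structure packing or the byte order of the host. The type <code>Ipv4Endpoint</code> and the function <code>read_ipv4_endpoint</code> are hypothetical names; a length other than 6 is rejected as belonging to a different address family.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  ip[4];   /* address bytes, unaffected by byte order */
    uint16_t port;    /* port number in host representation */
} Ipv4Endpoint;

/* Decodes an IPv4 endpoint. Returns false if len is not exactly 6 bytes. */
bool read_ipv4_endpoint(const uint8_t *payload, size_t len,
                        bool big_endian, Ipv4Endpoint *out)
{
    if (len != 6)
        return false;   /* not an IPv4 endpoint */

    memcpy(out->ip, payload, 4);
    out->port = big_endian ? (uint16_t)((payload[4] << 8) | payload[5])
                           : (uint16_t)((payload[5] << 8) | payload[4]);
    return true;
}
```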

IPv6 addresses are longer and are not yet defined within the scope of Gnutella2;
however, applications should be aware that if the node address is not 6 bytes, it is of
a different address family.

== GUIDs ==

Globally unique identifiers (GUIDs) are used to identify nodes on the network. GUIDs
are serialized as an array of 16 bytes.

== Strings ==

Strings are encoded with UTF-8 encoding and serialized as a zero-terminated
sequence of 8 bit integers.

A zero character (0x00) marks the end of the string; however, if the string data meets
the end of the packet (or child packet) payload, the terminator is not required. This
means that packets whose payload consists of a string do not need to include a zero
string terminator and their payload length will be the byte length of the encoded
string exactly.
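The optional terminator can be handled by bounding the scan with the number of payload bytes that remain. In this sketch the function names are hypothetical: <code>g2_string_length</code> returns the byte length of the encoded string, and <code>g2_string_consumed</code> returns how far a reader should advance, counting the terminator only when one is present.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte length of the string at payload[0]. The scan stops at a zero byte
   or at the end of the payload, whichever comes first. */
size_t g2_string_length(const uint8_t *payload, size_t remaining)
{
    size_t n = 0;
    while (n < remaining && payload[n] != 0)
        n++;
    return n;
}

/* Number of payload bytes used by the string, including its terminator
   if one is present. */
size_t g2_string_consumed(const uint8_t *payload, size_t remaining)
{
    size_t n = g2_string_length(payload, remaining);
    return n < remaining ? n + 1 : n;
}
```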

UTF-8 encoding is required for all strings present in the packet payload. This means
that 7-bit characters may be passed as-is, while extended characters are encoded
with multi-byte sequences.

All applications must be able to parse UTF-8 encoded strings; however, it is up to the
individual application whether to store the string in Unicode, or convert it to the
local code page for processing. In situations where a packet must be processed 'and'
forwarded, the original packet must be forwarded rather than a regenerated version.
This ensures that both locally unsupported encodings and packet extensions are
preserved.

Applications should never send ANSI strings directly if they contain extended
characters with the MSB set. These should be encoded with UTF-8. If this is not
done, the decoding process may fail and the packet will be discarded or contain
bogus information.

Datatypes

2005-03-27T16:52:23Z

Kath: /* Network/Node Addresses */

== Introduction ==

The format of a packet payload is defined by the packet type and can consist of any
binary data; however, there are a number of conventions in place for serializing
common datatypes.

== Multi-Byte Integers ==

Multi-byte integers are serialized in the byte-order of the topmost packet. Little endian
is the default byte-order; however, big-endian byte order can be selected for
those who want it.

== Network/Node Addresses ==

A network or node address consists of a physical address and a port number, and is
of variable length, depending on the address family.
In IPv4, a network/node address is six bytes long: 4 bytes for an IP address and 2
bytes for a port number as follows:

<pre>
<nowiki>
typedef struct
{
BYTE ip[4];
SHORT port;
} IPV4_ENDPOINT;
</nowiki>
</pre>

Note that this is considered an array of 4 8-bit integers (bytes), followed by a 16-bit
integer (short). Byte order does not affect bytes, but it will affect the 16-bit port
number.

IPv6 addresses are longer and are not yet defined within the scope of Gnutella2,
however, applications should be aware that if the node address is not 6 bytes, it is of
a different address family.

== GUIDs ==

Globally unique identifiers (GUIDs) are used to identify nodes on the network. GUIDs
are serialized as an array of 16 bytes.

== Strings ==

Strings are encoded with UTF-8 encoding and serialized as a zero-terminated
sequence of 8 bit integers.

A zero character (0x00) marks the end of the string, however if the string data meets
the end of the packet (or child packet) payload, the terminator is not required. This
means that packets whose payload consists of a string do not need to include a zero
string terminator and their payload length will be the byte length of the encoded
string exactly.

UTF-8 encoding is required for all strings present in the packet payload. This means
that 7-bit characters may be passed as-is, while extended characters are encoded
with multi-byte sequences.

All applications must be able to parse UTF-8 encoded strings, however it is up to the
individual application whether to store the string in Unicode or convert it to a
local code page for processing. In situations where a packet must be processed 'and'
forwarded, the original packet must be forwarded rather than a regenerated version.
This ensures that both locally unsupported encodings and packet extensions are
preserved.

Applications should never send ANSI strings directly if they contain extended
characters with the MSB set. These should be encoded with UTF-8. If this is not
done, the decoding process may fail and the packet will be discarded or contain
bogus information.

Datatypes

2005-03-27T16:50:52Z

Kath: /* Multi-Byte Integers */

== Introduction ==

The format of a packet payload is defined by the packet type and can consist of any
binary data; however, there are a number of conventions in place for serializing
common datatypes.

== Multi-Byte Integers ==

Multi-byte integers are serialized in the byte-order of the topmost packet. Little endian
is the default byte-order; however, big-endian byte order can be selected for
those who want it.

== Network/Node Addresses ==

A network or node address consists of a physical address and a port number, and is
of variable length depending on the address family.
In IPv4, a network/node address is six bytes long: 4 bytes for an IP address and 2
bytes for a port number as follows:

<pre>
<nowiki>
typedef struct
{
BYTE ip[4];
SHORT port;
} IPV4_ENDPOINT;
</nowiki>
</pre>

Note that this is considered an array of 4 8-bit integers (bytes), followed by a 16-bit
integer (short). Byte order does not affect bytes, but it will affect the 16-bit port
number.

IPv6 addresses are longer and are not yet defined within the scope of Gnutella2,
however applications should be aware that if the node address is not 6 bytes it is of
a different address family.

== GUIDs ==

Globally unique identifiers (GUIDs) are used to identify nodes on the network. GUIDs
are serialized as an array of 16 bytes.

== Strings ==

Strings are encoded with UTF-8 encoding and serialized as a zero-terminated
sequence of 8 bit integers.

A zero character (0x00) marks the end of the string, however if the string data meets
the end of the packet (or child packet) payload, the terminator is not required. This
means that packets whose payload consists of a string do not need to include a zero
string terminator and their payload length will be the byte length of the encoded
string exactly.

UTF-8 encoding is required for all strings present in the packet payload. This means
that 7-bit characters may be passed as-is, while extended characters are encoded
with multi-byte sequences.

All applications must be able to parse UTF-8 encoded strings, however it is up to the
individual application whether to store the string in Unicode or convert it to a
local code page for processing. In situations where a packet must be processed 'and'
forwarded, the original packet must be forwarded rather than a regenerated version.
This ensures that both locally unsupported encodings and packet extensions are
preserved.

Applications should never send ANSI strings directly if they contain extended
characters with the MSB set. These should be encoded with UTF-8. If this is not
done, the decoding process may fail and the packet will be discarded or contain
bogus information.

Datatypes

2005-03-27T16:50:14Z

Kath: /* Introduction */

== Introduction ==

The format of a packet payload is defined by the packet type and can consist of any
binary data; however, there are a number of conventions in place for serializing
common datatypes.

== Multi-Byte Integers ==

Multi-byte integers are serialized in the byte-order of the topmost packet. Little endian
is the default byte-order; however big-endian byte order can be selected for
those who want it.

== Network/Node Addresses ==

A network or node address consists of a physical address and a port number, and is
of variable length depending on the address family.
In IPv4, a network/node address is six bytes long: 4 bytes for an IP address and 2
bytes for a port number as follows:

<pre>
<nowiki>
typedef struct
{
BYTE ip[4];
SHORT port;
} IPV4_ENDPOINT;
</nowiki>
</pre>

Note that this is considered an array of 4 8-bit integers (bytes), followed by a 16-bit
integer (short). Byte order does not affect bytes, but it will affect the 16-bit port
number.

IPv6 addresses are longer and are not yet defined within the scope of Gnutella2,
however applications should be aware that if the node address is not 6 bytes it is of
a different address family.

== GUIDs ==

Globally unique identifiers (GUIDs) are used to identify nodes on the network. GUIDs
are serialized as an array of 16 bytes.

== Strings ==

Strings are encoded with UTF-8 encoding and serialized as a zero-terminated
sequence of 8 bit integers.

A zero character (0x00) marks the end of the string, however if the string data meets
the end of the packet (or child packet) payload, the terminator is not required. This
means that packets whose payload consists of a string do not need to include a zero
string terminator and their payload length will be the byte length of the encoded
string exactly.

UTF-8 encoding is required for all strings present in the packet payload. This means
that 7-bit characters may be passed as-is, while extended characters are encoded
with multi-byte sequences.

All applications must be able to parse UTF-8 encoded strings, however it is up to the
individual application whether to store the string in Unicode or convert it to a
local code page for processing. In situations where a packet must be processed 'and'
forwarded, the original packet must be forwarded rather than a regenerated version.
This ensures that both locally unsupported encodings and packet extensions are
preserved.

Applications should never send ANSI strings directly if they contain extended
characters with the MSB set. These should be encoded with UTF-8. If this is not
done, the decoding process may fail and the packet will be discarded or contain
bogus information.

Packet Structure

2005-03-27T16:48:55Z

Kath: /* Introduction */

== Introduction ==

All Gnutella2 communications are represented with Gnutella2 lightweight tree
packets. This applies everywhere from TCP stream communications to reliable UDP
transmissions to HTTP packet exchanges (where protocol data has been negotiated).
Each tree packet may contain meaningful payload data and/or one or more child
packets, allowing complex document structures to be created and extended in a
backward compatible manner.

The concept can be compared to an XML document tree. The "packets" are
elements, which can in turn contain zero or more child elements (packets). The
payload of a packet is like the attributes of an XML element. However, serializing XML
has a lot of overhead due to all the naming, even in a compact binary form. The
Gnutella2 packet structure makes a compromise: it names elements (packets),
allowing them to be globally recognized and understood, without knowledge of their
format - and stores attributes as binary payloads, requiring knowledge of their
content to parse them.

Thus the element (packet or child packet) is the finite unit of comprehension. This
system provides an excellent trade-off between format transparency and
compactness.

== Fictitious Visual Example ==

<pre>
<nowiki>
+ Query Hit Packet
|
|-+ Node ID (standard)
|
|-+ Server Status (standard)
| \-+ Shareaza Server Status (private extension)
|
|-+ Hit Object
| |-+ URN (standard)
| |-+ Descriptive name (standard)
| | \-+ Alternate name list (extension)
| |-+ URL (standard)
| |-+ Priority indicator (private extension)
| | \-+ Digital signature (private)
| |-+ Alternate source summary (standard)
| \-+ Available ranges (standard)
| . \-+ Estimated completion time (private extension)
|
|-+ Selective digital signature (private)
|
\-+ Routing tags
</nowiki>
</pre>

== Contents ==

Each Gnutella2 packet contains:

* Control flags
* A type name meaningful in the namespace of the packet's parent or context
* A length (or implied length)
* Payload data of a format specific to the packet type name and namespace
* Child packets existing in the namespace of this packet

== Namespace Considerations ==

Each packet contains a relative type name of up to 8 bytes in length, which is case
sensitive. The packet type name is meaningful only in the namespace of the packet's
parent, or in the absence of a parent, the context of the packet (e.g. root level TCP
stream).

This means that, for example, a packet "A" inside packet "X" is different to a packet
"A" inside packet "Y". Packets are of the same type only if their fully qualified
absolute type names are equal.

As a convention, when discussing packet type names, they will be noted in their
absolute form with a URL style slash (/) separating each level. In the above example,
the first packet is "/X/A" while the second is "/Y/A". It is clear now that the packets
are of different types.

Packet type names can contain from 1 to 8 bytes inclusive, and none of these bytes
may be a null (0). Community approved packets are by convention named with
uppercase characters and digits, for example "PUSH". Private packet types are by
convention named with lowercase characters and digits, prefixed with the vendor
code of the owner, for example "RAZAclr2".

== Framing ==

Packets are encoded with a single leading control byte, followed by one or more
bytes of packet length, followed by one or more bytes of packet name/ID, followed
by zero or more child packets (framed the same way), followed by zero or more
bytes of payload:

<pre>
<nowiki>
| Control | Length_| Name___ | children and/or payload |
</nowiki>
</pre>

All packets can contain a payload only, children and a payload, children only, or
nothing at all. The total length of the packet header (control, length and type name)
cannot exceed 12 bytes and cannot be less than 2 bytes.

=== The Control Byte ===

The control byte is always non-zero. A zero control byte identifies the end of a
stream of packets, and thus has special meaning. It should not be used in root
packet streams (which do not end).
Control bytes have the following format:

<pre>
<nowiki>
Bit 7 Bit 0
| Len_Len | Name_Len - 1 | CF | BE | // |
</nowiki>
</pre>

* Len_Len is the number of bytes in the length field of the packet, which immediately follows the control byte. There are two bits here, which means the length field can be up to 3 bytes long. Len_Len can be zero if the packet has zero length (no children and no payload), in which case there is no need to encode the length.
* Name_Len is the number of bytes in the packet name field MINUS ONE, which follows the packet length field. There are three bits here, which means that packet names can be 1 to 8 bytes long inclusive. Because a 0 here equates to one byte of name, unnamed packets are not possible.
* The three least significant bits of the control byte are reserved for flags. They have the following meanings:
** CF is the compound packet flag. If this bit is set, the packet contains one or more child packets. If not set, the packet does not contain any child packets. If the packet is of zero length, this flag is ignored.
** BE is the big-endian packet flag. If set, all multi-byte values encoded in the packet and its children are encoded in big-endian byte order - including the length in the packet header.
** Other bits are reserved.
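The bit layout above can be sketched as a pair of encoder helpers. The names <code>make_control_byte</code> and <code>length_field_size</code> are hypothetical; the first packs the fields into a control byte, and the second picks the smallest length field able to hold a given packet length.

```c
#include <stdbool.h>
#include <stdint.h>

/* Builds a control byte. len_len must be 0..3 and name_len must be 1..8. */
uint8_t make_control_byte(unsigned len_len, unsigned name_len,
                          bool compound, bool big_endian)
{
    return (uint8_t)(((len_len & 3u) << 6)            /* bits 7-6: Len_Len      */
                     | (((name_len - 1u) & 7u) << 3)  /* bits 5-3: Name_Len - 1 */
                     | (compound ? 0x04u : 0u)        /* bit 2: CF              */
                     | (big_endian ? 0x02u : 0u));    /* bit 1: BE              */
}

/* Smallest number of length bytes (0..3) that can hold the given length,
   or -1 if the length does not fit in 3 bytes. */
int length_field_size(uint32_t length)
{
    if (length == 0) return 0;
    if (length <= 0xFFu) return 1;
    if (length <= 0xFFFFu) return 2;
    if (length <= 0xFFFFFFu) return 3;
    return -1;
}
```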

=== The Length Field ===

The length field immediately follows the control byte, and can be 0 to 3 bytes long.
Length bytes are stored in the byte order of the packet.

The length value includes the payload of this packet AND any child packets in their
entirety. This is obviously needed so that the entire packet can be detected and
acquired from a stream. The length does not include the header (control byte,
length, and name). The length field precedes the name field to allow it to be read
faster from a stream when acquiring packets.

The length field is in the byte order of the root packet.

=== The Type Name Field ===

The type name field immediately follows the length bytes, and can be from 1 to 8
bytes long. Its format is detailed in the previous section entitled "Namespace
Considerations".

=== Child Packets ===

Child packets are only present if the "compound packet bit" is set in the control byte.
If set, there are one or more child packets immediately following the end of the header.
These child packets are included in the total length of their parent (along with the
payload, which follows the child packets after a packet stream terminator).

Child packets are framed exactly the same way, with a control byte, length, name,
children and/or payload. When the compound bit is set and the packet is not of zero
length, the first child packet must exist. Subsequent child packets may also exist, and
are read in sequentially in the same way that they are read from a root packet
stream. The end of the child packet stream is signalled by the presence of a zero
control byte, OR the end of the parent packet's length (in which case there is no
payload). Including a terminating zero control byte when there is no payload is still
valid, but unnecessary.

=== Payload ===

Payload may exist whenever the length field is non-zero. However, if the compound
bit is set, one or more child packets must be read before the payload is reached. If
no packet data is left after the end of the last child, there is no payload.

=== Notes on the Control Byte ===
Note that there are a number of "marker packet types", which have no children or
payload. It is desirable to encode these in as small a space as possible, which means
omitting the length field and setting the len_len bits to zero in the control byte. This
creates a potential conflict, as the control byte itself may be zero if the type name is
one byte long - and as noted above, a zero control byte has special meaning (end of
packet stream). This must be avoided; luckily it is perfectly legal to set the
compound packet flag (CF) on zero length packets, thus producing a non-zero
control byte and the most compact packet possible.
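A marker packet can therefore be written as a control byte followed directly by the name. The sketch below always sets the compound flag, which is legal on zero length packets and guarantees a non-zero control byte even for one byte names. The function name <code>encode_marker</code> is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encodes a zero length marker packet into out, which must have room for
   strlen(name) + 1 bytes. Returns the number of bytes written, or 0 if the
   name is not 1 to 8 bytes long. */
size_t encode_marker(uint8_t *out, const char *name)
{
    size_t n = strlen(name);
    if (n < 1 || n > 8)
        return 0;

    /* len_len = 0 (no length field), name_len - 1 in bits 5-3, CF set. */
    out[0] = (uint8_t)(((n - 1) << 3) | 0x04);
    memcpy(out + 1, name, n);
    return n + 1;
}
```

A one byte name thus yields a two byte packet, the smallest encoding the framing allows.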

The compound packet bit MUST be checked when decoding every packet. It should
be done in low-level decoding code to avoid accidental omission. Do not assume that
a packet will not have children - it might not now, but no packets are sterile.
Anything could be augmented or extended in some unknown way in the future. If
you are not interested in children, skip them (which is easy, since you do not even need to
recurse through their children).
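Skipping children needs only the header of each child, because the length of a child already covers its own children. The following sketch walks the body of a compound packet (the bytes after its header) and returns the offset at which the payload begins. The function name <code>skip_children</code> is hypothetical, and <code>big_endian</code> is assumed to carry the byte order taken from the root packet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the payload within body, body_len if there is no
   payload, or (size_t)-1 if a child header or length is malformed. */
size_t skip_children(const uint8_t *body, size_t body_len, bool big_endian)
{
    size_t pos = 0;

    while (pos < body_len) {
        uint8_t control = body[pos];
        if (control == 0)
            return pos + 1;   /* terminator: the payload follows */

        size_t len_len  = control >> 6;
        size_t name_len = ((control >> 3) & 7u) + 1;
        if (body_len - pos < 1 + len_len + name_len)
            return (size_t)-1;   /* truncated child header */

        /* Assemble the 0 to 3 byte child length in the packet byte order. */
        size_t child_len = 0;
        for (size_t i = 0; i < len_len; i++) {
            uint8_t b = body[pos + 1 + i];
            child_len = big_endian ? (child_len << 8) | b
                                   : child_len | ((size_t)b << (8 * i));
        }

        /* Step over the whole child without looking inside it. */
        size_t total = 1 + len_len + name_len + child_len;
        if (total > body_len - pos)
            return (size_t)-1;   /* child overruns its parent */
        pos += total;
    }
    return body_len;   /* children ran to the end: no payload */
}
```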

=== Simple Packet Decoder in C ===

<pre>
<nowiki>
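// Read the control byte; a zero byte ends the current packet stream.
BYTE nInput = ReadNextByte();

if ( nInput == 0 ) return S_NO_MORE_CHILDREN;

// Bits 7-6: length field size, bits 5-3: name length minus one, bits 2-0: flags.
BYTE nLenLen = ( nInput & 0xC0 ) >> 6;
BYTE nTypeLen = ( nInput & 0x38 ) >> 3;
BYTE nFlags = ( nInput & 0x07 );

BOOL bBigEndian = ( nFlags & 0x02 ) ? TRUE : FALSE;
BOOL bIsCompound = ( nFlags & 0x04 ) ? TRUE : FALSE;

// This simple decoder handles little-endian packets only.
ASSERT( ! bBigEndian );

// Read 0 to 3 length bytes into the low bytes of a zeroed DWORD
// (this assumes a little-endian host).
DWORD nPacketLength = 0;

ReadBytes( (BYTE*)&nPacketLength, nLenLen );

// Read the 1 to 8 byte type name and null-terminate it.
CHAR szType[9];
ReadBytes( (BYTE*)szType, nTypeLen + 1 );
szType[ nTypeLen + 1 ] = 0;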
</nowiki>
</pre>

UDP Transceiver

2005-03-27T16:43:22Z

Kath: /* Performance Considerations */

== Introduction ==

The User Datagram Protocol (UDP) is invaluable in peer to peer systems because it
provides a relatively low-cost (low-overhead) method of sending short, irregular
messages to a very large number of peers on demand. Establishing a TCP stream
connection to a peer simply to deliver a single packet of information is wasteful in
data volume and time for the peers involved and state-aware network devices along
the route, for example, network address translation facilities. When dealing with a
large number of peers quickly, these costs become unbearable. UDP provides a
solution and makes this kind of interaction possible.

However, the delivery of UDP packets is not reliable: packets may be lost en-route for
a number of reasons. Often this behaviour is desirable, for example when the
destination node's connection is highly congested, UDP packets are likely to be
discarded. If the content was not critical, this loss is appropriate as the host's
resources are effectively unavailable. In other scenarios involving critical payloads,
UDP's lack of reliability is a problem: either the sender needs to make sure the
receiver gets the payload, or it needs to know definitively that the receiver was
unavailable.

The Gnutella2 network solves this problem by implementing a selectively engaged
reliability layer on top of the basic UDP protocol. This reliability layer shares some
common functionality with TCP, but importantly does not provide any connection
state and thus retains the efficiency originally sought in UDP.

This allows Gnutella2 to select the most suitable communication medium for each
and every transmission it needs to perform:

* If a significant volume of data is to be exchanged, or subsequent data will be exchanged with the same destination, a TCP connection is established
* If a small volume of important data is to be exchanged in a once-off operation or irregularly, reliable UDP is used
* If a small volume of unimportant data is to be exchanged in a once-off operation or irregularly, unreliable UDP is used

== UDP ==

Gnutella2 semi-reliable communication is transported using the UDP protocol. The
port number for receiving UDP is the same as the port number listening for TCP
connections.

== Encoding ==

Implementing an additional reliable protocol within UDP requires a small control
header before the payload itself. This header is put to good use:

* A small signature identifies the packet as a Gnutella2 semi-reliable UDP datagram. This allows the same port to be used for receiving UDP traffic for other protocols if desired, and offers some simple protection against random, unexpected traffic.
* A content code identifies the payload as a Gnutella2 packet stream, allowing future protocols to be added within the same reliability layer if desired.
* Flags allow additional attributes to be specified, such as inline stateless compression of the payload (which is a required feature).

The header has a fixed size of 8 bytes, and is represented by the following C
structure:

<pre>
<nowiki>
#pragma pack(1)
typedef struct
{
CHAR szTag[3];
BYTE nFlags;
WORD nSequence;
BYTE nPart;
BYTE nCount;
} GND_HEADER;
</nowiki>
</pre>

The members of the structure are detailed below:

* '''szTag''' - contains a three byte encoding protocol identifier, in this case "GND" for "GNutella Datagram". If this signature is not present the packet should not be decoded as a Gnutella2 reliability layer transmission.
* '''nFlags''' - contains flags which modify the content of the packet. The low-order nibble is reserved for critical flags: if one of these bits is set but the decoding software does not understand the meaning, the packet must be discarded. The high-order nibble is reserved for non-critical flags: when set these bits may be interpreted, but an inability to interpret a bit does not cause the packet to be discarded. Currently defined flags are:
:* '''0x01''' - Deflate
:: When the deflate bit is set, the entire payload is compressed with the deflate algorithm. The compression method used is the Deflate Compression Data Format (RFC 1951). On top of this compression a ZLIB ‘wrapper’ is applied (RFC 1950, ZLIB Compressed Data Format). The ZLIB wrapper ensures packet integrity, among other things. Note that if the packet was fragmented, the entire payload must be reassembled in the correct order before it can be inflated. Fragments are not compressed separately!
:* '''0x02''' - Acknowledge Me
:: When the acknowledge me bit is set, the sender is expecting an acknowledgement for this packet.

* '''nSequence''' - contains the sequence number of the packet. This sequence number is unique to the sending host only. It is not unique to the pair of the sending host and receiving host as in TCP, as there is no concept of connection state. Sequence numbers on consecutive packets need not be increasing (although that is convenient) - they must only be different. If a packet is fragmented, all of its fragments will have the same sequence number. Byte order is unimportant here.
* '''nPart''' - contains the fragment part number (1 <= nPart <= nCount)
* '''nCount''' - contains the number of fragment parts in this packet. On a transmission, this value will be non-zero (all packets must have at least one fragment). If nCount is zero, this is an acknowledgement (see below).
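The header can also be written and read byte by byte, which avoids any dependence on structure packing. The sketch below is illustrative only: <code>GndHeader</code>, <code>gnd_write</code> and <code>gnd_read</code> are hypothetical names, and the sequence number is stored low byte first purely as a convention, since its byte order is unimportant. The reader rejects datagrams without the "GND" tag, datagrams with unknown critical flags in the low-order nibble, and transmissions whose part number lies outside 1 to nCount.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { GND_DEFLATE = 0x01, GND_ACK_ME = 0x02 };

typedef struct {
    uint8_t  flags;
    uint16_t sequence;
    uint8_t  part;
    uint8_t  count;    /* 0 marks an acknowledgement */
} GndHeader;

/* Serializes the 8 byte header into out. */
void gnd_write(uint8_t out[8], const GndHeader *h)
{
    memcpy(out, "GND", 3);
    out[3] = h->flags;
    out[4] = (uint8_t)(h->sequence & 0xFF);
    out[5] = (uint8_t)(h->sequence >> 8);
    out[6] = h->part;
    out[7] = h->count;
}

/* Parses the header at the start of a datagram. Returns false if the
   datagram should not be decoded by this implementation. */
bool gnd_read(const uint8_t *data, size_t len, GndHeader *h)
{
    if (len < 8 || memcmp(data, "GND", 3) != 0)
        return false;

    /* Low-order nibble: critical flags. Unknown ones force a discard. */
    uint8_t unknown = data[3] & 0x0F & (uint8_t)~(GND_DEFLATE | GND_ACK_ME);
    if (unknown != 0)
        return false;

    h->flags    = data[3];
    h->sequence = (uint16_t)(data[4] | (data[5] << 8));
    h->part     = data[6];
    h->count    = data[7];

    /* For a transmission, the part number must satisfy 1 <= nPart <= nCount. */
    if (h->count != 0 && (h->part < 1 || h->part > h->count))
        return false;
    return true;
}
```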

== Fragmentation ==

Large packets must be fragmented before they can be sent through most network
interfaces. Different network media have different MTUs, and it is difficult to predict
what the lowest common size will be. Fragmentation and reassembly are performed by
the existing Internet protocols; however, there are two important reasons why the
reliability layer performs its own fragmentation:

* Sockets implementations specify a maximum datagram size. This is adequate for the vast majority of transmissions, but it is desirable to have the transparent ability to send larger packets without worrying about the host implementation.
* When the Internet protocols fragment a packet and one or more fragments are lost, the whole packet may be discarded, because the datagram protocol is unreliable. The Gnutella2 reliability layer can compensate by retransmitting the whole packet, which would then be re-fragmented and each fragment resent; however, this wastes the fragments that were successfully received before. Managing fragmentation natively avoids this waste, because only the missing fragments need to be resent.

Each node determines its own MTU, often based on a best guess combined with
information from the host's sockets implementation. Packets exceeding this size are
fragmented into multiple datagrams of the appropriate size. Each datagram has the
same sequence number and the same fragment count (nCount), but a different
fragment number (nPart).
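The arithmetic involved can be sketched as follows. Both function names are hypothetical, and <code>max_fragment</code> stands for the number of payload bytes a node chooses to place in one datagram, which is an assumption derived from its MTU estimate. Because nCount occupies a single byte, a payload needing more than 255 fragments cannot be sent.

```c
#include <stddef.h>

/* Number of fragments needed for a payload, or 0 if it cannot be sent.
   An empty payload still occupies one fragment. */
unsigned fragment_count(size_t payload_len, size_t max_fragment)
{
    if (max_fragment == 0)
        return 0;
    size_t n = payload_len == 0
                   ? 1
                   : (payload_len + max_fragment - 1) / max_fragment;
    return n <= 255 ? (unsigned)n : 0;
}

/* Number of payload bytes carried by the 1-based fragment `part`,
   or 0 if the part lies outside the payload. */
size_t fragment_size(size_t payload_len, size_t max_fragment, unsigned part)
{
    if (part == 0)
        return 0;
    size_t offset = (size_t)(part - 1) * max_fragment;
    if (offset >= payload_len)
        return 0;
    size_t left = payload_len - offset;
    return left < max_fragment ? left : max_fragment;
}
```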

== Transmission Process ==

When a packet is to be transmitted, the network layer must:

* Cache the payload
* Allocate a new locally and temporally unique sequence number
* Derive the appropriate number of fragments
* Queue the fragments for dispatch
* If the fragments do not need to be acknowledged, the packet can be flushed now

The payload will generally be cached for an appropriate timeout period, or until the
data cache becomes full, at which time older payloads can be discarded. Fragments
are dispatched according to the dispatch algorithm of choice, and the sender listens
for acknowledgements. When an acknowledgement is received:

* Look up the sent packet by sequence number
* Mark the nPart fragment as received and cancel any retransmissions of this part
* If all fragments have been acknowledged, flush this packet from the cache

If a fragment has been transmitted but has not been acknowledged within the
timeout, it should be retransmitted. A finite number of retransmissions are allowed
before the packet as a whole expires, at which time it is assumed that the packet
was not received.
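
The acknowledgement bookkeeping on the sending side might look like the sketch below. The type and function names are hypothetical; one bit per fragment is used so that no memory needs to be allocated while an acknowledgement is processed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Per-packet state kept by the sender while acknowledgements are pending. */
typedef struct {
    uint16_t sequence;     /* nSequence of the cached packet */
    uint8_t  count;        /* nCount: number of fragments sent */
    uint8_t  acked_total;  /* how many distinct fragments were acknowledged */
    uint8_t  acked[32];    /* one bit per fragment, 256 bits in total */
} SentPacket;

static void sent_init(SentPacket *p, uint16_t sequence, uint8_t count)
{
    memset(p, 0, sizeof *p);
    p->sequence = sequence;
    p->count = count;
}

/* Record an acknowledgement for fragment part (1-based).
 * Returns true once every fragment has been acknowledged, which is the
 * point at which the packet can be flushed from the cache. */
static bool sent_on_ack(SentPacket *p, uint16_t sequence, uint8_t part)
{
    if (sequence != p->sequence || part < 1 || part > p->count)
        return false;                      /* not for this packet */
    uint8_t bit = (uint8_t)(part - 1);
    uint8_t mask = (uint8_t)(1u << (bit & 7));
    if (!(p->acked[bit >> 3] & mask)) {    /* ignore duplicate acks */
        p->acked[bit >> 3] |= mask;
        p->acked_total++;
    }
    return p->acked_total == p->count;
}
```

A fragment whose bit is still clear when the retransmit interval elapses is a candidate for retransmission.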

== Receive Process ==

When a new datagram is received, the network layer must:

* If the acknowledge bit was set, send an acknowledge packet for this sequence number and part number, with nCount set to zero (ack)
* Look up any existing packet by the sending IP and sequence number
* If there is no existing packet, create a new packet entry with the IP, sequence number, fragment count and flags
* If there is an existing packet that is already marked as done, abort: the datagram is a retransmission of a packet that has already been delivered
* Add the transmitted fragment to the (new or old) packet entry
* If the packet now has all fragments, mark it as done and decode it and pass it up to the application layer
* Leave the packet on the cache even if it was finished, in case any parts are retransmitted
* Expire old packets from the receive buffer after a timeout, or if the buffer is full
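
The reassembly steps above can be sketched with a fixed-size entry per (IP, sequence number) pair. All names here are hypothetical, and the limits RX_MAX_PARTS and RX_PART_BYTES are illustrative assumptions chosen to keep the example small; a real implementation would size them from nCount and its MTU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_MAX_PARTS  16   /* assumed cap for this sketch only */
#define RX_PART_BYTES 492  /* assumed: MTU of 500 minus the 8-byte header */

typedef enum { RX_IGNORED, RX_STORED, RX_COMPLETE } RxResult;

typedef struct {
    uint32_t ip;            /* sending host */
    uint16_t sequence;      /* nSequence */
    uint8_t  count;         /* nCount */
    uint8_t  received;      /* distinct fragments stored so far */
    bool     done;          /* already passed up to the application */
    bool     have[RX_MAX_PARTS];
    uint16_t length[RX_MAX_PARTS];
    uint8_t  data[RX_MAX_PARTS][RX_PART_BYTES];
} RxEntry;

static void rx_init(RxEntry *e, uint32_t ip, uint16_t sequence, uint8_t count)
{
    memset(e, 0, sizeof *e);
    e->ip = ip;
    e->sequence = sequence;
    e->count = count;
}

/* Store one fragment (part is 1-based, like nPart). */
static RxResult rx_add(RxEntry *e, uint8_t part, const void *bytes, size_t len)
{
    if (e->done)
        return RX_IGNORED;               /* retransmission of a finished packet */
    if (part < 1 || part > e->count || part > RX_MAX_PARTS || len > RX_PART_BYTES)
        return RX_IGNORED;
    if (e->have[part - 1])
        return RX_IGNORED;               /* duplicate fragment */
    memcpy(e->data[part - 1], bytes, len);
    e->length[part - 1] = (uint16_t)len;
    e->have[part - 1] = true;
    if (++e->received < e->count)
        return RX_STORED;
    e->done = true;                      /* keep the entry so repeats are ignored */
    return RX_COMPLETE;
}

/* Concatenate the fragments in order. Returns the payload length,
 * or 0 if the packet is incomplete or does not fit in out. */
static size_t rx_assemble(const RxEntry *e, uint8_t *out, size_t cap)
{
    size_t total = 0;
    if (!e->done)
        return 0;
    for (unsigned i = 0; i < e->count; i++) {
        if (total + e->length[i] > cap)
            return 0;
        memcpy(out + total, e->data[i], e->length[i]);
        total += e->length[i];
    }
    return total;
}
```

Fragments may arrive in any order; the payload is only assembled (and, if the deflate flag was set, decompressed) once rx_add reports RX_COMPLETE.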

== Dispatch Algorithm ==

Fragment datagrams need to be dispatched intelligently, to spread the load on
network resources and maximise the chance that the receiver will get the message.
To do this, the dispatch algorithm should take into account several points:

* Prioritize acknowledgements.
* If fragments are waiting to be sent to a number of hosts, do not send to the same host twice in a row. Alternating, or looping through the target hosts achieves the same data rate locally, but spreads out the load over downstream links.
* Do not exceed or approach the capacity of the local network connection. If a host has a 128 kb/s outbound bandwidth, dispatching 32 KB of data in one second will likely cause massive packet loss, leading to a retransmission.
* After considering the above points, prefer fragments that were queued recently to older packets. A LIFO or stack type approach means that even if a transmitter is becoming backed up, some fragments will get there on time, while others will be delayed. A FIFO approach would mean that a backed up host delivers every fragment late.
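
To make the bandwidth point concrete: a 128 kb/s link carries about 16 KB per second, so 32 KB in one second is twice what the link can deliver. One way to stay below capacity is a simple byte budget, sketched here. The names are hypothetical, the percentage is an arbitrary safety margin, and "kb" is taken to mean 1000 bits.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t bytes_per_second;  /* budget, kept below the link capacity */
    uint32_t available;         /* bytes that may still be sent right now */
} SendBudget;

/* percent is the share of the link the dispatcher is allowed to use. */
static void budget_init(SendBudget *b, uint32_t link_bits_per_second, uint32_t percent)
{
    b->bytes_per_second =
        (uint32_t)((uint64_t)link_bits_per_second / 8u * percent / 100u);
    b->available = b->bytes_per_second;
}

/* Credit the budget for elapsed time, never storing more than one second. */
static void budget_refill(SendBudget *b, uint32_t elapsed_ms)
{
    uint64_t refill = (uint64_t)b->bytes_per_second * elapsed_ms / 1000u;
    uint64_t total = (uint64_t)b->available + refill;
    b->available = total > b->bytes_per_second ? b->bytes_per_second
                                                : (uint32_t)total;
}

/* Returns true if a datagram of this size may be dispatched now. */
static bool budget_take(SendBudget *b, uint32_t datagram_bytes)
{
    if (datagram_bytes > b->available)
        return false;
    b->available -= datagram_bytes;
    return true;
}
```

With a 128 kb/s link and a 75% share, the budget is 12000 bytes per second, or 24 datagrams of 500 bytes.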

== Parameters ==
The recommended parameters for the reliability layer are as follows:

<pre>
<nowiki>
MTU = 500 bytes
Transmit Retransmit Interval = 10 seconds
Transmit Packet Timeout / Expire = 26 seconds (allows for two retransmissions before expiry)
Receive Packet Expiry = 30 seconds (allows 10 seconds beyond final retransmission)
</nowiki>
</pre>
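
How these values interact can be sketched as a small timer check on the sending side. The names are hypothetical and times are in whole seconds; with the recommended values a fragment is retransmitted at 10 and 20 seconds and the packet expires at 26 seconds, while the receiver keeps its entry until 30 seconds.

```c
enum {
    GND_MTU_BYTES    = 500,
    GND_RETRANSMIT_S = 10,   /* Transmit Retransmit Interval */
    GND_TX_EXPIRE_S  = 26,   /* Transmit Packet Timeout / Expire */
    GND_RX_EXPIRE_S  = 30    /* Receive Packet Expiry */
};

typedef enum { TX_WAIT, TX_RETRANSMIT, TX_EXPIRE } TxAction;

/* Decide what to do with an unacknowledged fragment.
 * first_sent: when the packet was first dispatched
 * last_sent:  when this fragment was most recently dispatched
 * now:        current time, same clock, in seconds */
static TxAction tx_timer(unsigned first_sent, unsigned last_sent, unsigned now)
{
    if (now - first_sent >= GND_TX_EXPIRE_S)
        return TX_EXPIRE;        /* assume the packet was not received */
    if (now - last_sent >= GND_RETRANSMIT_S)
        return TX_RETRANSMIT;
    return TX_WAIT;
}
```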

== Performance Considerations ==

Relatively low-level network implementations such as this are reasonably
complicated, but must operate fast. It is desirable to avoid runtime memory
allocations in network code as much as possible, and particularly at this level.

It should be noted that in almost all cases, transmissions to "untested" nodes are
single fragment. Replies on the other hand are often larger, and may be deflated in
many fragments. This is optimal because attempting to contact a node which may be
unavailable involves a retransmission of only a single fragment.

Flow control is an important topic; however, it is handled at a higher layer. The UDP
reliability layer is only responsible for guaranteeing delivery of selected datagrams.

Only critical transmissions whose reception cannot otherwise be inferred should have
the acknowledge request bit set.

UDP Transceiver

2005-03-27T16:41:24Z

Kath: /* Dispatch Algorithm */

== Introduction ==

The User Datagram Protocol (UDP) is invaluable in peer to peer systems because it
provides a relatively low-cost (low-overhead) method of sending short, irregular
messages to a very large number of peers on demand. Establishing a TCP stream
connection to a peer simply to deliver a single packet of information is wasteful in
data volume and time for the peers involved and state-aware network devices along
the route, for example, network address translation facilities. When dealing with a
large number of peers quickly, these costs become unbearable. UDP provides a
solution and makes this kind of interaction possible.

However, the delivery of UDP packets is not reliable: packets may be lost en-route for
a number of reasons. Often this behaviour is desirable, for example when the
destination node's connection is highly congested, UDP packets are likely to be
discarded. If the content was not critical, this loss is appropriate as the host's
resources are effectively unavailable. In other scenarios involving critical payloads,
UDP's lack of reliability is a problem: either the sender needs to make sure the
receiver gets the payload, or it needs to know definitively that the receiver was
unavailable.

The Gnutella2 network solves this problem by implementing a selectively engaged
reliability layer on top of the basic UDP protocol. This reliability layer shares some
common functionality with TCP, but importantly does not provide any connection
state and thus retains the efficiency originally sought in UDP.

This allows Gnutella2 to select the most suitable communication medium for each
and every transmission it needs to perform:

* If a significant volume of data is to be exchanged, or subsequent data will be exchanged with the same destination, a TCP connection is established
* If a small volume of important data is to be exchanged in a once-off operation or irregularly, reliable UDP is used
* If a small volume of unimportant data is to be exchanged in a once-off operation or irregularly, unreliable UDP is used

== UDP ==

Gnutella2 semi-reliable communication is transported using the UDP protocol. The
port number for receiving UDP is the same as the port number listening for TCP
connections.

== Encoding ==

Implementing an additional reliable protocol within UDP requires a small control
header before the payload itself. This header is put to good use:

* A small signature identifies the packet as a Gnutella2 semi-reliable UDP datagram. This allows the same port to be used for receiving UDP traffic for other protocols if desired, and offers some simple protection against random, unexpected traffic.
* A content code identifies the payload as a Gnutella2 packet stream, allowing future protocols to be added within the same reliability layer if desired.
* Flags allow additional attributes to be specified, such as inline stateless compression of the payload (which is a required feature).

The header has a fixed size of 8 bytes, and is represented by the following C
structure:

<pre>
<nowiki>
#pragma pack(1)
typedef struct
{
CHAR szTag[3];
BYTE nFlags;
WORD nSequence;
BYTE nPart;
BYTE nCount;
} GND_HEADER;
</nowiki>
</pre>

The members of the structure are detailed below:

* '''szTag''' - contains a three byte encoding protocol identifier, in this case "GND" for "GNutella Datagram". If this signature is not present the packet should not be decoded as a Gnutella2 reliability layer transmission.
* '''nFlags''' - contains flags which modify the content of the packet. The low-order nibble is reserved for critical flags: if one of these bits is set but the decoding software does not understand the meaning, the packet must be discarded. The high-order nibble is reserved for non-critical flags: when set these bits may be interpreted, but an inability to interpret a bit does not cause the packet to be discarded. Currently defined flags are:
:* '''0x01''' - Deflate
:: When the deflate bit is set, the entire payload is compressed with the deflate algorithm. The compression method used is the Deflate Compression Data Format (RFC 1951). On top of this compression a ZLIB ‘wrapper’ is applied (RFC 1950, ZLIB Compressed Data Format). The ZLIB wrapper ensures packet integrity, among other things. Note that if the packet was fragmented, the entire payload must be reassembled in the correct order before it can be inflated. Fragments are not compressed separately!
:* '''0x02''' - Acknowledge Me
:: When the acknowledge me bit is set, the sender is expecting an acknowledgement for this packet.

* '''nSequence''' - contains the sequence number of the packet. This sequence number is unique to the sending host only. It is not unique to the pair of the sending host and receiving host as in TCP, as there is no concept of connection state. Sequence numbers on consecutive packets need not be increasing (although that is convenient) - they must only be different. If a packet is fragmented, all of its fragments will have the same sequence number. Byte order is unimportant here.
* '''nPart''' - contains the fragment part number (1 <= nPart <= nCount)
* '''nCount''' - contains the number of fragment parts in this packet. On a transmission, this value will be non-zero (all packets must have at least one fragment). If nCount is zero, this is an acknowledgement (see below).
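
A receiver's validation of these fields can be sketched as follows. The names are hypothetical. The header is read byte by byte rather than through a packed structure, and the two sequence bytes are copied as-is because their byte order does not matter. Rejecting a non-acknowledgement datagram whose nPart falls outside 1..nCount is an assumption that follows from the range given above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GND_FLAG_DEFLATE   0x01
#define GND_FLAG_ACK_ME    0x02
#define GND_KNOWN_CRITICAL (GND_FLAG_DEFLATE | GND_FLAG_ACK_ME)

typedef struct {
    uint8_t  flags;
    uint16_t sequence;  /* opaque: only compared for equality */
    uint8_t  part;
    uint8_t  count;     /* 0 means this datagram is an acknowledgement */
} GndInfo;

/* Returns true and fills *out if data starts with a usable GND header. */
static bool gnd_parse(const uint8_t *data, size_t len, GndInfo *out)
{
    GndInfo info;
    if (len < 8 || memcmp(data, "GND", 3) != 0)
        return false;                        /* not a GND datagram */
    info.flags = data[3];
    if (info.flags & 0x0F & ~GND_KNOWN_CRITICAL)
        return false;                        /* unknown critical flag: discard */
    memcpy(&info.sequence, data + 4, 2);     /* byte order is unimportant */
    info.part = data[6];
    info.count = data[7];
    if (info.count != 0 && (info.part < 1 || info.part > info.count))
        return false;                        /* nPart outside 1..nCount */
    *out = info;
    return true;
}
```

Unknown bits in the high-order nibble are passed through untouched, since they are non-critical.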

== Fragmentation ==

Large packets must be fragmented before they can be sent through most network
interfaces. Different network media have different MTUs, and it is difficult to predict
what the lowest common size will be. Fragmentation and reassembly are performed by
the existing Internet protocols; however, there are two important reasons why the
reliability layer performs its own fragmentation:

* Sockets implementations specify a maximum datagram size. This is adequate for the vast majority of transmissions, but it is desirable to have the transparent ability to send larger packets without worrying about the host implementation.
* When the Internet protocols fragment a packet and one or more fragments are lost, the whole packet may be discarded, as is normal in an unreliable datagram protocol. The Gnutella2 reliability layer can compensate by retransmitting the whole packet, which would then be re-fragmented and each fragment resent; however, this wastes the fragments that were successfully received before. Managing fragmentation natively allows only the lost fragments to be retransmitted.

Each node determines its own MTU, often based on a best guess combined with
information from the host's sockets implementation. Packets exceeding this size are
fragmented into multiple datagrams of the appropriate size. Each datagram has the
same sequence number and the same fragment count (nCount), but a different
fragment number (nPart).

== Transmission Process ==

When a packet is to be transmitted, the network layer must:

* Cache the payload
* Allocate a new locally and temporally unique sequence number
* Derive the appropriate number of fragments
* Queue the fragments for dispatch
* If the fragments do not need to be acknowledged, the packet can be flushed now

The payload will generally be cached for an appropriate timeout period, or until the
data cache becomes full, at which time older payloads can be discarded. Fragments
are dispatched according to the dispatch algorithm of choice, and the sender listens
for acknowledgements. When an acknowledgement is received:

* Look up the sent packet by sequence number
* Mark the nPart fragment as received and cancel any retransmissions of this part
* If all fragments have been acknowledged, flush this packet from the cache

If a fragment has been transmitted but has not been acknowledged within the
timeout, it should be retransmitted. A finite number of retransmissions are allowed
before the packet as a whole expires, at which time it is assumed that the packet
was not received.

== Receive Process ==

When a new datagram is received, the network layer must:

* If the acknowledge bit was set, send an acknowledge packet for this sequence number and part number, with nCount set to zero (ack)
* Look up any existing packet by the sending IP and sequence number
* If there is no existing packet, create a new packet entry with the IP, sequence number, fragment count and flags
* If there is an existing packet that is already marked as done, abort: the datagram is a retransmission of a packet that has already been delivered
* Add the transmitted fragment to the (new or old) packet entry
* If the packet now has all fragments, mark it as done and decode it and pass it up to the application layer
* Leave the packet on the cache even if it was finished, in case any parts are retransmitted
* Expire old packets from the receive buffer after a timeout, or if the buffer is full
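
The first step, building the acknowledgement, can be sketched as below. The function name is hypothetical. The text does not say which flags an acknowledgement should carry, so this sketch assumes none are set; the sequence bytes are echoed exactly as received.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build the 8-byte acknowledgement for a received datagram.
 * Returns 8 if an acknowledgement should be sent, or 0 if the datagram
 * is too short, is not a GND datagram, did not ask to be acknowledged,
 * or is itself an acknowledgement (nCount == 0). */
static size_t gnd_build_ack(const uint8_t *received, size_t len, uint8_t out[8])
{
    if (len < 8 || memcmp(received, "GND", 3) != 0)
        return 0;
    if (!(received[3] & 0x02) || received[7] == 0)
        return 0;
    memcpy(out, "GND", 3);
    out[3] = 0;             /* assumed: no flags on an acknowledgement */
    out[4] = received[4];   /* nSequence, echoed byte for byte */
    out[5] = received[5];
    out[6] = received[6];   /* nPart being acknowledged */
    out[7] = 0;             /* nCount of zero marks an acknowledgement */
    return 8;
}
```

Because the acknowledgement is built from the header alone, it can be sent before the fragment is looked up or stored.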

== Dispatch Algorithm ==

Fragment datagrams need to be dispatched intelligently, to spread the load on
network resources and maximise the chance that the receiver will get the message.
To do this, the dispatch algorithm should take into account several points:

* Prioritize acknowledgements.
* If fragments are waiting to be sent to a number of hosts, do not send to the same host twice in a row. Alternating, or looping through the target hosts achieves the same data rate locally, but spreads out the load over downstream links.
* Do not exceed or approach the capacity of the local network connection. If a host has a 128 kb/s outbound bandwidth, dispatching 32 KB of data in one second will likely cause massive packet loss, leading to a retransmission.
* After considering the above points, prefer fragments that were queued recently to older packets. A LIFO or stack type approach means that even if a transmitter is becoming backed up, some fragments will get there on time, while others will be delayed. A FIFO approach would mean that a backed up host delivers every fragment late.
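
The selection order described above might be sketched like this. The names are hypothetical, the queue is assumed to be an array ordered oldest first, and the bandwidth check is assumed to happen separately before a datagram is actually sent.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t host;      /* destination address */
    uint16_t sequence;
    uint8_t  part;
    bool     is_ack;    /* acknowledgements go out first */
} Queued;

/* Choose the next datagram to send from queue[0..n), oldest first.
 * last_host is the destination of the previous dispatch.
 * Returns an index into queue, or -1 if the queue is empty. */
static int dispatch_pick(const Queued *queue, int n, uint32_t last_host)
{
    int fallback = -1;

    /* 1. Any acknowledgement, newest first. */
    for (int i = n - 1; i >= 0; i--)
        if (queue[i].is_ack)
            return i;

    /* 2. Newest fragment for a host other than the previous one (LIFO). */
    for (int i = n - 1; i >= 0; i--) {
        if (queue[i].host != last_host)
            return i;
        if (fallback < 0)
            fallback = i;   /* newest fragment for the same host */
    }

    /* 3. Only the previous host has traffic waiting. */
    return fallback;
}
```

Scanning from the newest entry gives the stack-like behaviour: when the queue backs up, recent fragments still leave on time and only the oldest ones are delayed.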

== Parameters ==
The recommended parameters for the reliability layer are as follows:

<pre>
<nowiki>
MTU = 500 bytes
Transmit Retransmit Interval = 10 seconds
Transmit Packet Timeout / Expire = 26 seconds (allows for two retransmissions before expiry)
Receive Packet Expiry = 30 seconds (allows 10 seconds beyond final retransmission)
</nowiki>
</pre>

== Performance Considerations ==

Relatively low-level network implementations such as this are reasonably
complicated, but must operate fast. It is desirable to avoid runtime memory
allocations in network code as much as possible, and particularly at this level.

It should be noted that in almost all cases, transmissions to "untested" nodes are
single fragment. Replies on the other hand are often larger, and may be deflated in
many fragments. This is optimal because attempting to contact a node which may be
unavailable involves a retransmission of only a single fragment.

Flow control is an important topic; however, it is handled at a higher layer. The UDP
reliability layer is only responsible for guaranteeing delivery of selected datagrams.

Only critical transmissions whose reception cannot otherwise be inferred should have
the acknowledge request bit set.

UDP Transceiver

2005-03-27T16:39:14Z

Kath: /* Receive Process */

== Introduction ==

The User Datagram Protocol (UDP) is invaluable in peer to peer systems because it
provides a relatively low-cost (low-overhead) method of sending short, irregular
messages to a very large number of peers on demand. Establishing a TCP stream
connection to a peer simply to deliver a single packet of information is wasteful in
data volume and time for the peers involved and state-aware network devices along
the route, for example, network address translation facilities. When dealing with a
large number of peers quickly, these costs become unbearable. UDP provides a
solution and makes this kind of interaction possible.

However, the delivery of UDP packets is not reliable: packets may be lost en-route for
a number of reasons. Often this behaviour is desirable, for example when the
destination node's connection is highly congested, UDP packets are likely to be
discarded. If the content was not critical, this loss is appropriate as the host's
resources are effectively unavailable. In other scenarios involving critical payloads,
UDP's lack of reliability is a problem: either the sender needs to make sure the
receiver gets the payload, or it needs to know definitively that the receiver was
unavailable.

The Gnutella2 network solves this problem by implementing a selectively engaged
reliability layer on top of the basic UDP protocol. This reliability layer shares some
common functionality with TCP, but importantly does not provide any connection
state and thus retains the efficiency originally sought in UDP.

This allows Gnutella2 to select the most suitable communication medium for each
and every transmission it needs to perform:

* If a significant volume of data is to be exchanged, or subsequent data will be exchanged with the same destination, a TCP connection is established
* If a small volume of important data is to be exchanged in a once-off operation or irregularly, reliable UDP is used
* If a small volume of unimportant data is to be exchanged in a once-off operation or irregularly, unreliable UDP is used

== UDP ==

Gnutella2 semi-reliable communication is transported using the UDP protocol. The
port number for receiving UDP is the same as the port number listening for TCP
connections.

== Encoding ==

Implementing an additional reliable protocol within UDP requires a small control
header before the payload itself. This header is put to good use:

* A small signature identifies the packet as a Gnutella2 semi-reliable UDP datagram. This allows the same port to be used for receiving UDP traffic for other protocols if desired, and offers some simple protection against random, unexpected traffic.
* A content code identifies the payload as a Gnutella2 packet stream, allowing future protocols to be added within the same reliability layer if desired.
* Flags allow additional attributes to be specified, such as inline stateless compression of the payload (which is a required feature).

The header has a fixed size of 8 bytes, and is represented by the following C
structure:

<pre>
<nowiki>
#pragma pack(1)
typedef struct
{
CHAR szTag[3];
BYTE nFlags;
WORD nSequence;
BYTE nPart;
BYTE nCount;
} GND_HEADER;
</nowiki>
</pre>

The members of the structure are detailed below:

* '''szTag''' - contains a three byte encoding protocol identifier, in this case "GND" for "GNutella Datagram". If this signature is not present the packet should not be decoded as a Gnutella2 reliability layer transmission.
* '''nFlags''' - contains flags which modify the content of the packet. The low-order nibble is reserved for critical flags: if one of these bits is set but the decoding software does not understand the meaning, the packet must be discarded. The high-order nibble is reserved for non-critical flags: when set these bits may be interpreted, but an inability to interpret a bit does not cause the packet to be discarded. Currently defined flags are:
:* '''0x01''' - Deflate
:: When the deflate bit is set, the entire payload is compressed with the deflate algorithm. The compression method used is the Deflate Compression Data Format (RFC 1951). On top of this compression a ZLIB ‘wrapper’ is applied (RFC 1950, ZLIB Compressed Data Format). The ZLIB wrapper ensures packet integrity, among other things. Note that if the packet was fragmented, the entire payload must be reassembled in the correct order before it can be inflated. Fragments are not compressed separately!
:* '''0x02''' - Acknowledge Me
:: When the acknowledge me bit is set, the sender is expecting an acknowledgement for this packet.

* '''nSequence''' - contains the sequence number of the packet. This sequence number is unique to the sending host only. It is not unique to the pair of the sending host and receiving host as in TCP, as there is no concept of connection state. Sequence numbers on consecutive packets need not be increasing (although that is convenient) - they must only be different. If a packet is fragmented, all of its fragments will have the same sequence number. Byte order is unimportant here.
* '''nPart''' - contains the fragment part number (1 <= nPart <= nCount)
* '''nCount''' - contains the number of fragment parts in this packet. On a transmission, this value will be non-zero (all packets must have at least one fragment). If nCount is zero, this is an acknowledgement (see below).

== Fragmentation ==

Large packets must be fragmented before they can be sent through most network
interfaces. Different network media have different MTUs, and it is difficult to predict
what the lowest common size will be. Fragmentation and reassembly are performed by
the existing Internet protocols; however, there are two important reasons why the
reliability layer performs its own fragmentation:

* Sockets implementations specify a maximum datagram size. This is adequate for the vast majority of transmissions, but it is desirable to have the transparent ability to send larger packets without worrying about the host implementation.
* When the Internet protocols fragment a packet and one or more fragments are lost, the whole packet may be discarded, as is normal in an unreliable datagram protocol. The Gnutella2 reliability layer can compensate by retransmitting the whole packet, which would then be re-fragmented and each fragment resent; however, this wastes the fragments that were successfully received before. Managing fragmentation natively allows only the lost fragments to be retransmitted.

Each node determines its own MTU, often based on a best guess combined with
information from the host's sockets implementation. Packets exceeding this size are
fragmented into multiple datagrams of the appropriate size. Each datagram has the
same sequence number and the same fragment count (nCount), but a different
fragment number (nPart).

== Transmission Process ==

When a packet is to be transmitted, the network layer must:

* Cache the payload
* Allocate a new locally and temporally unique sequence number
* Derive the appropriate number of fragments
* Queue the fragments for dispatch
* If the fragments do not need to be acknowledged, the packet can be flushed now

The payload will generally be cached for an appropriate timeout period, or until the
data cache becomes full, at which time older payloads can be discarded. Fragments
are dispatched according to the dispatch algorithm of choice, and the sender listens
for acknowledgements. When an acknowledgement is received:

* Look up the sent packet by sequence number
* Mark the nPart fragment as received and cancel any retransmissions of this part
* If all fragments have been acknowledged, flush this packet from the cache

If a fragment has been transmitted but has not been acknowledged within the
timeout, it should be retransmitted. A finite number of retransmissions are allowed
before the packet as a whole expires, at which time it is assumed that the packet
was not received.

== Receive Process ==

When a new datagram is received, the network layer must:

* If the acknowledge bit was set, send an acknowledge packet for this sequence number and part number, with nCount set to zero (ack)
* Look up any existing packet by the sending IP and sequence number
* If there is no existing packet, create a new packet entry with the IP, sequence number, fragment count and flags
* If there is an existing packet that is already marked as done, abort: the datagram is a retransmission of a packet that has already been delivered
* Add the transmitted fragment to the (new or old) packet entry
* If the packet now has all fragments, mark it as done and decode it and pass it up to the application layer
* Leave the packet on the cache even if it was finished, in case any parts are retransmitted
* Expire old packets from the receive buffer after a timeout, or if the buffer is full

== Dispatch Algorithm ==

Fragment datagrams need to be dispatched intelligently to spread the load on
network resources and maximise the chance that the receiver will get the message.
To do this, the dispatch algorithm should take into account several points:

* Prioritize acknowledgements.
* If fragments are waiting to be sent to a number of hosts, do not send to the same host twice in a row. Alternating or looping through the target hosts achieves the same data rate locally, but spreads out the load over downstream links.
* Do not exceed or approach the capacity of the local network connection. If a host has a 128 kb/s outbound bandwidth, dispatching 32 KB of data in one second will likely cause massive packet loss, leading to a retransmission.
* After considering the above points, prefer fragments that were queued recently to older packets. A LIFO or stack type approach means that even if a transmitter is becoming backed up, some fragments will get there on time while others will be delayed. A FIFO approach would mean that a backed up host delivers every fragment late.

== Parameters ==
The recommended parameters for the reliability layer are as follows:

<pre>
<nowiki>
MTU = 500 bytes
Transmit Retransmit Interval = 10 seconds
Transmit Packet Timeout / Expire = 26 seconds (allows for two retransmissions before expiry)
Receive Packet Expiry = 30 seconds (allows 10 seconds beyond final retransmission)
</nowiki>
</pre>

== Performance Considerations ==

Relatively low-level network implementations such as this are reasonably
complicated, but must operate fast. It is desirable to avoid runtime memory
allocations in network code as much as possible, and particularly at this level.

It should be noted that in almost all cases, transmissions to "untested" nodes are
single fragment. Replies on the other hand are often larger, and may be deflated in
many fragments. This is optimal because attempting to contact a node which may be
unavailable involves a retransmission of only a single fragment.

Flow control is an important topic; however, it is handled at a higher layer. The UDP
reliability layer is only responsible for guaranteeing delivery of selected datagrams.

Only critical transmissions whose reception cannot otherwise be inferred should have
the acknowledge request bit set.

UDP Transceiver

2005-03-27T16:07:37Z

Kath: /* Transmission Process */

== Introduction ==

The User Datagram Protocol (UDP) is invaluable in peer to peer systems because it
provides a relatively low-cost (low-overhead) method of sending short, irregular
messages to a very large number of peers on demand. Establishing a TCP stream
connection to a peer simply to deliver a single packet of information is wasteful in
data volume and time for the peers involved and state-aware network devices along
the route, for example, network address translation facilities. When dealing with a
large number of peers quickly, these costs become unbearable. UDP provides a
solution and makes this kind of interaction possible.

However, the delivery of UDP packets is not reliable: packets may be lost en-route for
a number of reasons. Often this behaviour is desirable, for example when the
destination node's connection is highly congested, UDP packets are likely to be
discarded. If the content was not critical, this loss is appropriate as the host's
resources are effectively unavailable. In other scenarios involving critical payloads,
UDP's lack of reliability is a problem: either the sender needs to make sure the
receiver gets the payload, or it needs to know definitively that the receiver was
unavailable.

The Gnutella2 network solves this problem by implementing a selectively engaged
reliability layer on top of the basic UDP protocol. This reliability layer shares some
common functionality with TCP, but importantly does not provide any connection
state and thus retains the efficiency originally sought in UDP.

This allows Gnutella2 to select the most suitable communication medium for each
and every transmission it needs to perform:

* If a significant volume of data is to be exchanged, or subsequent data will be exchanged with the same destination, a TCP connection is established
* If a small volume of important data is to be exchanged in a once-off operation or irregularly, reliable UDP is used
* If a small volume of unimportant data is to be exchanged in a once-off operation or irregularly, unreliable UDP is used

== UDP ==

Gnutella2 semi-reliable communication is transported using the UDP protocol. The
port number for receiving UDP is the same as the port number listening for TCP
connections.

== Encoding ==

Implementing an additional reliable protocol within UDP requires a small control
header before the payload itself. This header is put to good use:

* A small signature identifies the packet as a Gnutella2 semi-reliable UDP datagram. This allows the same port to be used for receiving UDP traffic for other protocols if desired, and offers some simple protection against random, unexpected traffic.
* A content code identifies the payload as a Gnutella2 packet stream, allowing future protocols to be added within the same reliability layer if desired.
* Flags allow additional attributes to be specified, such as inline stateless compression of the payload (which is a required feature).

The header has a fixed size of 8 bytes, and is represented by the following C
structure:

<pre>
<nowiki>
#pragma pack(1)
typedef struct
{
CHAR szTag[3];
BYTE nFlags;
WORD nSequence;
BYTE nPart;
BYTE nCount;
} GND_HEADER;
</nowiki>
</pre>

The members of the structure are detailed below:

* '''szTag''' - contains a three byte encoding protocol identifier, in this case "GND" for "GNutella Datagram". If this signature is not present the packet should not be decoded as a Gnutella2 reliability layer transmission.
* '''nFlags''' - contains flags which modify the content of the packet. The low-order nibble is reserved for critical flags: if one of these bits is set but the decoding software does not understand the meaning, the packet must be discarded. The high-order nibble is reserved for non-critical flags: when set these bits may be interpreted, but an inability to interpret a bit does not cause the packet to be discarded. Currently defined flags are:
:* '''0x01''' - Deflate
:: When the deflate bit is set, the entire payload is compressed with the deflate algorithm. The compression method used is the Deflate Compression Data Format (RFC 1951). On top of this compression a ZLIB ‘wrapper’ is applied (RFC 1950, ZLIB Compressed Data Format). The ZLIB wrapper ensures packet integrity, among other things. Note that if the packet was fragmented, the entire payload must be reassembled in the correct order before it can be inflated. Fragments are not compressed separately!
:* '''0x02''' - Acknowledge Me
:: When the acknowledge me bit is set, the sender is expecting an acknowledgement for this packet.

* '''nSequence''' - contains the sequence number of the packet. This sequence number is unique to the sending host only. It is not unique to the pair of the sending host and receiving host as in TCP, as there is no concept of connection state. Sequence numbers on consecutive packets need not be increasing (although that is convenient) - they must only be different. If a packet is fragmented, all of its fragments will have the same sequence number. Byte order is unimportant here.
* '''nPart''' - contains the fragment part number (1 <= nPart <= nCount)
* '''nCount''' - contains the number of fragment parts in this packet. On a transmission, this value will be non-zero (all packets must have at least one fragment). If nCount is zero, this is an acknowledgement (see below).

== Fragmentation ==

Large packets must be fragmented before they can be sent through most network
interfaces. Different network media have different MTUs, and it is difficult to predict
what the lowest common size will be. Fragmentation and reassembly are performed by
the existing Internet protocols; however, there are two important reasons why the
reliability layer performs its own fragmentation:

* Sockets implementations specify a maximum datagram size. This is adequate for the vast majority of transmissions, but it is desirable to have the transparent ability to send larger packets without worrying about the host implementation.
* When the Internet protocols fragment a packet and one or more fragments are lost, the whole packet may be discarded, as is normal in an unreliable datagram protocol. The Gnutella2 reliability layer can compensate by retransmitting the whole packet, which would then be re-fragmented and each fragment resent; however, this wastes the fragments that were successfully received before. Managing fragmentation natively allows only the lost fragments to be retransmitted.

Each node determines its own MTU, often based on a best guess combined with
information from the host's sockets implementation. Packets exceeding this size are
fragmented into multiple datagrams of the appropriate size. Each datagram has the
same sequence number and the same fragment count (nCount), but a different
fragment number (nPart).

== Transmission Process ==

When a packet is to be transmitted, the network layer must:

* Cache the payload
* Allocate a new locally and temporally unique sequence number
* Derive the appropriate number of fragments
* Queue the fragments for dispatch
* If the fragments do not need to be acknowledged, the packet can be flushed now

The payload will generally be cached for an appropriate timeout period, or until the
data cache becomes full, at which time older payloads can be discarded. Fragments
are dispatched according to the dispatch algorithm of choice, and the sender listens
for acknowledgements. When an acknowledgement is received:

* Look up the sent packet by sequence number
* Mark the nPart fragment as received and cancel any retransmissions of this part
* If all fragments have been acknowledged, flush this packet from the cache

If a fragment has been transmitted but has not been acknowledged within the
timeout, it should be retransmitted. A finite number of retransmissions are allowed
before the packet as a whole expires, at which time it is assumed that the packet
was not received.

== Receive Process ==

When a new datagram is received, the network layer must:

* If the acknowledge bit was set, send an acknowledge packet for this sequence number and part number, with nCount set to zero (ack)
* Look up any existing packet by the sending IP and sequence number
* If there is no existing packet, create a new packet entry with the IP, sequence number, fragment count and flags
* If there was an existing packet, make sure it is not marked as done - if so, abort
* Add the transmitted fragment to the (new or old) packet entry
* If the packet now has all fragments, mark it as done and decode it and pass it up to the application layer
* Leave the packet on the cache even if it was finished, in case any parts are retransmitted
* Expire old packets from the receive buffer after a timeout or if the buffer is full

== Dispatch Algorithm ==

Fragment datagrams need to be dispatched intelligently to spread the load on
network resources and maximise the chance that the receiver will get the message.
To do this, the dispatch algorithm should take into account several points:

* Prioritize acknowledgements.
* If fragments are waiting to be sent to a number of hosts, do not send to the same host twice in a row. Alternating or looping through the target hosts achieves the same data rate locally, but spreads out the load over downstream links.
* Do not exceed or approach the capacity of the local network connection. If a host has a 128 kb/s outbound bandwidth, dispatching 32 KB of data in one second will likely cause massive packet loss, leading to a retransmission.
* After considering the above points, prefer fragments that were queued recently to older packets. A LIFO or stack type approach means that even if a transmitter is becoming backed up, some fragments will get there on time while others will be delayed. A FIFO approach would mean that a backed up host delivers every fragment late.

== Parameters ==
The recommended parameters for the reliability layer are as follows:

<pre>
<nowiki>
MTU = 500 bytes
Transmit Retransmit Interval = 10 seconds
Transmit Packet Timeout / Expire = 26 seconds (allows for two retransmissions before expiry)
Receive Packet Expiry = 30 seconds (allows 10 seconds beyond final retransmission)
</nowiki>
</pre>

== Performance Considerations ==

Relatively low-level network implementations such as this are reasonably
complicated, but must operate fast. It is desirable to avoid runtime memory
allocations in network code as much as possible, and particularly at this level.

It should be noted that in almost all cases, transmissions to "untested" nodes are
single fragment. Replies on the other hand are often larger, and may be deflated in
many fragments. This is optimal because attempting to contact a node which may be
unavailable involves a retransmission of only a single fragment.

Flow control is an important topic, however it is handled at a higher layer. The UDP
reliability layer is only responsible for guaranteeing delivery of selected datagrams.

Only critical transmissions whose reception cannot otherwise be inferred should have
the acknowledge request bit set.

UDP Transceiver

2005-03-27T16:05:50Z

Kath: /* Fragmentation */

== Introduction ==

The User Datagram Protocol (UDP) is invaluable in peer to peer systems because it
provides a relatively low-cost (low-overhead) method of sending short, irregular
messages to a very large number of peers on demand. Establishing a TCP stream
connection to a peer simply to deliver a single packet of information is wasteful in
data volume and time for the peers involved and state-aware network devices along
the route, for example, network address translation facilities. When dealing with a
large number of peers quickly, these costs become unbearable. UDP provides a
solution and makes this kind of interaction possible.

However, the delivery of UDP packets is not reliable: packets may be lost en-route for
a number of reasons. Often this behaviour is desirable, for example when the
destination node's connection is highly congested, UDP packets are likely to be
discarded. If the content was not critical, this loss is appropriate as the host's
resources are effectively unavailable. In other scenarios involving critical payloads,
UDP's lack of reliability is a problem: either the sender needs to make sure the
receiver gets the payload, or it needs to know definitively that the receiver was
unavailable.

The Gnutella2 network solves this problem by implementing a selectively engaged
reliability layer on top of the basic UDP protocol. This reliability layer shares some
common functionality with TCP, but importantly does not provide any connection
state and thus retains the efficiency originally sought in UDP.

This allows Gnutella2 to select the most suitable communication medium for each
transmission it needs to perform:

* If a significant volume of data is to be exchanged, or subsequent data will be exchanged with the same destination, a TCP connection is established
* If a small volume of important data is to be exchanged in a once-off operation or irregularly, reliable UDP is used
* If a small volume of unimportant data is to be exchanged in a once-off operation or irregularly, unreliable UDP is used

== UDP ==

Gnutella2 semi-reliable communication is transported using the UDP protocol. The
port number for receiving UDP is the same as the port number listening for TCP
connections.

== Encoding ==

Implementing an additional reliable protocol within UDP requires a small control
header before the payload itself. This header is put to good use:

* A small signature identifies the packet as a Gnutella2 semi-reliable UDP datagram. This allows the same port to be used for receiving UDP traffic for other protocols if desired, and offers some simple protection against random, unexpected traffic.
* A content code identifies the payload as a Gnutella2 packet stream, allowing future protocols to be added within the same reliability layer if desired.
* Flags allow additional attributes to be specified, such as inline stateless compression of the payload (which is a required feature).

The header has a fixed size of 8 bytes, and is represented by the following C
structure:

<pre>
<nowiki>
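#pragma pack(1)
typedef struct
{
    CHAR szTag[3];   /* "GND" signature */
    BYTE nFlags;     /* flag bits */
    WORD nSequence;  /* packet sequence number */
    BYTE nPart;      /* fragment number, 1..nCount */
    BYTE nCount;     /* fragment count, 0 = acknowledgement */
} GND_HEADER;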
</nowiki>
</pre>

The members of the structure are detailed below:

* '''szTag''' - contains a three byte encoding protocol identifier, in this case "GND" for "GNutella Datagram". If this signature is not present the packet should not be decoded as a Gnutella2 reliability layer transmission.
* '''nFlags''' - contains flags which modify the content of the packet. The low-order nibble is reserved for critical flags: if one of these bits is set but the decoding software does not understand the meaning, the packet must be discarded. The high-order nibble is reserved for non-critical flags: when set these bits may be interpreted, but an inability to interpret a bit does not cause the packet to be discarded. Currently defined flags are:
:* '''0x01''' - Deflate
:: When the deflate bit is set, the entire payload is compressed with the deflate algorithm. The compression method used is the DEFLATE Compressed Data Format (RFC 1951). On top of this compression a ZLIB ‘wrapper’ is applied (RFC 1950, ZLIB Compressed Data Format). The ZLIB wrapper ensures packet integrity, among other things. Note that if the packet was fragmented, the entire payload must be reassembled in the correct order before it can be inflated. Fragments are not compressed separately!
:* '''0x02''' - Acknowledge Me
:: When the acknowledge me bit is set, the sender is expecting an acknowledgement for this packet.

* '''nSequence''' - contains the sequence number of the packet. This sequence number is unique to the sending host only. It is not unique to the pair of the sending host and receiving host as in TCP, as there is no concept of connection state. Sequence numbers on consecutive packets need not be increasing (although that is convenient) - they must only be different. If a packet is fragmented, all of its fragments will have the same sequence number. Byte order is unimportant here.
* '''nPart''' - contains the fragment part number (1 <= nPart <= nCount)
* '''nCount''' - contains the number of fragment parts in this packet. On a transmission, this value will be non-zero (all packets must have at least one fragment). If nCount is zero, this is an acknowledgement (see below).
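
The rules above can be sketched as a small classifier over the raw datagram bytes. This is an illustration only: the names <code>gnd_classify</code>, <code>gnd_kind</code> and the <code>GND_*</code> constants are hypothetical, and the byte offsets assume the packed 8-byte layout of the structure shown earlier.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GND_FLAG_DEFLATE   0x01
#define GND_FLAG_ACK       0x02
#define GND_CRITICAL_MASK  0x0F   /* low-order nibble: critical flags */
#define GND_KNOWN_CRITICAL (GND_FLAG_DEFLATE | GND_FLAG_ACK)

typedef enum { GND_REJECT, GND_IS_ACK, GND_IS_FRAGMENT } gnd_kind;

/* Decide how to treat a datagram, using only the header rules in the text. */
static gnd_kind gnd_classify(const uint8_t *data, size_t len)
{
    if (len < 8 || memcmp(data, "GND", 3) != 0)
        return GND_REJECT;              /* signature missing: not ours */

    uint8_t flags = data[3];
    uint8_t part  = data[6];
    uint8_t count = data[7];

    /* Any critical bit we do not understand forces a discard. Unknown
     * bits in the high-order nibble are simply ignored. */
    if (flags & GND_CRITICAL_MASK & ~GND_KNOWN_CRITICAL)
        return GND_REJECT;

    if (count == 0)
        return GND_IS_ACK;              /* nCount of zero: acknowledgement */
    if (part < 1 || part > count)
        return GND_REJECT;              /* violates 1 <= nPart <= nCount */
    return GND_IS_FRAGMENT;
}
```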

== Fragmentation ==

Large packets must be fragmented before they can be sent through most network
interfaces. Different network media have different MTUs, and it is difficult to predict
what the lowest common size will be. Fragmentation and reassembly is performed by
the existing Internet protocols, however, there are two important reasons why the
reliability layer performs its own fragmentation:

* Sockets implementations specify a maximum datagram size. This is adequate for the vast majority of transmissions, but it is desirable to have the transparent ability to send larger packets without worrying about the host implementation.
* When the Internet protocols fragment a packet and one or more fragments are lost, the whole packet may be discarded, as is normal for an unreliable datagram protocol. The Gnutella2 reliability layer can compensate by retransmitting the whole packet, which would then be re-fragmented and each fragment resent - however, this wastes the fragments that were successfully received before. Managing fragmentation natively avoids this waste, because only the missing fragments need to be resent.

Each node determines its own MTU, often based on a best guess combined with
information from the host's sockets implementation. Packets exceeding this size are
fragmented into multiple datagrams of the appropriate size. Each datagram has the
same sequence number and the same fragment count (nCount), but a different
fragment number (nPart).
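
The arithmetic behind this split might look as follows. It is a sketch under one stated assumption: <code>mtu_payload</code> is the number of payload bytes a node is willing to place in each datagram (whether the 8-byte header counts towards the MTU is not stated in the text). Both function names are hypothetical. Because nCount is a single byte, a caller would also need to reject payloads that require more than 255 fragments.

```c
#include <assert.h>
#include <stddef.h>

/* Number of fragments needed; every packet has at least one fragment. */
static size_t gnd_fragment_count(size_t payload_len, size_t mtu_payload)
{
    assert(mtu_payload > 0);
    if (payload_len == 0)
        return 1;
    return (payload_len + mtu_payload - 1) / mtu_payload;  /* round up */
}

/* Payload bytes carried by fragment `part` (1-based, as nPart is). */
static size_t gnd_fragment_len(size_t payload_len, size_t mtu_payload,
                               size_t part)
{
    size_t count = gnd_fragment_count(payload_len, mtu_payload);
    assert(part >= 1 && part <= count);
    size_t offset = (part - 1) * mtu_payload;
    size_t rest = payload_len - offset;
    return rest < mtu_payload ? rest : mtu_payload;
}
```

For example, a 1200-byte payload with 500 bytes per fragment yields three fragments of 500, 500 and 200 bytes, all sharing one sequence number and nCount = 3.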

== Transmission Process ==

When a packet is to be transmitted the network layer must:

* Cache the payload
* Allocate a new locally and temporally unique sequence number
* Derive the appropriate number of fragments
* Queue the fragments for dispatch
* If the fragments do not need to be acknowledged, the packet can be flushed now

The payload will generally be cached for an appropriate timeout period, or until the
data cache becomes full at which time older payloads can be discarded. Fragments
are dispatched according to the dispatch algorithm of choice, and the sender listens
for acknowledgements. When an acknowledgement is received:

* Look up the sent packet by sequence number
* Mark the nPart fragment as received and cancel any retransmissions of this part
* If all fragments have been acknowledged, flush this packet from the cache

If a fragment has been transmitted but has not been acknowledged within the
timeout, it should be retransmitted. A finite number of retransmissions are allowed
before the packet as a whole expires, at which time it is assumed that the packet
was not received.

== Receive Process ==

When a new datagram is received, the network layer must:

* If the acknowledge bit was set, send an acknowledge packet for this sequence number and part number, with nCount set to zero (ack)
* Look up any existing packet by the sending IP and sequence number
* If there is no existing packet, create a new packet entry with the IP, sequence number, fragment count and flags
* If there was an existing packet, make sure it is not marked as done - if so, abort
* Add the transmitted fragment to the (new or old) packet entry
* If the packet now has all fragments, mark it as done and decode it and pass it up to the application layer
* Leave the packet on the cache even if it was finished, in case any parts are retransmitted
* Expire old packets from the receive buffer after a timeout or if the buffer is full
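
The bookkeeping in these steps could be sketched as below. This is not a reference implementation: the table size, the type names and <code>rx_on_fragment</code> are hypothetical, payload storage, acknowledgement sending and expiry are left out, and the sender address is simplified to a 32-bit value. A fixed-size table is used so that no memory is allocated at runtime.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RX_SLOTS 8                     /* hypothetical cache size */

typedef struct {
    bool     used, done;
    uint32_t ip;                       /* sending host */
    uint16_t sequence;
    uint8_t  count, have;              /* nCount, parts received so far */
    uint8_t  seen[32];                 /* one bit per part number */
} rx_entry;

typedef enum { RX_PARTIAL, RX_COMPLETE, RX_IGNORED } rx_result;

static rx_entry rx_table[RX_SLOTS];

/* Record one received fragment (not an acknowledgement). */
static rx_result rx_on_fragment(uint32_t ip, uint16_t seq,
                                uint8_t part, uint8_t count)
{
    rx_entry *e = NULL, *free_slot = NULL;
    if (count == 0)
        return RX_IGNORED;             /* acks are handled elsewhere */

    for (int i = 0; i < RX_SLOTS; i++) {
        rx_entry *c = &rx_table[i];
        if (c->used && c->ip == ip && c->sequence == seq) { e = c; break; }
        if (!c->used && !free_slot) free_slot = c;
    }
    if (!e) {                          /* no existing packet: create one */
        if (!free_slot)
            return RX_IGNORED;         /* a real cache would expire here */
        e = free_slot;
        memset(e, 0, sizeof *e);
        e->used = true; e->ip = ip; e->sequence = seq; e->count = count;
    }
    if (e->done || part < 1 || part > e->count)
        return RX_IGNORED;             /* finished earlier, or bad part */

    uint8_t bit = (uint8_t)(1u << (part % 8));
    if (e->seen[part / 8] & bit)
        return RX_PARTIAL;             /* duplicate of a stored fragment */
    e->seen[part / 8] |= bit;
    if (++e->have == e->count) {
        e->done = true;                /* entry stays cached as "done" */
        return RX_COMPLETE;
    }
    return RX_PARTIAL;
}
```

Keeping the finished entry in the table is what lets a late retransmission be recognised and dropped instead of starting a new reassembly.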

== Dispatch Algorithm ==

Fragment datagrams need to be dispatched intelligently to spread the load on
network resources and maximise the chance that the receiver will get the message.
To do this, the dispatch algorithm should take into account several points:

* Prioritize acknowledgements.
* If fragments are waiting to be sent to a number of hosts, do not send to the same host twice in a row. Alternating or looping through the target hosts achieves the same data rate locally, but spreads out the load over downstream links.
* Do not exceed or approach the capacity of the local network connection. If a host has a 128 kb/s outbound bandwidth, dispatching 32 KB of data in one second will likely cause massive packet loss, leading to a retransmission.
* After considering the above points, prefer fragments that were queued recently to older packets. A LIFO or stack type approach means that even if a transmitter is becoming backed up, some fragments will get there on time while others will be delayed. A FIFO approach would mean that a backed up host delivers every fragment late.
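
One possible selection rule covering three of these points (acknowledgements first, avoid the host that was just served, newest fragment preferred) is sketched below. The <code>queued</code> type and <code>dispatch_pick</code> are hypothetical, and the bandwidth limit is deliberately not modelled: a caller would check its own send budget before asking for the next item.

```c
#include <stdint.h>

typedef struct {
    uint32_t host;     /* destination host */
    int      is_ack;   /* non-zero for acknowledgement datagrams */
} queued;

/* Choose the next item to send from q[0..n), where a higher index means
 * "queued more recently". Returns the chosen index, or -1 if n is 0. */
static int dispatch_pick(const queued *q, int n, uint32_t last_host)
{
    int newest = -1;         /* most recently queued item of any host */
    int newest_other = -1;   /* most recent item for a different host */

    for (int i = n - 1; i >= 0; i--) {
        if (q[i].is_ack)
            return i;                        /* acks take priority */
        if (newest < 0)
            newest = i;
        if (newest_other < 0 && q[i].host != last_host)
            newest_other = i;
    }
    /* Prefer a different host; fall back to the newest item overall. */
    return newest_other >= 0 ? newest_other : newest;
}
```

As a check on the bandwidth example in the list: 128 kb/s is 16 KB per second, so 32 KB queued for one second is twice what the link can carry.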

== Parameters ==
The recommended parameters for the reliability layer are as follows:

<pre>
<nowiki>
MTU = 500 bytes
Transmit Retransmit Interval = 10 seconds
Transmit Packet Timeout / Expire = 26 seconds (allows for two retransmissions before expiry)
Receive Packet Expiry = 30 seconds (allows 10 seconds beyond final retransmission)
</nowiki>
</pre>
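
To see how these values interact, consider a fragment first sent at t = 0 that is never acknowledged: it is resent at t = 10 and t = 20, and the packet expires at t = 26, before a third retransmission would fall due at t = 30. The receiver's 30-second expiry is the final retransmission time (20) plus 10 seconds. A minimal sketch of the sender-side decision, with hypothetical names and whole-second timing assumed, is:

```c
#define RETRANSMIT_INTERVAL_S 10   /* Transmit Retransmit Interval */
#define PACKET_TIMEOUT_S      26   /* Transmit Packet Timeout / Expire */

typedef enum { TX_WAIT, TX_RETRANSMIT, TX_EXPIRE } tx_action;

/* since_first: seconds since the packet was first sent.
 * since_last:  seconds since this unacknowledged fragment was last sent. */
static tx_action tx_poll(unsigned since_first, unsigned since_last)
{
    if (since_first >= PACKET_TIMEOUT_S)
        return TX_EXPIRE;          /* give up on the whole packet */
    if (since_last >= RETRANSMIT_INTERVAL_S)
        return TX_RETRANSMIT;      /* resend just this fragment */
    return TX_WAIT;
}
```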

== Performance Considerations ==

Relatively low-level network implementations such as this are reasonably
complicated, but must operate fast. It is desirable to avoid runtime memory
allocations in network code as much as possible, and particularly at this level.

It should be noted that in almost all cases, transmissions to "untested" nodes are
single fragment. Replies on the other hand are often larger, and may be deflated in
many fragments. This is optimal because attempting to contact a node which may be
unavailable involves a retransmission of only a single fragment.

Flow control is an important topic, however it is handled at a higher layer. The UDP
reliability layer is only responsible for guaranteeing delivery of selected datagrams.

Only critical transmissions whose reception cannot otherwise be inferred should have
the acknowledge request bit set.

UDP Transceiver

2005-03-27T16:03:23Z

Kath: /* Encoding */

== Introduction ==

The User Datagram Protocol (UDP) is invaluable in peer to peer systems because it
provides a relatively low-cost (low-overhead) method of sending short, irregular
messages to a very large number of peers on demand. Establishing a TCP stream
connection to a peer simply to deliver a single packet of information is wasteful in
data volume and time for the peers involved and state-aware network devices along
the route, for example, network address translation facilities. When dealing with a
large number of peers quickly, these costs become unbearable. UDP provides a
solution and makes this kind of interaction possible.

However, the delivery of UDP packets is not reliable: packets may be lost en-route for
a number of reasons. Often this behaviour is desirable, for example when the
destination node's connection is highly congested, UDP packets are likely to be
discarded. If the content was not critical, this loss is appropriate as the host's
resources are effectively unavailable. In other scenarios involving critical payloads,
UDP's lack of reliability is a problem: either the sender needs to make sure the
receiver gets the payload, or it needs to know definitively that the receiver was
unavailable.

The Gnutella2 network solves this problem by implementing a selectively engaged
reliability layer on top of the basic UDP protocol. This reliability layer shares some
common functionality with TCP, but importantly does not provide any connection
state and thus retains the efficiency originally sought in UDP.

This allows Gnutella2 to select the most suitable communication medium for each
transmission it needs to perform:

* If a significant volume of data is to be exchanged, or subsequent data will be exchanged with the same destination, a TCP connection is established
* If a small volume of important data is to be exchanged in a once-off operation or irregularly, reliable UDP is used
* If a small volume of unimportant data is to be exchanged in a once-off operation or irregularly, unreliable UDP is used

== UDP ==

Gnutella2 semi-reliable communication is transported using the UDP protocol. The
port number for receiving UDP is the same as the port number listening for TCP
connections.

== Encoding ==

Implementing an additional reliable protocol within UDP requires a small control
header before the payload itself. This header is put to good use:

* A small signature identifies the packet as a Gnutella2 semi-reliable UDP datagram. This allows the same port to be used for receiving UDP traffic for other protocols if desired, and offers some simple protection against random, unexpected traffic.
* A content code identifies the payload as a Gnutella2 packet stream, allowing future protocols to be added within the same reliability layer if desired.
* Flags allow additional attributes to be specified, such as inline stateless compression of the payload (which is a required feature).

The header has a fixed size of 8 bytes, and is represented by the following C
structure:

<pre>
<nowiki>
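#pragma pack(1)
typedef struct
{
    CHAR szTag[3];   /* "GND" signature */
    BYTE nFlags;     /* flag bits */
    WORD nSequence;  /* packet sequence number */
    BYTE nPart;      /* fragment number, 1..nCount */
    BYTE nCount;     /* fragment count, 0 = acknowledgement */
} GND_HEADER;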
</nowiki>
</pre>

The members of the structure are detailed below:

* '''szTag''' - contains a three byte encoding protocol identifier, in this case "GND" for "GNutella Datagram". If this signature is not present the packet should not be decoded as a Gnutella2 reliability layer transmission.
* '''nFlags''' - contains flags which modify the content of the packet. The low-order nibble is reserved for critical flags: if one of these bits is set but the decoding software does not understand the meaning, the packet must be discarded. The high-order nibble is reserved for non-critical flags: when set these bits may be interpreted, but an inability to interpret a bit does not cause the packet to be discarded. Currently defined flags are:
:* '''0x01''' - Deflate
:: When the deflate bit is set, the entire payload is compressed with the deflate algorithm. The compression method used is the DEFLATE Compressed Data Format (RFC 1951). On top of this compression a ZLIB ‘wrapper’ is applied (RFC 1950, ZLIB Compressed Data Format). The ZLIB wrapper ensures packet integrity, among other things. Note that if the packet was fragmented, the entire payload must be reassembled in the correct order before it can be inflated. Fragments are not compressed separately!
:* '''0x02''' - Acknowledge Me
:: When the acknowledge me bit is set, the sender is expecting an acknowledgement for this packet.

* '''nSequence''' - contains the sequence number of the packet. This sequence number is unique to the sending host only. It is not unique to the pair of the sending host and receiving host as in TCP, as there is no concept of connection state. Sequence numbers on consecutive packets need not be increasing (although that is convenient) - they must only be different. If a packet is fragmented, all of its fragments will have the same sequence number. Byte order is unimportant here.
* '''nPart''' - contains the fragment part number (1 <= nPart <= nCount)
* '''nCount''' - contains the number of fragment parts in this packet. On a transmission, this value will be non-zero (all packets must have at least one fragment). If nCount is zero, this is an acknowledgement (see below).

== Fragmentation ==

Large packets must be fragmented before they can be sent through most network
interfaces. Different network media have different MTUs, and it is difficult to predict
what the lowest common size will be. Fragmentation and reassembly is performed by
the existing Internet protocols, however there are two important reasons why the
reliability layer performs its own fragmentation:

* Sockets implementations specify a maximum datagram size. This is adequate for the vast majority of transmissions, but it is desirable to have the transparent ability to send larger packets without worrying about the host implementation.
* When the Internet protocols fragment a packet and one or more fragments are lost, the whole packet may be discarded, as is normal for an unreliable datagram protocol. The Gnutella2 reliability layer can compensate by retransmitting the whole packet, which would then be re-fragmented and each fragment resent - however, this wastes the fragments that were successfully received before. Managing fragmentation natively avoids this waste, because only the missing fragments need to be resent.

Each node determines its own MTU, often based on a best guess combined with
information from the host's sockets implementation. Packets exceeding this size are
fragmented into multiple datagrams of the appropriate size. Each datagram has the
same sequence number and the same fragment count (nCount), but a different
fragment number (nPart).

== Transmission Process ==

When a packet is to be transmitted the network layer must:

* Cache the payload
* Allocate a new locally and temporally unique sequence number
* Derive the appropriate number of fragments
* Queue the fragments for dispatch
* If the fragments do not need to be acknowledged, the packet can be flushed now

The payload will generally be cached for an appropriate timeout period, or until the
data cache becomes full at which time older payloads can be discarded. Fragments
are dispatched according to the dispatch algorithm of choice, and the sender listens
for acknowledgements. When an acknowledgement is received:

* Look up the sent packet by sequence number
* Mark the nPart fragment as received and cancel any retransmissions of this part
* If all fragments have been acknowledged, flush this packet from the cache

If a fragment has been transmitted but has not been acknowledged within the
timeout, it should be retransmitted. A finite number of retransmissions are allowed
before the packet as a whole expires, at which time it is assumed that the packet
was not received.
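
The acknowledgement handling on the sending side can be sketched with one bit per fragment. The structure and function names here are hypothetical, and the cached payload and retransmission timers are omitted. The sketch only shows how "mark the nPart fragment as received" and "all fragments acknowledged" might be tracked without allocating memory.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t sequence;      /* nSequence of the cached packet */
    uint8_t  count;         /* nCount: number of fragments sent */
    uint8_t  acked_total;   /* distinct fragments acknowledged so far */
    uint8_t  acked[32];     /* one bit per part number */
} gnd_sent;

static void gnd_sent_init(gnd_sent *s, uint16_t sequence, uint8_t count)
{
    memset(s, 0, sizeof *s);
    s->sequence = sequence;
    s->count = count;
}

/* Record an acknowledgement for `part`. Returns true once every fragment
 * has been acknowledged, at which point the packet can be flushed. */
static bool gnd_sent_ack(gnd_sent *s, uint8_t part)
{
    if (part >= 1 && part <= s->count) {
        uint8_t bit = (uint8_t)(1u << (part % 8));
        if (!(s->acked[part / 8] & bit)) {   /* ignore duplicate acks */
            s->acked[part / 8] |= bit;
            s->acked_total++;
        }
    }
    return s->acked_total == s->count;
}
```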

== Receive Process ==

When a new datagram is received, the network layer must:

* If the acknowledge bit was set, send an acknowledge packet for this sequence number and part number, with nCount set to zero (ack)
* Look up any existing packet by the sending IP and sequence number
* If there is no existing packet, create a new packet entry with the IP, sequence number, fragment count and flags
* If there was an existing packet, make sure it is not marked as done - if so, abort
* Add the transmitted fragment to the (new or old) packet entry
* If the packet now has all fragments, mark it as done and decode it and pass it up to the application layer
* Leave the packet on the cache even if it was finished, in case any parts are retransmitted
* Expire old packets from the receive buffer after a timeout or if the buffer is full

== Dispatch Algorithm ==

Fragment datagrams need to be dispatched intelligently to spread the load on
network resources and maximise the chance that the receiver will get the message.
To do this, the dispatch algorithm should take into account several points:

* Prioritize acknowledgements.
* If fragments are waiting to be sent to a number of hosts, do not send to the same host twice in a row. Alternating or looping through the target hosts achieves the same data rate locally, but spreads out the load over downstream links.
* Do not exceed or approach the capacity of the local network connection. If a host has a 128 kb/s outbound bandwidth, dispatching 32 KB of data in one second will likely cause massive packet loss, leading to a retransmission.
* After considering the above points, prefer fragments that were queued recently to older packets. A LIFO or stack type approach means that even if a transmitter is becoming backed up, some fragments will get there on time while others will be delayed. A FIFO approach would mean that a backed up host delivers every fragment late.

== Parameters ==
The recommended parameters for the reliability layer are as follows:

<pre>
<nowiki>
MTU = 500 bytes
Transmit Retransmit Interval = 10 seconds
Transmit Packet Timeout / Expire = 26 seconds (allows for two retransmissions before expiry)
Receive Packet Expiry = 30 seconds (allows 10 seconds beyond final retransmission)
</nowiki>
</pre>

== Performance Considerations ==

Relatively low-level network implementations such as this are reasonably
complicated, but must operate fast. It is desirable to avoid runtime memory
allocations in network code as much as possible, and particularly at this level.

It should be noted that in almost all cases, transmissions to "untested" nodes are
single fragment. Replies on the other hand are often larger, and may be deflated in
many fragments. This is optimal because attempting to contact a node which may be
unavailable involves a retransmission of only a single fragment.

Flow control is an important topic, however it is handled at a higher layer. The UDP
reliability layer is only responsible for guaranteeing delivery of selected datagrams.

Only critical transmissions whose reception cannot otherwise be inferred should have
the acknowledge request bit set.

UDP Transceiver

2005-03-27T15:50:08Z

Kath: /* Introduction */

== Introduction ==

The User Datagram Protocol (UDP) is invaluable in peer to peer systems because it
provides a relatively low-cost (low-overhead) method of sending short, irregular
messages to a very large number of peers on demand. Establishing a TCP stream
connection to a peer simply to deliver a single packet of information is wasteful in
data volume and time for the peers involved and state-aware network devices along
the route, for example, network address translation facilities. When dealing with a
large number of peers quickly, these costs become unbearable. UDP provides a
solution and makes this kind of interaction possible.

However, the delivery of UDP packets is not reliable: packets may be lost en-route for
a number of reasons. Often this behaviour is desirable, for example when the
destination node's connection is highly congested, UDP packets are likely to be
discarded. If the content was not critical, this loss is appropriate as the host's
resources are effectively unavailable. In other scenarios involving critical payloads,
UDP's lack of reliability is a problem: either the sender needs to make sure the
receiver gets the payload, or it needs to know definitively that the receiver was
unavailable.

The Gnutella2 network solves this problem by implementing a selectively engaged
reliability layer on top of the basic UDP protocol. This reliability layer shares some
common functionality with TCP, but importantly does not provide any connection
state and thus retains the efficiency originally sought in UDP.

This allows Gnutella2 to select the most suitable communication medium for each
transmission it needs to perform:

* If a significant volume of data is to be exchanged, or subsequent data will be exchanged with the same destination, a TCP connection is established
* If a small volume of important data is to be exchanged in a once-off operation or irregularly, reliable UDP is used
* If a small volume of unimportant data is to be exchanged in a once-off operation or irregularly, unreliable UDP is used

== UDP ==

Gnutella2 semi-reliable communication is transported using the UDP protocol. The
port number for receiving UDP is the same as the port number listening for TCP
connections.

== Encoding ==

Implementing an additional reliable protocol within UDP requires a small control
header before the payload itself. This header is put to good use:

* A small signature identifies the packet as a Gnutella2 semi-reliable UDP datagram. This allows the same port to be used for receiving UDP traffic for other protocols if desired, and offers some simple protection against random, unexpected traffic.
* A content code identifies the payload as a Gnutella2 packet stream, allowing future protocols to be added within the same reliability layer if desired.
* Flags allow additional attributes to be specified, such as inline stateless compression of the payload (which is a required feature).

The header has a fixed size of 8 bytes, and is represented by the following C
structure:

<pre>
<nowiki>
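#pragma pack(1)
typedef struct
{
    CHAR szTag[3];   /* "GND" signature */
    BYTE nFlags;     /* flag bits */
    WORD nSequence;  /* packet sequence number */
    BYTE nPart;      /* fragment number, 1..nCount */
    BYTE nCount;     /* fragment count, 0 = acknowledgement */
} GND_HEADER;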
</nowiki>
</pre>

The members of the structure are detailed below:

* '''szTag''' - contains a three byte encoding protocol identifier, in this case "GND" for "GNutella Datagram". If this signature is not present the packet should not be decoded as a Gnutella2 reliability layer transmission.
* '''nFlags''' - contains flags which modify the content of the packet. The low-order nibble is reserved for critical flags: if one of these bits is set but the decoding software does not understand the meaning, the packet must be discarded. The high-order nibble is reserved for non-critical flags: when set these bits may be interpreted, but an inability to interpret a bit does not cause the packet to be discarded. Currently defined flags are:
:* '''0x01''' - Deflate
:: When the deflate bit is set, the entire payload is compressed with the deflate algorithm. The compression method used is the DEFLATE Compressed Data Format (RFC 1951). On top of this compression a ZLIB ‘wrapper’ is applied (RFC 1950, ZLIB Compressed Data Format). The ZLIB wrapper ensures packet integrity, among other things. Note that if the packet was fragmented, the entire payload must be reassembled in the correct order before it can be inflated. Fragments are not compressed separately!
:* '''0x02''' - Acknowledge Me
:: When the acknowledge me bit is set, the sender is expecting an acknowledgement for this packet.

* '''nSequence''' - contains the sequence number of the packet. This sequence number is unique to the sending host only. It is not unique to the pair of the sending host and receiving host as in TCP, as there is no concept of connection state. Sequence numbers on consecutive packets need not be increasing (although that is convenient) - they must only be different. If a packet is fragmented, all of its fragments will have the same sequence number. Byte order is unimportant here.
* '''nPart''' - contains the fragment part number (1 <= nPart <= nCount)
* '''nCount''' - contains the number of fragment parts in this packet. On a transmission, this value will be non-zero (all packets must have at least one fragment). If nCount is zero, this is an acknowledgement (see below).

== Fragmentation ==

Large packets must be fragmented before they can be sent through most network
interfaces. Different network media have different MTUs, and it is difficult to predict
what the lowest common size will be. Fragmentation and reassembly are performed by
the existing Internet protocols; however, there are two important reasons why the
reliability layer performs its own fragmentation:

* Sockets implementations specify a maximum datagram size. This is adequate for the vast majority of transmissions, but it is desirable to have the transparent ability to send larger packets without worrying about the host implementation.
* When the Internet protocols fragment a packet and one or more fragments are lost, they may discard the whole packet in an unreliable datagram protocol. The Gnutella2 reliability layer can compensate by retransmitting the whole packet, which would then be re-fragmented and each fragment resent - however this wastes the fragments that were successfully received before. Managing fragmentation natively allows only the missing fragments to be retransmitted.

Each node determines its own MTU, often based on a best guess combined with
information from the host's sockets implementation. Packets exceeding this size are
fragmented into multiple datagrams of the appropriate size. Each datagram has the
same sequence number and the same fragment count (nCount), but a different
fragment number (nPart).
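
A minimal sketch of this splitting step, in Python. It assumes the same 8-byte header layout as above, assumes that the MTU figure includes the header, and uses a hypothetical function name <code>fragment</code>; none of these details are fixed by the text.

```python
import struct
from typing import List

HEADER = struct.Struct("<3sBHBB")  # assumed 8-byte GND header layout


def fragment(payload: bytes, sequence: int, flags: int, mtu: int = 500) -> List[bytes]:
    """Split payload into datagrams that each fit within mtu bytes."""
    chunk_size = mtu - HEADER.size  # assumption: the MTU includes the header
    if chunk_size <= 0:
        raise ValueError("MTU too small for the header")
    # An empty payload still produces one (empty) fragment.
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)] or [b""]
    if len(chunks) > 255:
        raise ValueError("payload needs more fragments than nCount can hold")
    count = len(chunks)
    # Same sequence number and count on every datagram, nPart runs from 1.
    return [
        HEADER.pack(b"GND", flags, sequence, part, count) + chunk
        for part, chunk in enumerate(chunks, start=1)
    ]
```

If the payload is to be deflated, it would be compressed as a whole before being passed to such a function, because fragments are not compressed separately.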

== Transmission Process ==

When a packet is to be transmitted the network layer must:

* Cache the payload
* Allocate a new locally and temporally unique sequence number
* Derive the appropriate number of fragments
* Queue the fragments for dispatch
* If the fragments do not need to be acknowledged, the packet can be flushed now

The payload will generally be cached for an appropriate timeout period, or until the
data cache becomes full at which time older payloads can be discarded. Fragments
are dispatched according to the dispatch algorithm of choice, and the sender listens
for acknowledgements. When an acknowledgement is received:

* Look up the sent packet by sequence number
* Mark the nPart fragment as received and cancel any retransmissions of this part
* If all fragments have been acknowledged, flush this packet from the cache

If a fragment has been transmitted but has not been acknowledged within the
timeout, it should be retransmitted. A finite number of retransmissions are allowed
before the packet as a whole expires, at which time it is assumed that the packet
was not received.
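
The sender-side bookkeeping described above might look like the following Python sketch. The class names <code>SentPacket</code> and <code>SendCache</code> are hypothetical, a 16-bit sequence space is assumed, and timers, retransmission and cache eviction are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SentPacket:
    fragments: List[bytes]  # cached datagrams; index 0 holds nPart 1
    acked: Set[int] = field(default_factory=set)

    def pending(self) -> List[int]:
        """Part numbers still awaiting acknowledgement."""
        return [p for p in range(1, len(self.fragments) + 1)
                if p not in self.acked]


class SendCache:
    def __init__(self) -> None:
        self.packets: Dict[int, SentPacket] = {}
        self._next = 0

    def allocate_sequence(self) -> int:
        """Pick a sequence number not used by any cached packet.

        Assumes fewer than 65536 packets are cached at once.
        """
        while self._next in self.packets:
            self._next = (self._next + 1) & 0xFFFF
        sequence = self._next
        self._next = (self._next + 1) & 0xFFFF
        return sequence

    def store(self, sequence: int, fragments: List[bytes]) -> None:
        self.packets[sequence] = SentPacket(fragments)

    def on_ack(self, sequence: int, part: int) -> bool:
        """Record an acknowledgement; return True if the packet was flushed."""
        packet = self.packets.get(sequence)
        if packet is None:
            return False  # unknown or already flushed
        packet.acked.add(part)
        if not packet.pending():
            del self.packets[sequence]  # every fragment acknowledged
            return True
        return False
```

A retransmission timer would periodically call <code>pending()</code> on each cached packet and resend only those parts.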

== Receive Process ==

When a new datagram is received, the network layer must:

* If the acknowledge bit was set, send an acknowledge packet for this sequence number and part number, with nCount set to zero (ack)
* Look up any existing packet by the sending IP and sequence number
* If there is no existing packet, create a new packet entry with the IP, sequence number, fragment count and flags
* If there was an existing packet and it is already marked as done, abort
* Add the transmitted fragment to the (new or old) packet entry
* If the packet now has all fragments, mark it as done, decode it and pass it up to the application layer
* Leave the packet in the cache even if it was finished, in case any parts are retransmitted
* Expire old packets from the receive buffer after a timeout or if the buffer is full
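
The reassembly steps above can be sketched in Python as follows. The class name <code>Reassembler</code> and its dictionary-based entries are hypothetical, and sending acknowledgements and expiring old entries are omitted. Python's <code>zlib.decompress</code> expects the ZLIB (RFC 1950) wrapper by default, which matches the deflate flag described earlier.

```python
import zlib
from typing import Dict, Optional, Tuple

FLAG_DEFLATE = 0x01


class Reassembler:
    def __init__(self) -> None:
        # key: (sender IP, sequence number)
        self.entries: Dict[Tuple[str, int], dict] = {}

    def receive(self, ip: str, sequence: int, part: int, count: int,
                flags: int, chunk: bytes) -> Optional[bytes]:
        """Add one fragment; return the decoded payload once complete."""
        entry = self.entries.setdefault(
            (ip, sequence),
            {"parts": {}, "count": count, "flags": flags, "done": False},
        )
        if entry["done"]:
            return None  # retransmission of a packet already delivered
        if not 1 <= part <= entry["count"]:
            return None  # part number does not fit this packet
        entry["parts"][part] = chunk
        if len(entry["parts"]) < entry["count"]:
            return None  # still waiting for more fragments
        entry["done"] = True  # keep the entry so late duplicates are ignored
        payload = b"".join(entry["parts"][p]
                           for p in range(1, entry["count"] + 1))
        entry["parts"].clear()
        if entry["flags"] & FLAG_DEFLATE:
            payload = zlib.decompress(payload)  # inflate the whole payload
        return payload
```

Fragments may arrive in any order; the payload is joined by part number and only then inflated.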

== Dispatch Algorithm ==

Fragment datagrams need to be dispatched intelligently to spread the load on
network resources and maximise the chance that the receiver will get the message.
To do this, the dispatch algorithm should take into account several points:

* Prioritize acknowledgements.
* If fragments are waiting to be sent to a number of hosts, do not send to the same host twice in a row. Alternating or looping through the target hosts achieves the same data rate locally, but spreads out the load over downstream links.
* Do not exceed or approach the capacity of the local network connection. If a host has a 128 kb/s outbound bandwidth, dispatching 32 KB of data in one second will likely cause massive packet loss, leading to a retransmission.
* After considering the above points, prefer fragments that were queued recently to older packets. A LIFO or stack type approach means that even if a transmitter is becoming backed up, some fragments will get there on time while others will be delayed. A FIFO approach would mean that a backed up host delivers every fragment late.
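
One possible way to combine these points is sketched below in Python. The class name <code>Dispatcher</code> is hypothetical, and the sketch covers only the ordering rules (acknowledgements first, rotation between hosts, newest fragment first); rate limiting is assumed to be applied by whatever calls <code>pop_next</code>.

```python
from collections import OrderedDict, deque
from typing import Optional, Tuple


class Dispatcher:
    def __init__(self) -> None:
        self.acks = deque()          # (host, datagram), sent first
        self.stacks = OrderedDict()  # host -> list used as a LIFO stack

    def queue_ack(self, host: str, datagram: bytes) -> None:
        self.acks.append((host, datagram))

    def queue_fragment(self, host: str, datagram: bytes) -> None:
        self.stacks.setdefault(host, []).append(datagram)

    def pop_next(self) -> Optional[Tuple[str, bytes]]:
        """Return the next (host, datagram) to send, or None if idle."""
        if self.acks:
            return self.acks.popleft()  # acknowledgements take priority
        if not self.stacks:
            return None
        host, stack = next(iter(self.stacks.items()))
        datagram = stack.pop()  # most recently queued fragment first
        if stack:
            self.stacks.move_to_end(host)  # let other hosts go next
        else:
            del self.stacks[host]
        return host, datagram
```

Rotating the host order after each send means the same host is not served twice in a row while other hosts are waiting.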

== Parameters ==
The recommended parameters for the reliability layer are as follows:

<pre>
<nowiki>
MTU = 500 bytes
Transmit Retransmit Interval = 10 seconds
Transmit Packet Timeout / Expire = 26 seconds (allows for two retransmissions before expiry)
Receive Packet Expiry = 30 seconds (allows 10 seconds beyond final retransmission)
</nowiki>
</pre>
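
The relationship between these figures can be checked with a short Python calculation. It assumes the first transmission happens at time zero; the constant and function names are hypothetical.

```python
MTU = 500                  # bytes
RETRANSMIT_INTERVAL = 10   # seconds
PACKET_TIMEOUT = 26        # seconds
RECEIVE_EXPIRY = 30        # seconds


def send_times(interval: int = RETRANSMIT_INTERVAL,
               timeout: int = PACKET_TIMEOUT) -> list:
    """Seconds after queuing at which an unacknowledged fragment is sent."""
    return list(range(0, timeout, interval))


times = send_times()
retransmissions = len(times) - 1           # sends after the first one
slack = RECEIVE_EXPIRY - times[-1]         # receiver wait after last resend
```

Under that assumption a fragment is sent at 0, 10 and 20 seconds, which gives the two retransmissions before the 26 second expiry, and the receiver keeps its entry for 10 seconds beyond the final retransmission.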

== Performance Considerations ==

Relatively low-level network implementations such as this are reasonably
complicated, but must operate fast. It is desirable to avoid runtime memory
allocations in network code as much as possible, and particularly at this level.

It should be noted that in almost all cases, transmissions to "untested" nodes are
single fragment. Replies on the other hand are often larger, and may be deflated in
many fragments. This is optimal because attempting to contact a node which may be
unavailable involves a retransmission of only a single fragment.

Flow control is an important topic, however it is handled at a higher layer. The UDP
reliability layer is only responsible for guaranteeing delivery of selected datagrams.

Only critical transmissions whose reception cannot otherwise be inferred should have
the acknowledge request bit set.

TCP Stream Connection and Handshaking

2005-03-27T14:44:35Z

Kath: /* Post Handshake Communication */

== Introduction ==

TCP stream connections are established between Gnutella2 nodes when they elect to
form a permanent link, creating the fundamental network topology of a highly
interconnected hub network serving dense clusters of leaf nodes.

== Initiation ==

TCP connections are initiated by leaf or hub nodes in an attempt to gain a connection
to a hub node. Leaf nodes are never the target of an outbound connection. The TCP
port number is not standardised, and must be stored with the IP address.

== Handshaking ==

Upon the establishment of a TCP stream connection between two Gnutella2 nodes, a
handshaking phase must be completed to negotiate the nature of the link and
exchange other necessary information.

This handshaking phase is the only part of the communication which remains
compatible with the old Gnutella network, allowing new connections to be negotiated
without foreknowledge of the capabilities of the other node. The handshaking process
has been well documented elsewhere; however, a short summary is provided here.

=== Handshake Stages ===

The Gnutella handshake process consists of three header blocks. The node which
initiated the connection sends an initial header block, of the form:

<pre>
<nowiki>
GNUTELLA CONNECT/0.6
Listen-IP: 1.2.3.4:6346
Remote-IP: 6.7.8.9
User-Agent: Shareaza 1.8.2.0
Accept: application/x-gnutella2
X-Ultrapeer: False
</nowiki>
</pre>

The receiver then responds with its own header block:

<pre>
<nowiki>
GNUTELLA/0.6 200 OK
Listen-IP: 6.7.8.9:6346
Remote-IP: 1.2.3.4
User-Agent: Shareaza 1.8.2.0
Content-Type: application/x-gnutella2
Accept: application/x-gnutella2
X-Ultrapeer: True
X-Ultrapeer-Needed: False
</nowiki>
</pre>

Finally, the initiator accepts the receiver's header block, and provides any final
information:

<pre>
<nowiki>
GNUTELLA/0.6 200 OK
Content-Type: application/x-gnutella2
X-Ultrapeer: False
</nowiki>
</pre>

The latter two stages may be replaced with an error condition if the connection is
being rejected. Appropriate error status codes are returned in this case, for example:

<pre>
<nowiki>
GNUTELLA/0.6 503 Too many connections
(more headers)
</nowiki>
</pre>

Note that only the HTTP-style error code should be interpreted by the software: any
descriptive text provided is for display purposes only and is not standardised.
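
The following Python sketch shows how a header block of this form could be built and parsed. It assumes that lines end in CRLF and that a block ends with an empty line, as in HTTP; the function names are hypothetical.

```python
from typing import Dict, Tuple


def build_block(first_line: str, headers: Dict[str, str]) -> bytes:
    """Serialise a handshake block (assumed CRLF line endings)."""
    lines = [first_line] + [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_block(block: bytes) -> Tuple[str, Dict[str, str]]:
    """Return the first line and the headers keyed by lower-case name."""
    text = block.decode("ascii", errors="replace")
    first_line, _, rest = text.partition("\r\n")
    headers = {}
    for line in rest.split("\r\n"):
        # Split on the first colon only, so "1.2.3.4:6346" stays intact.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return first_line, headers


def status_code(first_line: str) -> int:
    """Numeric code of a response line such as 'GNUTELLA/0.6 200 OK'."""
    return int(first_line.split()[1])
```

Only the numeric code returned by <code>status_code</code> is meant to drive program logic; the text after it is kept for display.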

== Headers ==

Important headers which are required or strongly recommended are detailed in the
following sections.

=== Addressing Headers ===

Two important headers to send on all connections are "Remote-IP" and "Listen-IP".
Both of these headers should be sent on the first transmission, meaning in the first
and second header blocks in the three block exchange.

The Remote-IP header contains the IP address from which the remote host is
connecting. This allows a remote host operating through some kind of network
address translation system, to learn its effective external address.

The Listen-IP header contains the IP address and port number that the local host is
listening for inbound TCP connections on. It should be listening for UDP datagrams
on the same port. The format of this header is "IP:PORT", e.g. "1.2.3.4:6346".

=== Identification ===

The User-Agent header is used to identify the client software operating on the
sending node. It should be sent on the first transmission, meaning in the first and
second header blocks in the three block exchange. Note that this is a descriptive
string that often includes a version number, and is not a "vendor code" as described
elsewhere.

=== Content Type (Protocol) ===

The Accept and Content-Type header exchange is used to negotiate the data
protocol that will be used in the connection, in this case Gnutella2. The Gnutella2
content type is "application/x-gnutella2", and this exchange follows standard HTTP
rules for negotiating content type.

The first step is to advertise support for the content type (Gnutella2) in the first
header block with "Accept: application/x-gnutella2". The responding node will then
indicate that it will send Gnutella2 content with "Content-Type: application/x-gnutella2",
and that it also supports Gnutella2 with "Accept: application/x-gnutella2".
The initiating host then confirms that it will be sending Gnutella2 with
"Content-Type: application/x-gnutella2" in the third header block. For more information on the
Accept/Content-Type exchange, consult an HTTP reference.

Note that the content type negotiation process is designed to be a "one-way"
process, i.e. a different content type can be negotiated for sending and receiving.
However, when the Gnutella2 protocol is negotiated, both channels must use the
same content type. This means that a receiving node must not accept Gnutella2 if
the initiator did not advertise support for it, and if at the end of the handshake,
bidirectional Gnutella2 was not negotiated, the connection should be terminated.
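
The bidirectional requirement can be expressed as a small check, sketched here in Python. It assumes each header block has already been parsed into a dictionary keyed by lower-case header name; the function names are hypothetical.

```python
from typing import Dict

G2 = "application/x-gnutella2"


def accepts_g2(headers: Dict[str, str]) -> bool:
    """True if the Accept header lists the Gnutella2 content type."""
    offered = headers.get("accept", "").split(",")
    # Ignore any parameters after ';' and compare case-insensitively.
    return G2 in [item.split(";")[0].strip().lower() for item in offered]


def sends_g2(headers: Dict[str, str]) -> bool:
    """True if the Content-Type header announces Gnutella2."""
    return headers.get("content-type", "").strip().lower() == G2


def g2_negotiated(block1: Dict[str, str], block2: Dict[str, str],
                  block3: Dict[str, str]) -> bool:
    """True only if Gnutella2 was agreed for both directions."""
    return (accepts_g2(block1) and sends_g2(block2)      # receiver to initiator
            and accepts_g2(block2) and sends_g2(block3))  # initiator to receiver
```

If <code>g2_negotiated</code> returns false at the end of the handshake, the connection would be terminated.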

=== Node State Negotiation ===

There are two node types in a Gnutella2 peer to peer network, a hub and a leaf, as
described in [[Node Types and Responsibilities]]. During the initial handshake,
the two parties must exchange their current node type and advise of their
capabilities, negotiating the node types they will adopt when the connection
completes, and indeed whether it should complete at all.

As the handshake sequence is compatible with Gnutella1, the headers involved in
negotiating node types are identical to those used to negotiate Gnutella1 "Ultrapeer"
states:

<pre>
<nowiki>
X-Ultrapeer: [True|False]
X-Ultrapeer-Needed: [True|False]
</nowiki>
</pre>

Both headers contain a Boolean value, "true" or "false", case insensitive.

The X-Ultrapeer header indicates whether the transmitting node is currently
operating as a hub. Hub nodes will send "X-Ultrapeer: True" while leaf nodes will
send "X-Ultrapeer: False".

The X-Ultrapeer-Needed header indicates whether the transmitting node would like
(and allow) the receiver to be a hub. A hub which sees no need for additional hubs in
its area of the network, will send "X-Ultrapeer-Needed: False", indicating to the
connecting node that it must operate in leaf mode if it wishes to connect. A hub
which sees a need for additional hubs, will send "X-Ultrapeer-Needed: True",
indicating that the receiving node should become a hub, if it is capable of doing so. A
leaf may send "X-Ultrapeer-Needed: True" to indicate that it is seeking a connection
to a hub.

The X-Ultrapeer header should be sent on all three of the header blocks to indicate
the current intended state of the node, while the X-Ultrapeer-Needed header should
be sent in the first two header blocks only, to indicate the desired status of the
receiver. If the nodes cannot agree on a satisfactory arrangement, the connection
will be terminated at or prior to the third header block with an appropriate message,
for example:

<pre>
<nowiki>
GNUTELLA/0.6 503 Too many hub connections
GNUTELLA/0.6 503 Too many leaves
GNUTELLA/0.6 503 I have leaves, can't downgrade to leaf mode
GNUTELLA/0.6 503 Leaf mode disabled
</nowiki>
</pre>
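
As an illustration only, the Python sketch below shows how an initiator might interpret the receiver's header block. The policy in <code>initiator_role</code> is a hypothetical simplification, not a rule taken from the standard, and the headers are assumed to be in a dictionary keyed by lower-case name.

```python
from typing import Dict, Optional


def parse_bool(value: Optional[str]) -> Optional[bool]:
    """Parse a case-insensitive 'true'/'false'; None if absent or invalid."""
    if value is None:
        return None
    text = value.strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    return None


def initiator_role(reply: Dict[str, str], hub_capable: bool,
                   can_be_leaf: bool) -> Optional[str]:
    """Return 'hub', 'leaf', or None to abandon the connection."""
    remote_is_hub = parse_bool(reply.get("x-ultrapeer"))
    hub_needed = parse_bool(reply.get("x-ultrapeer-needed"))
    if not remote_is_hub:
        return None  # connections are only made towards hubs
    if hub_needed is False:
        # The hub wants no more hubs: connect as a leaf or not at all.
        return "leaf" if can_be_leaf else None
    if hub_capable:
        return "hub"
    return "leaf" if can_be_leaf else None
```

A hub that already serves leaves would pass <code>can_be_leaf=False</code>, which corresponds to the "can't downgrade to leaf mode" rejection shown above.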

=== Hub Address Exchange ===

It is desirable for connecting nodes to exchange the node addresses of other hubs on
the network to facilitate rapid connection. The Gnutella2 protocol includes highly
efficient methods to share hub addresses with peers once connected, but for a node
trying to connect and encountering only "full" hubs, learning new hubs to try is
helpful.

The X-Try-Ultrapeers header was developed for this purpose, and like the rest of
the handshake is also semi-compatible with Gnutella1. It contains a comma-separated
list of hub node addresses and ports, each with a timestamp recording
the time the hub was last seen. For example:

<pre>
<nowiki>
X-Try-Ultrapeers: 1.2.3.4:6346 2003-03-25T23:59Z, [..more..]
</nowiki>
</pre>

Hub addresses should not be sent unless the transmitter has reasonable knowledge
of the hub's existence and the timestamp is not too old. The content type negotiation
headers should be used to verify that the listed hub addresses are indeed Gnutella2
hubs.
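
A minimal Python sketch of parsing this header value follows. It assumes every entry has exactly the form shown in the example (address, colon, port, a space, then a timestamp with minute precision and a trailing "Z" for UTC) and silently skips anything else; the function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import List, Tuple


def parse_try_ultrapeers(value: str) -> List[Tuple[str, int, datetime]]:
    """Return (host, port, last_seen) for each well-formed entry."""
    hubs = []
    for item in value.split(","):
        fields = item.split()
        if len(fields) != 2:
            continue  # not "address:port timestamp"
        host, _, port = fields[0].rpartition(":")
        try:
            seen = datetime.strptime(fields[1], "%Y-%m-%dT%H:%MZ")
            hubs.append((host, int(port), seen.replace(tzinfo=timezone.utc)))
        except ValueError:
            continue  # bad port or timestamp
    return hubs
```

The parsed timestamp can then be compared against the current time to drop entries that are too old.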

=== Compression Negotiation and Encoding ===

The Gnutella2 architecture makes widespread use of "deflate" compression, due to
its high availability and ease of integration. Support of compressed TCP links is not a
requirement in the Gnutella2 standard, however, it is strongly recommended.

Deflate compression of a TCP link is negotiated with the pair of standard HTTP headers
"Accept-Encoding" and "Content-Encoding". For example:

Header block one (initiator):

<pre>
<nowiki>
Accept-Encoding: deflate
</nowiki>
</pre>

Header block two (receiver):

<pre>
<nowiki>
Accept-Encoding: deflate
Content-Encoding: deflate
</nowiki>
</pre>

Header block three (initiator):

<pre>
<nowiki>
Content-Encoding: deflate
</nowiki>
</pre>

In this example, the initiator advertises support for receiving a deflated connection.
The receiver then indicates that it too supports receiving a deflated connection, and
that it intends to transmit deflated data. Finally, the initiator upon noting that the
receiver supports deflate, indicates that it too will transmit deflated data.

Note that unlike the content-type / protocol negotiation, deflated encoding can be
applied on either incoming, outgoing, or both channels of a connection.

For performance reasons, nodes should consider whether they can afford to support
additional deflated connections before advertising support for them, or agreeing to
provide a deflated data stream. In the Gnutella2 network topology, all links benefit
from compression, except the leaf to hub channel of the leaf/hub link. Exempting this
channel from compression saves the leaf and, more importantly, the hub a
considerable CPU and RAM investment.
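
Because each direction is negotiated separately, the outcome of the exchange can be summarised as two independent flags. The Python sketch below assumes the three header blocks have been parsed into dictionaries keyed by lower-case header name; the function names are hypothetical.

```python
from typing import Dict, Tuple


def _lists_deflate(headers: Dict[str, str], name: str) -> bool:
    """True if the named header lists 'deflate' among its values."""
    values = headers.get(name, "").split(",")
    return "deflate" in [v.strip().lower() for v in values]


def deflate_directions(block1: Dict[str, str], block2: Dict[str, str],
                       block3: Dict[str, str]) -> Tuple[bool, bool]:
    """Return (initiator_to_receiver, receiver_to_initiator) deflate flags."""
    to_receiver = (_lists_deflate(block2, "accept-encoding")
                   and _lists_deflate(block3, "content-encoding"))
    to_initiator = (_lists_deflate(block1, "accept-encoding")
                    and _lists_deflate(block2, "content-encoding"))
    return to_receiver, to_initiator
```

A leaf that leaves "Content-Encoding: deflate" out of the third header block would keep the leaf to hub channel uncompressed while still receiving deflated data from the hub.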

== Post Handshake Communication ==

After the third and final header block has been received by the initiator, subsequent
communication over the TCP stream occurs in the negotiated protocol, with the
negotiated encoding. This means that while the handshake sequence was backwards
compatible with Gnutella1, after Gnutella2 support has been negotiated, all
subsequent communication occurs in the Gnutella2 common protocol - an entirely
new system not backwards compatible with any other protocol.

TCP Stream Connection and Handshaking

2005-03-27T14:42:51Z

Kath: /* Compression Negotiation and Encoding */

== Introduction ==

TCP stream connections are established between Gnutella2 nodes when they elect to
form a permanent link, creating the fundamental network topology of a highly
interconnected hub network serving dense clusters of leaf nodes.

== Initiation ==

TCP connections are initiated by leaf or hub nodes in an attempt to gain a connection
to a hub node. Leaf nodes are never the target of an outbound connection. The TCP
port number is not standardised, and must be stored with the IP address.

== Handshaking ==

Upon the establishment of a TCP stream connection between two Gnutella2 nodes, a
handshaking phase must be completed to negotiate the nature of the link and
exchange other necessary information.

This handshaking phase is the only part of the communication which remains
compatible with the old Gnutella network, allowing new connections to be negotiated
without foreknowledge of the capabilities of the other node. The handshaking process
has been well documented elsewhere; however, a short summary is provided here.

=== Handshake Stages ===

The Gnutella handshake process consists of three header blocks. The node which
initiated the connection sends an initial header block, of the form:

<pre>
<nowiki>
GNUTELLA CONNECT/0.6
Listen-IP: 1.2.3.4:6346
Remote-IP: 6.7.8.9
User-Agent: Shareaza 1.8.2.0
Accept: application/x-gnutella2
X-Ultrapeer: False
</nowiki>
</pre>

The receiver then responds with its own header block:

<pre>
<nowiki>
GNUTELLA/0.6 200 OK
Listen-IP: 6.7.8.9:6346
Remote-IP: 1.2.3.4
User-Agent: Shareaza 1.8.2.0
Content-Type: application/x-gnutella2
Accept: application/x-gnutella2
X-Ultrapeer: True
X-Ultrapeer-Needed: False
</nowiki>
</pre>

Finally, the initiator accepts the receiver's header block, and provides any final
information:

<pre>
<nowiki>
GNUTELLA/0.6 200 OK
Content-Type: application/x-gnutella2
X-Ultrapeer: False
</nowiki>
</pre>

The latter two stages may be replaced with an error condition if the connection is
being rejected. Appropriate error status codes are returned in this case, for example:

<pre>
<nowiki>
GNUTELLA/0.6 503 Too many connections
(more headers)
</nowiki>
</pre>

Note that only the HTTP-style error code should be interpreted by the software: any
descriptive text provided is for display purposes only and is not standardised.

== Headers ==

Important headers which are required or strongly recommended are detailed in the
following sections.

=== Addressing Headers ===

Two important headers to send on all connections are "Remote-IP" and "Listen-IP".
Both of these headers should be sent on the first transmission, meaning in the first
and second header blocks in the three block exchange.

The Remote-IP header contains the IP address from which the remote host is
connecting. This allows a remote host operating through some kind of network
address translation system, to learn its effective external address.

The Listen-IP header contains the IP address and port number that the local host is
listening for inbound TCP connections on. It should be listening for UDP datagrams
on the same port. The format of this header is "IP:PORT", e.g. "1.2.3.4:6346".

=== Identification ===

The User-Agent header is used to identify the client software operating on the
sending node. It should be sent on the first transmission, meaning in the first and
second header blocks in the three block exchange. Note that this is a descriptive
string that often includes a version number, and is not a "vendor code" as described
elsewhere.

=== Content Type (Protocol) ===

The Accept and Content-Type header exchange is used to negotiate the data
protocol that will be used in the connection, in this case Gnutella2. The Gnutella2
content type is "application/x-gnutella2", and this exchange follows standard HTTP
rules for negotiating content type.

The first step is to advertise support for the content type (Gnutella2) in the first
header block with "Accept: application/x-gnutella2". The responding node will then
indicate that it will send Gnutella2 content with "Content-Type: application/x-gnutella2",
and that it also supports Gnutella2 with "Accept: application/x-gnutella2".
The initiating host then confirms that it will be sending Gnutella2 with
"Content-Type: application/x-gnutella2" in the third header block. For more information on the
Accept/Content-Type exchange, consult an HTTP reference.

Note that the content type negotiation process is designed to be a "one-way"
process, i.e. a different content type can be negotiated for sending and receiving.
However, when the Gnutella2 protocol is negotiated, both channels must use the
same content type. This means that a receiving node must not accept Gnutella2 if
the initiator did not advertise support for it, and if at the end of the handshake,
bidirectional Gnutella2 was not negotiated, the connection should be terminated.

=== Node State Negotiation ===

There are two node types in a Gnutella2 peer to peer network, a hub and a leaf, as
described in [[Node Types and Responsibilities]]. During the initial handshake,
the two parties must exchange their current node type and advise of their
capabilities, negotiating the node types they will adopt when the connection
completes, and indeed whether it should complete at all.

As the handshake sequence is compatible with Gnutella1, the headers involved in
negotiating node types are identical to those used to negotiate Gnutella1 "Ultrapeer"
states:

<pre>
<nowiki>
X-Ultrapeer: [True|False]
X-Ultrapeer-Needed: [True|False]
</nowiki>
</pre>

Both headers contain a Boolean value, "true" or "false", case insensitive.

The X-Ultrapeer header indicates whether the transmitting node is currently
operating as a hub. Hub nodes will send "X-Ultrapeer: True" while leaf nodes will
send "X-Ultrapeer: False".

The X-Ultrapeer-Needed header indicates whether the transmitting node would like
(and allow) the receiver to be a hub. A hub which sees no need for additional hubs in
its area of the network, will send "X-Ultrapeer-Needed: False", indicating to the
connecting node that it must operate in leaf mode if it wishes to connect. A hub
which sees a need for additional hubs, will send "X-Ultrapeer-Needed: True",
indicating that the receiving node should become a hub, if it is capable of doing so. A
leaf may send "X-Ultrapeer-Needed: True" to indicate that it is seeking a connection
to a hub.

The X-Ultrapeer header should be sent on all three of the header blocks to indicate
the current intended state of the node, while the X-Ultrapeer-Needed header should
be sent in the first two header blocks only, to indicate the desired status of the
receiver. If the nodes cannot agree on a satisfactory arrangement, the connection
will be terminated at or prior to the third header block with an appropriate message,
for example:

<pre>
<nowiki>
GNUTELLA/0.6 503 Too many hub connections
GNUTELLA/0.6 503 Too many leaves
GNUTELLA/0.6 503 I have leaves, can't downgrade to leaf mode
GNUTELLA/0.6 503 Leaf mode disabled
</nowiki>
</pre>

=== Hub Address Exchange ===

It is desirable for connecting nodes to exchange the node addresses of other hubs on
the network to facilitate rapid connection. The Gnutella2 protocol includes highly
efficient methods to share hub addresses with peers once connected, but for a node
trying to connect and encountering only "full" hubs, learning new hubs to try is
helpful.

The X-Try-Ultrapeers header was developed for this purpose, and like the rest of
the handshake is also semi-compatible with Gnutella1. It contains a comma-separated
list of hub node addresses and ports, each with a timestamp recording
the time the hub was last seen. For example:

<pre>
<nowiki>
X-Try-Ultrapeers: 1.2.3.4:6346 2003-03-25T23:59Z, [..more..]
</nowiki>
</pre>

Hub addresses should not be sent unless the transmitter has reasonable knowledge
of the hub's existence and the timestamp is not too old. The content type negotiation
headers should be used to verify that the listed hub addresses are indeed Gnutella2
hubs.

=== Compression Negotiation and Encoding ===

The Gnutella2 architecture makes widespread use of "deflate" compression, due to
its high availability and ease of integration. Support of compressed TCP links is not a
requirement in the Gnutella2 standard, however, it is strongly recommended.

Deflate compression of a TCP link is negotiated with the pair of standard HTTP headers
"Accept-Encoding" and "Content-Encoding". For example:

Header block one (initiator):

<pre>
<nowiki>
Accept-Encoding: deflate
</nowiki>
</pre>

Header block two (receiver):

<pre>
<nowiki>
Accept-Encoding: deflate
Content-Encoding: deflate
</nowiki>
</pre>

Header block three (initiator):

<pre>
<nowiki>
Content-Encoding: deflate
</nowiki>
</pre>

In this example, the initiator advertises support for receiving a deflated connection.
The receiver then indicates that it too supports receiving a deflated connection, and
that it intends to transmit deflated data. Finally, the initiator upon noting that the
receiver supports deflate, indicates that it too will transmit deflated data.

Note that unlike the content-type / protocol negotiation, deflated encoding can be
applied on either incoming, outgoing, or both channels of a connection.

For performance reasons, nodes should consider whether they can afford to support
additional deflated connections before advertising support for them, or agreeing to
provide a deflated data stream. In the Gnutella2 network topology, all links benefit
from compression, except the leaf to hub channel of the leaf/hub link. Exempting this
channel from compression saves the leaf and, more importantly, the hub a
considerable CPU and RAM investment.

== Post Handshake Communication ==

After the third and final header block has been received by the initiator, subsequent
communication over the TCP stream occurs in the negotiated protocol, with the
negotiated encoding. This means that while the handshake sequence was backwards
compatible with Gnutella1, after Gnutella2 support has been negotiated, all
subsequent communication occurs in the Gnutella2 common protocol - an entirely
new system not backwards compatible with any other protocol.

TCP Stream Connection and Handshaking

2005-03-27T14:36:39Z

Kath: /* Node State Negotiation */

== Introduction ==

TCP stream connections are established between Gnutella2 nodes when they elect to
form a permanent link, creating the fundamental network topology of a highly
interconnected hub network serving dense clusters of leaf nodes.

== Initiation ==

TCP connections are initiated by leaf or hub nodes in an attempt to gain a connection
to a hub node. Leaf nodes are never the target of an outbound connection. The TCP
port number is not standardised, and must be stored with the IP address.

== Handshaking ==

Upon the establishment of a TCP stream connection between two Gnutella2 nodes, a
handshaking phase must be completed to negotiate the nature of the link and
exchange other necessary information.

This handshaking phase is the only part of the communication which remains
compatible with the old Gnutella network, allowing new connections to be negotiated
without foreknowledge of the capabilities of the other node. The handshaking process
has been well documented elsewhere; however, a short summary is provided here.

=== Handshake Stages ===

The Gnutella handshake process consists of three header blocks. The node which
initiated the connection sends an initial header block, of the form:

<pre>
<nowiki>
GNUTELLA CONNECT/0.6
Listen-IP: 1.2.3.4:6346
Remote-IP: 6.7.8.9
User-Agent: Shareaza 1.8.2.0
Accept: application/x-gnutella2
X-Ultrapeer: False
</nowiki>
</pre>

The receiver then responds with its own header block:

<pre>
<nowiki>
GNUTELLA/0.6 200 OK
Listen-IP: 6.7.8.9:6346
Remote-IP: 1.2.3.4
User-Agent: Shareaza 1.8.2.0
Content-Type: application/x-gnutella2
Accept: application/x-gnutella2
X-Ultrapeer: True
X-Ultrapeer-Needed: False
</nowiki>
</pre>

Finally, the initiator accepts the receiver's header block, and provides any final
information:

<pre>
<nowiki>
GNUTELLA/0.6 200 OK
Content-Type: application/x-gnutella2
X-Ultrapeer: False
</nowiki>
</pre>

The latter two stages may be replaced with an error condition if the connection is
being rejected. Appropriate error status codes are returned in this case, for example:

<pre>
<nowiki>
GNUTELLA/0.6 503 Too many connections
(more headers)
</nowiki>
</pre>

Note that only the HTTP-style error code should be interpreted by the software: any
descriptive text provided is for display purposes only and is not standardised.

== Headers ==

Important headers which are required or strongly recommended are detailed in the
following sections.

=== Addressing Headers ===

Two important headers to send on all connections are "Remote-IP" and "Listen-IP".
Both of these headers should be sent on the first transmission, meaning in the first
and second header blocks in the three block exchange.

The Remote-IP header contains the IP address from which the remote host is
connecting. This allows a remote host operating through some kind of network
address translation system, to learn its effective external address.

The Listen-IP header contains the IP address and port number that the local host is
listening for inbound TCP connections on. It should be listening for UDP datagrams
on the same port. The format of this header is "IP:PORT", e.g. "1.2.3.4:6346".

=== Identification ===

The User-Agent header is used to identify the client software operating on the
sending node. It should be sent on the first transmission, meaning in the first and
second header blocks in the three block exchange. Note that this is a descriptive
string that often includes a version number, and is not a "vendor code" as described
elsewhere.

=== Content Type (Protocol) ===

The Accept and Content-Type header exchange is used to negotiate the data
protocol that will be used in the connection, in this case Gnutella2. The Gnutella2
content type is "application/x-gnutella2", and this exchange follows standard HTTP
rules for negotiating content type.

The first step is to advertise support for the content type (Gnutella2) in the first
header block with "Accept: application/x-gnutella2". The responding node will then
indicate that it will send Gnutella2 content with "Content-Type: application/x-gnutella2",
and that it also supports Gnutella2 with "Accept: application/x-gnutella2".
The initiating host then confirms that it will be sending Gnutella2 with
"Content-Type: application/x-gnutella2" in the third header block. For more information on the
Accept/Content-Type exchange, consult an HTTP reference.

Note that the content type negotiation process is designed to be a "one-way"
process, i.e. a different content type can be negotiated for sending and receiving.
However, when the Gnutella2 protocol is negotiated, both channels must use the
same content type. This means that a receiving node must not accept Gnutella2 if
the initiator did not advertise support for it, and if at the end of the handshake,
bidirectional Gnutella2 was not negotiated, the connection should be terminated.

=== Node State Negotiation ===

There are two node types in a Gnutella2 peer to peer network, a hub and a leaf, as
described in [[Node Types and Responsibilities]]. During the initial handshake,
the two parties must exchange their current node type and advise of their
capabilities, negotiating the node types they will adopt when the connection
completes, and indeed whether it should complete at all.

As the handshake sequence is compatible with Gnutella1, the headers involved in
negotiating node types are identical to those used to negotiate Gnutella1 "Ultrapeer"
states:

<pre>
<nowiki>
X-Ultrapeer: [True|False]
X-Ultrapeer-Needed: [True|False]
</nowiki>
</pre>

Both headers contain a Boolean value, "true" or "false", case insensitive.

The X-Ultrapeer header indicates whether the transmitting node is currently
operating as a hub. Hub nodes will send "X-Ultrapeer: True" while leaf nodes will
send "X-Ultrapeer: False".

The X-Ultrapeer-Needed header indicates whether the transmitting node would like
(and allow) the receiver to be a hub. A hub which sees no need for additional hubs in
its area of the network will send "X-Ultrapeer-Needed: False", indicating to the
connecting node that it must operate in leaf mode if it wishes to connect. A hub
which sees a need for additional hubs will send "X-Ultrapeer-Needed: True",
indicating that the receiving node should become a hub if it is capable of doing so. A
leaf may send "X-Ultrapeer-Needed: True" to indicate that it is seeking a connection
to a hub.

The X-Ultrapeer header should be sent on all three of the header blocks to indicate
the current intended state of the node, while the X-Ultrapeer-Needed header should
be sent in the first two header blocks only, to indicate the desired status of the
receiver. If the nodes cannot agree on a satisfactory arrangement, the connection
will be terminated at or prior to the third header block with an appropriate message,
for example:

<pre>
<nowiki>
GNUTELLA/0.6 503 Too many hub connections
GNUTELLA/0.6 503 Too many leaves
GNUTELLA/0.6 503 I have leaves, can't downgrade to leaf mode
GNUTELLA/0.6 503 Leaf mode disabled
</nowiki>
</pre>

=== Hub Address Exchange ===

It is desirable for connecting nodes to exchange the node addresses of other hubs on
the network to facilitate rapid connection. The Gnutella2 protocol includes highly
efficient methods to share hub addresses with peers once connected, but for a node
trying to connect and encountering only "full" hubs, learning new hubs to try is
helpful.

The X-Try-Ultrapeers header was developed for this purpose, and like the rest of
the handshake is also semi-compatible with Gnutella1. It contains a comma
separated list of hub node addresses and ports, along with a timestamp recording
the time the hub was last seen. For example:

<pre>
<nowiki>
X-Try-Ultrapeers: 1.2.3.4:6346 2003-03-25T23:59Z, [..more..]
</nowiki>
</pre>

Hub addresses should not be sent unless the transmitter has reasonable knowledge
of the hub's existence and the timestamp is not too old. The content type negotiation
headers should be used to verify that the listed hub addresses are indeed Gnutella2
hubs.

=== Compression Negotiation and Encoding ===

The Gnutella2 architecture makes widespread use of "deflate" compression, due to
its high availability and ease of integration. Support of compressed TCP links is not a
requirement in the Gnutella2 standard; however, it is strongly recommended.

Deflate compression of a TCP link is negotiated with the pair of standard HTTP headers
"Accept-Encoding" and "Content-Encoding". For example:

Header block one (initiator):

<pre>
<nowiki>
Accept-Encoding: deflate
</nowiki>
</pre>

Header block two (receiver):

<pre>
<nowiki>
Accept-Encoding: deflate
Content-Encoding: deflate
</nowiki>
</pre>

Header block three (initiator):

<pre>
<nowiki>
Content-Encoding: deflate
</nowiki>
</pre>

In this example the initiator advertises support for receiving a deflated connection.
The receiver then indicates that it too supports receiving a deflated connection, and
that it intends to transmit deflated data. Finally, the initiator, upon noting that the
receiver supports deflate, indicates that it too will transmit deflated data.

Note that unlike the content-type / protocol negotiation, deflated encoding can be
applied on either incoming, outgoing or both channels of a connection.

For performance reasons, nodes should consider whether they can afford to support
additional deflated connections before advertising support for them, or agreeing to
provide a deflated data stream. In the Gnutella2 network topology, all links benefit
from compression except the leaf to hub channel of the leaf/hub link. Exempting this
channel from compression saves the leaf and more importantly the hub a
considerable CPU and RAM investment.

== Post Handshake Communication ==

After the third and final header block has been received by the initiator, subsequent
communication over the TCP stream occurs in the negotiated protocol, with the
negotiated encoding. This means that while the handshake sequence was backwards
compatible with Gnutella1, after Gnutella2 support has been negotiated all
subsequent communication occurs in the Gnutella2 common protocol - an entirely
new system not backwards compatible with any other protocol.

TCP Stream Connection and Handshaking

2005-03-27T14:28:21Z

Kath: /* Content Type (Protocol) */

== Introduction ==

TCP stream connections are established between Gnutella2 nodes when they elect to
form a permanent link, creating the fundamental network topology of a highly
interconnected hub network serving dense clusters of leaf nodes.

== Initiation ==

TCP connections are initiated by leaf or hub nodes in an attempt to gain a connection
to a hub node. Leaf nodes are never the target of an outbound connection. The TCP
port number is not standardised, and must be stored with the IP address.

== Handshaking ==

Upon the establishment of a TCP stream connection between two Gnutella2 nodes, a
handshaking phase must be completed to negotiate the nature of the link and
exchange other necessary information.

This handshaking phase is the only part of the communication which remains
compatible with the old Gnutella network, allowing new connections to be negotiated
without foreknowledge of the capabilities of the other node. The handshaking process
has been well documented elsewhere; however, a short summary is provided here.

=== Handshake Stages ===

The Gnutella handshake process consists of three header blocks. The node which
initiated the connection sends an initial header block, of the form:

<pre>
<nowiki>
GNUTELLA CONNECT/0.6
Listen-IP: 1.2.3.4:6346
Remote-IP: 6.7.8.9
User-Agent: Shareaza 1.8.2.0
Accept: application/x-gnutella2
X-Ultrapeer: False
</nowiki>
</pre>

The receiver then responds with its own header block:

<pre>
<nowiki>
GNUTELLA/0.6 200 OK
Listen-IP: 6.7.8.9:6346
Remote-IP: 1.2.3.4
User-Agent: Shareaza 1.8.2.0
Content-Type: application/x-gnutella2
Accept: application/x-gnutella2
X-Ultrapeer: True
X-Ultrapeer-Needed: False
</nowiki>
</pre>

Finally, the initiator accepts the receiver's header block, and provides any final
information:

<pre>
<nowiki>
GNUTELLA/0.6 200 OK
Content-Type: application/x-gnutella2
X-Ultrapeer: False
</nowiki>
</pre>

The latter two stages may be replaced with an error condition if the connection is
being rejected. Appropriate error status codes are returned in this case, for example:

<pre>
<nowiki>
GNUTELLA/0.6 503 Too many connections
(more headers)
</nowiki>
</pre>

Note that only the HTTP-style error code should be interpreted by the software: any
descriptive text provided is for display purposes only and is not standardised.
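
As a minimal sketch of the point above, the following Python splits a header block into its first line and its headers, and reads only the numeric status code. The function names are hypothetical, and the CRLF line endings are an assumption based on the HTTP-style format; the text here does not state them.

```python
def parse_header_block(block: str):
    """Split a handshake header block into (first_line, headers).

    Header names are lower-cased so lookups are case insensitive.
    Assumes HTTP-style line endings; bare LF is tolerated as well.
    """
    lines = [ln for ln in block.replace("\r\n", "\n").split("\n") if ln.strip()]
    first_line, header_lines = lines[0], lines[1:]
    headers = {}
    for ln in header_lines:
        # Split on the first colon only, so "Listen-IP: 1.2.3.4:6346" survives.
        name, _, value = ln.partition(":")
        headers[name.strip().lower()] = value.strip()
    return first_line, headers


def status_code(first_line: str) -> int:
    """Return the numeric code from a 'GNUTELLA/0.6 503 ...' style line.

    The descriptive text after the code is ignored on purpose.
    """
    parts = first_line.split(None, 2)
    return int(parts[1])
```

A caller would branch on `status_code(first_line) == 200` and show the remaining text to the user at most, never parse it.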

== Headers ==

Important headers which are required or strongly recommended are detailed in the
following sections.

=== Addressing Headers ===

Two important headers to send on all connections are "Remote-IP" and "Listen-IP".
Both of these headers should be sent on the first transmission, meaning in the first
and second header blocks in the three block exchange.

The Remote-IP header contains the IP address from which the remote host is
connecting. This allows a remote host operating through some kind of network
address translation system to learn its effective external address.

The Listen-IP header contains the IP address and port number on which the local host is
listening for inbound TCP connections. It should be listening for UDP datagrams
on the same port. The format of this header is "IP:PORT", e.g. "1.2.3.4:6346".
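
The "IP:PORT" format can be read with a few lines of Python. This is only a sketch: the function name is hypothetical, and the restriction to IPv4 and to ports 1 to 65535 is an assumption made for the example rather than something stated here.

```python
import ipaddress


def parse_listen_ip(value: str):
    """Parse a Listen-IP value such as '1.2.3.4:6346' into (address, port).

    Raises ValueError if the value is not a valid IPv4 address and port.
    """
    host, sep, port = value.strip().rpartition(":")
    if not sep:
        raise ValueError("missing ':' between address and port")
    address = ipaddress.IPv4Address(host)  # raises ValueError on bad input
    port_number = int(port)                # raises ValueError on bad input
    if not 0 < port_number < 65536:
        raise ValueError("port out of range")
    return address, port_number
```

Because the port is not standardised, the pair returned here is what a node would store for later connection attempts.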

=== Identification ===

The User-Agent header is used to identify the client software operating on the
sending node. It should be sent on the first transmission, meaning in the first and
second header blocks in the three block exchange. Note that this is a descriptive
string that often includes a version number, and is not a "vendor code" as described
elsewhere.

=== Content Type (Protocol) ===

The Accept and Content-Type header exchange is used to negotiate the data
protocol that will be used in the connection, in this case Gnutella2. The Gnutella2
content type is "application/x-gnutella2", and this exchange follows standard HTTP
rules for negotiating content type.

The first step is to advertise support for the content type (Gnutella2) in the first
header block with "Accept: application/x-gnutella2". The responding node will then
indicate that it will send Gnutella2 content with "Content-Type: application/x-gnutella2",
and that it also supports Gnutella2 with "Accept: application/x-gnutella2".
The initiating host then confirms that it will be sending Gnutella2 with
"Content-Type: application/x-gnutella2" in the third header block. For more information on the
Accept/Content-Type exchange, consult an HTTP reference.

Note that the content type negotiation process is designed to be a "one-way"
process, i.e. a different content type can be negotiated for sending and receiving.
However, when the Gnutella2 protocol is negotiated, both channels must use the
same content type. This means that a receiving node must not accept Gnutella2 if
the initiator did not advertise support for it, and if at the end of the handshake,
bidirectional Gnutella2 was not negotiated, the connection should be terminated.
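
The bidirectional requirement can be expressed as a single check over the three header blocks. The sketch below assumes each block has already been parsed into a dictionary with lower-cased header names. The function names are hypothetical, and the handling of comma-separated Accept lists with optional parameters follows general HTTP practice rather than anything stated here.

```python
G2_TYPE = "application/x-gnutella2"


def media_types(value: str):
    """Return the media types listed in an Accept-style value, lower-cased."""
    return [item.split(";", 1)[0].strip().lower()
            for item in value.split(",") if item.strip()]


def accepts_g2(headers: dict) -> bool:
    return G2_TYPE in media_types(headers.get("accept", ""))


def sends_g2(headers: dict) -> bool:
    return media_types(headers.get("content-type", "")) == [G2_TYPE]


def is_bidirectional_g2(block1: dict, block2: dict, block3: dict) -> bool:
    """True only if Gnutella2 was negotiated in both directions.

    block1: initiator's first block, block2: receiver's reply,
    block3: initiator's final block.
    """
    return (accepts_g2(block1)        # initiator advertised support
            and sends_g2(block2)      # receiver will send Gnutella2
            and accepts_g2(block2)    # receiver will accept Gnutella2
            and sends_g2(block3))     # initiator will send Gnutella2
```

If this returns False at the end of the handshake, the connection should be terminated as described above.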

=== Node State Negotiation ===

There are two node types in a Gnutella2 peer to peer network, a hub and a leaf as
described in [[Node Types and Responsibilities]]. During the initial handshake
the two parties must exchange their current node type and advise of their
capabilities, negotiating the node types they will adopt when the connection
completes, and indeed whether it should complete at all.

As the handshake sequence is compatible with Gnutella1, the headers involved in
negotiating node types are identical to those used to negotiate Gnutella1 "Ultrapeer"
states:

<pre>
<nowiki>
X-Ultrapeer: [True|False]
X-Ultrapeer-Needed: [True|False]
</nowiki>
</pre>

Both headers contain a Boolean value, "true" or "false", case insensitive.

The X-Ultrapeer header indicates whether the transmitting node is currently
operating as a hub. Hub nodes will send "X-Ultrapeer: True" while leaf nodes will
send "X-Ultrapeer: False".

The X-Ultrapeer-Needed header indicates whether the transmitting node would like
(and allow) the receiver to be a hub. A hub which sees no need for additional hubs in
its area of the network will send "X-Ultrapeer-Needed: False", indicating to the
connecting node that it must operate in leaf mode if it wishes to connect. A hub
which sees a need for additional hubs will send "X-Ultrapeer-Needed: True",
indicating that the receiving node should become a hub if it is capable of doing so. A
leaf may send "X-Ultrapeer-Needed: True" to indicate that it is seeking a connection
to a hub.

The X-Ultrapeer header should be sent on all three of the header blocks to indicate
the current intended state of the node, while the X-Ultrapeer-Needed header should
be sent in the first two header blocks only to indicate the desired status of the
receiver. If the nodes cannot agree on a satisfactory arrangement, the connection
will be terminated at or prior to the third header block with an appropriate message,
for example:

<pre>
<nowiki>
GNUTELLA/0.6 503 Too many hub connections
GNUTELLA/0.6 503 Too many leaves
GNUTELLA/0.6 503 I have leaves, can't downgrade to leaf mode
GNUTELLA/0.6 503 Leaf mode disabled
</nowiki>
</pre>
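
The header semantics above can be sketched in Python as follows. The function names are hypothetical, and treating a missing or unrecognised value as "unknown" (None) is an assumption of this example. Headers are assumed to be already parsed into a dictionary with lower-cased names.

```python
from typing import Optional


def parse_bool_header(value: Optional[str]) -> Optional[bool]:
    """Interpret 'true'/'false' case insensitively; anything else is unknown."""
    if value is None:
        return None
    text = value.strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    return None


def must_connect_as_leaf(remote_headers: dict) -> bool:
    """True when the remote node is a hub that does not want more hubs.

    This is the 'X-Ultrapeer: True' plus 'X-Ultrapeer-Needed: False' case,
    in which the connecting node must operate in leaf mode to connect.
    """
    is_hub = parse_bool_header(remote_headers.get("x-ultrapeer"))
    hub_needed = parse_bool_header(remote_headers.get("x-ultrapeer-needed"))
    return is_hub is True and hub_needed is False
```

A node that has leaves of its own and receives this answer cannot comply, which corresponds to the "can't downgrade to leaf mode" rejection shown above.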

=== Hub Address Exchange ===

It is desirable for connecting nodes to exchange the node addresses of other hubs on
the network to facilitate rapid connection. The Gnutella2 protocol includes highly
efficient methods to share hub addresses with peers once connected, but for a node
trying to connect and encountering only "full" hubs, learning new hubs to try is
helpful.

The X-Try-Ultrapeers header was developed for this purpose, and like the rest of
the handshake is also semi-compatible with Gnutella1. It contains a comma
separated list of hub node addresses and ports, along with a timestamp recording
the time the hub was last seen. For example:

<pre>
<nowiki>
X-Try-Ultrapeers: 1.2.3.4:6346 2003-03-25T23:59Z, [..more..]
</nowiki>
</pre>

Hub addresses should not be sent unless the transmitter has reasonable knowledge
of the hub's existence and the timestamp is not too old. The content type negotiation
headers should be used to verify that the listed hub addresses are indeed Gnutella2
hubs.
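
A parser for the header value might look like the sketch below. The function name is hypothetical, and the timestamp pattern is inferred from the single example above (minutes resolution, trailing "Z" read as UTC); entries that do not fit that shape are skipped rather than guessed at.

```python
from datetime import datetime, timezone


def parse_try_ultrapeers(value: str):
    """Parse an X-Try-Ultrapeers value into (host, port, last_seen) tuples."""
    hubs = []
    for entry in value.split(","):
        parts = entry.split()
        if len(parts) != 2:
            continue  # malformed entry: expected "IP:PORT TIMESTAMP"
        address, stamp = parts
        host, sep, port = address.rpartition(":")
        if not sep or not port.isdigit():
            continue
        try:
            seen = datetime.strptime(stamp, "%Y-%m-%dT%H:%MZ")
        except ValueError:
            continue
        hubs.append((host, int(port), seen.replace(tzinfo=timezone.utc)))
    return hubs
```

The caller can then compare `last_seen` against the current time to discard entries that are too old before trying them.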

=== Compression Negotiation and Encoding ===

The Gnutella2 architecture makes widespread use of "deflate" compression, due to
its high availability and ease of integration. Support of compressed TCP links is not a
requirement in the Gnutella2 standard; however, it is strongly recommended.

Deflate compression of a TCP link is negotiated with the pair of standard HTTP headers
"Accept-Encoding" and "Content-Encoding". For example:

Header block one (initiator):

<pre>
<nowiki>
Accept-Encoding: deflate
</nowiki>
</pre>

Header block two (receiver):

<pre>
<nowiki>
Accept-Encoding: deflate
Content-Encoding: deflate
</nowiki>
</pre>

Header block three (initiator):

<pre>
<nowiki>
Content-Encoding: deflate
</nowiki>
</pre>

In this example the initiator advertises support for receiving a deflated connection.
The receiver then indicates that it too supports receiving a deflated connection, and
that it intends to transmit deflated data. Finally, the initiator, upon noting that the
receiver supports deflate, indicates that it too will transmit deflated data.

Note that unlike the content-type / protocol negotiation, deflated encoding can be
applied on either incoming, outgoing or both channels of a connection.

For performance reasons, nodes should consider whether they can afford to support
additional deflated connections before advertising support for them, or agreeing to
provide a deflated data stream. In the Gnutella2 network topology, all links benefit
from compression except the leaf to hub channel of the leaf/hub link. Exempting this
channel from compression saves the leaf and more importantly the hub a
considerable CPU and RAM investment.
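
Because each direction is negotiated separately, the outcome can be computed per channel. The following sketch assumes the three header blocks are already parsed into dictionaries with lower-cased names. The function names are hypothetical, and the rule that a node only deflates its output when the other side advertised "Accept-Encoding: deflate" is read from the example above.

```python
def accepts_deflate(headers: dict) -> bool:
    encodings = [e.strip().lower()
                 for e in headers.get("accept-encoding", "").split(",")]
    return "deflate" in encodings


def sends_deflate(headers: dict) -> bool:
    return headers.get("content-encoding", "").strip().lower() == "deflate"


def negotiated_compression(block1: dict, block2: dict, block3: dict) -> dict:
    """Return which channels of the link carry deflated data.

    block1: initiator's first block, block2: receiver's reply,
    block3: initiator's final block.
    """
    return {
        # Receiver announces Content-Encoding in block two; the initiator
        # must have advertised Accept-Encoding in block one.
        "receiver_to_initiator": sends_deflate(block2) and accepts_deflate(block1),
        # Initiator announces Content-Encoding in block three; the receiver
        # must have advertised Accept-Encoding in block two.
        "initiator_to_receiver": sends_deflate(block3) and accepts_deflate(block2),
    }
```

A hub that wants to leave the leaf to hub channel uncompressed would simply omit "Accept-Encoding: deflate" from its own header block.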

== Post Handshake Communication ==

After the third and final header block has been received by the initiator, subsequent
communication over the TCP stream occurs in the negotiated protocol, with the
negotiated encoding. This means that while the handshake sequence was backwards
compatible with Gnutella1, after Gnutella2 support has been negotiated all
subsequent communication occurs in the Gnutella2 common protocol - an entirely
new system not backwards compatible with any other protocol.
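
Once a channel has been negotiated as deflated, every byte sent on it after the handshake passes through a compressor. The sketch below uses Python's standard zlib module. It rests on assumptions not stated here: that "deflate" means a zlib-wrapped stream as in HTTP, and that the sender flushes after each write so the peer can decode it immediately. The class name is hypothetical.

```python
import zlib


class DeflatedChannel:
    """One endpoint's view of a link with both channels deflated."""

    def __init__(self):
        self._out = zlib.compressobj()    # state for data we transmit
        self._in = zlib.decompressobj()   # state for data we receive

    def encode(self, payload: bytes) -> bytes:
        # Z_SYNC_FLUSH emits all pending output without ending the stream.
        return self._out.compress(payload) + self._out.flush(zlib.Z_SYNC_FLUSH)

    def decode(self, data: bytes) -> bytes:
        return self._in.decompress(data)
```

The compressor state persists for the life of the connection, which is the source of the per-connection CPU and RAM cost discussed in the compression section.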

TCP Stream Connection and Handshaking

2005-03-27T14:23:05Z

Kath: /* Addressing Headers */

== Introduction ==

TCP stream connections are established between Gnutella2 nodes when they elect to
form a permanent link, creating the fundamental network topology of a highly
interconnected hub network serving dense clusters of leaf nodes.

== Initiation ==

TCP connections are initiated by leaf or hub nodes in an attempt to gain a connection
to a hub node. Leaf nodes are never the target of an outbound connection. The TCP
port number is not standardised, and must be stored with the IP address.

== Handshaking ==

Upon the establishment of a TCP stream connection between two Gnutella2 nodes, a
handshaking phase must be completed to negotiate the nature of the link and
exchange other necessary information.

This handshaking phase is the only part of the communication which remains
compatible with the old Gnutella network, allowing new connections to be negotiated
without foreknowledge of the capabilities of the other node. The handshaking process
has been well documented elsewhere; however, a short summary is provided here.

=== Handshake Stages ===

The Gnutella handshake process consists of three header blocks. The node which
initiated the connection sends an initial header block, of the form:

<pre>
<nowiki>
GNUTELLA CONNECT/0.6
Listen-IP: 1.2.3.4:6346
Remote-IP: 6.7.8.9
User-Agent: Shareaza 1.8.2.0
Accept: application/x-gnutella2
X-Ultrapeer: False
</nowiki>
</pre>

The receiver then responds with its own header block:

<pre>
<nowiki>
GNUTELLA/0.6 200 OK
Listen-IP: 6.7.8.9:6346
Remote-IP: 1.2.3.4
User-Agent: Shareaza 1.8.2.0
Content-Type: application/x-gnutella2
Accept: application/x-gnutella2
X-Ultrapeer: True
X-Ultrapeer-Needed: False
</nowiki>
</pre>

Finally, the initiator accepts the receiver's header block, and provides any final
information:

<pre>
<nowiki>
GNUTELLA/0.6 200 OK
Content-Type: application/x-gnutella2
X-Ultrapeer: False
</nowiki>
</pre>

The latter two stages may be replaced with an error condition if the connection is
being rejected. Appropriate error status codes are returned in this case, for example:

<pre>
<nowiki>
GNUTELLA/0.6 503 Too many connections
(more headers)
</nowiki>
</pre>

Note that only the HTTP-style error code should be interpreted by the software: any
descriptive text provided is for display purposes only and is not standardised.

== Headers ==

Important headers which are required or strongly recommended are detailed in the
following sections.

=== Addressing Headers ===

Two important headers to send on all connections are "Remote-IP" and "Listen-IP".
Both of these headers should be sent on the first transmission, meaning in the first
and second header blocks in the three block exchange.

The Remote-IP header contains the IP address from which the remote host is
connecting. This allows a remote host operating through some kind of network
address translation system to learn its effective external address.

The Listen-IP header contains the IP address and port number on which the local host is
listening for inbound TCP connections. It should be listening for UDP datagrams
on the same port. The format of this header is "IP:PORT", e.g. "1.2.3.4:6346".

=== Identification ===

The User-Agent header is used to identify the client software operating on the
sending node. It should be sent on the first transmission, meaning in the first and
second header blocks in the three block exchange. Note that this is a descriptive
string that often includes a version number, and is not a "vendor code" as described
elsewhere.

=== Content Type (Protocol) ===

The Accept and Content-Type header exchange is used to negotiate the data
protocol that will be used in the connection, in this case Gnutella2. The Gnutella2
content type is "application/x-gnutella2", and this exchange follows standard HTTP
rules for negotiating content type.

The first step is to advertise support for the content type (Gnutella2) in the first
header block with "Accept: application/x-gnutella2". The responding node will then
indicate that it will send Gnutella2 content with "Content-Type: application/x-gnutella2",
and that it also supports Gnutella2 with "Accept: application/x-gnutella2".
The initiating host then confirms that it will be sending Gnutella2 with
"Content-Type: application/x-gnutella2" in the third header block. For more information on the
Accept/Content-Type exchange, consult an HTTP reference.

Note that the content type negotiation process is designed to be a "one way"
process, i.e. a different content type can be negotiated for sending and receiving.
However, when the Gnutella2 protocol is negotiated, both channels must use the
same content type. This means that a receiving node must not accept Gnutella2 if
the initiator did not advertise support for it, and if at the end of the handshake
bidirectional Gnutella2 was not negotiated, the connection should be terminated.

=== Node State Negotiation ===

There are two node types in a Gnutella2 peer to peer network, a hub and a leaf as
described in [[Node Types and Responsibilities]]. During the initial handshake
the two parties must exchange their current node type and advise of their
capabilities, negotiating the node types they will adopt when the connection
completes, and indeed whether it should complete at all.

As the handshake sequence is compatible with Gnutella1, the headers involved in
negotiating node types are identical to those used to negotiate Gnutella1 "Ultrapeer"
states:

<pre>
<nowiki>
X-Ultrapeer: [True|False]
X-Ultrapeer-Needed: [True|False]
</nowiki>
</pre>

Both headers contain a Boolean value, "true" or "false", case insensitive.

The X-Ultrapeer header indicates whether the transmitting node is currently
operating as a hub. Hub nodes will send "X-Ultrapeer: True" while leaf nodes will
send "X-Ultrapeer: False".

The X-Ultrapeer-Needed header indicates whether the transmitting node would like
(and allow) the receiver to be a hub. A hub which sees no need for additional hubs in
its area of the network will send "X-Ultrapeer-Needed: False", indicating to the
connecting node that it must operate in leaf mode if it wishes to connect. A hub
which sees a need for additional hubs will send "X-Ultrapeer-Needed: True",
indicating that the receiving node should become a hub if it is capable of doing so. A
leaf may send "X-Ultrapeer-Needed: True" to indicate that it is seeking a connection
to a hub.

The X-Ultrapeer header should be sent on all three of the header blocks to indicate
the current intended state of the node, while the X-Ultrapeer-Needed header should
be sent in the first two header blocks only to indicate the desired status of the
receiver. If the nodes cannot agree on a satisfactory arrangement, the connection
will be terminated at or prior to the third header block with an appropriate message,
for example:

<pre>
<nowiki>
GNUTELLA/0.6 503 Too many hub connections
GNUTELLA/0.6 503 Too many leaves
GNUTELLA/0.6 503 I have leaves, can't downgrade to leaf mode
GNUTELLA/0.6 503 Leaf mode disabled
</nowiki>
</pre>

=== Hub Address Exchange ===

It is desirable for connecting nodes to exchange the node addresses of other hubs on
the network to facilitate rapid connection. The Gnutella2 protocol includes highly
efficient methods to share hub addresses with peers once connected, but for a node
trying to connect and encountering only "full" hubs, learning new hubs to try is
helpful.

The X-Try-Ultrapeers header was developed for this purpose, and like the rest of
the handshake is also semi-compatible with Gnutella1. It contains a comma
separated list of hub node addresses and ports, along with a timestamp recording
the time the hub was last seen. For example:

<pre>
<nowiki>
X-Try-Ultrapeers: 1.2.3.4:6346 2003-03-25T23:59Z, [..more..]
</nowiki>
</pre>

Hub addresses should not be sent unless the transmitter has reasonable knowledge
of the hub's existence and the timestamp is not too old. The content type negotiation
headers should be used to verify that the listed hub addresses are indeed Gnutella2
hubs.

=== Compression Negotiation and Encoding ===

The Gnutella2 architecture makes widespread use of "deflate" compression, due to
its high availability and ease of integration. Support of compressed TCP links is not a
requirement in the Gnutella2 standard; however, it is strongly recommended.

Deflate compression of a TCP link is negotiated with the pair of standard HTTP headers
"Accept-Encoding" and "Content-Encoding". For example:

Header block one (initiator):

<pre>
<nowiki>
Accept-Encoding: deflate
</nowiki>
</pre>

Header block two (receiver):

<pre>
<nowiki>
Accept-Encoding: deflate
Content-Encoding: deflate
</nowiki>
</pre>

Header block three (initiator):

<pre>
<nowiki>
Content-Encoding: deflate
</nowiki>
</pre>

In this example the initiator advertises support for receiving a deflated connection.
The receiver then indicates that it too supports receiving a deflated connection, and
that it intends to transmit deflated data. Finally, the initiator, upon noting that the
receiver supports deflate, indicates that it too will transmit deflated data.

Note that unlike the content-type / protocol negotiation, deflated encoding can be
applied on either incoming, outgoing or both channels of a connection.

For performance reasons, nodes should consider whether they can afford to support
additional deflated connections before advertising support for them, or agreeing to
provide a deflated data stream. In the Gnutella2 network topology, all links benefit
from compression except the leaf to hub channel of the leaf/hub link. Exempting this
channel from compression saves the leaf and more importantly the hub a
considerable CPU and RAM investment.

== Post Handshake Communication ==

After the third and final header block has been received by the initiator, subsequent
communication over the TCP stream occurs in the negotiated protocol, with the
negotiated encoding. This means that while the handshake sequence was backwards
compatible with Gnutella1, after Gnutella2 support has been negotiated all
subsequent communication occurs in the Gnutella2 common protocol - an entirely
new system not backwards compatible with any other protocol.

TCP Stream Connection and Handshaking

2005-03-27T14:14:20Z

Kath: /* Handshaking */

== Introduction ==

TCP stream connections are established between Gnutella2 nodes when they elect to
form a permanent link, creating the fundamental network topology of a highly
interconnected hub network serving dense clusters of leaf nodes.

== Initiation ==

TCP connections are initiated by leaf or hub nodes in an attempt to gain a connection
to a hub node. Leaf nodes are never the target of an outbound connection. The TCP
port number is not standardised, and must be stored with the IP address.

== Handshaking ==

Upon the establishment of a TCP stream connection between two Gnutella2 nodes, a
handshaking phase must be completed to negotiate the nature of the link and
exchange other necessary information.

This handshaking phase is the only part of the communication which remains
compatible with the old Gnutella network, allowing new connections to be negotiated
without foreknowledge of the capabilities of the other node. The handshaking process
has been well documented elsewhere; however, a short summary is provided here.

=== Handshake Stages ===

The Gnutella handshake process consists of three header blocks. The node which
initiated the connection sends an initial header block, of the form:

<pre>
<nowiki>
GNUTELLA CONNECT/0.6
Listen-IP: 1.2.3.4:6346
Remote-IP: 6.7.8.9
User-Agent: Shareaza 1.8.2.0
Accept: application/x-gnutella2
X-Ultrapeer: False
</nowiki>
</pre>

The receiver then responds with its own header block:

<pre>
<nowiki>
GNUTELLA/0.6 200 OK
Listen-IP: 6.7.8.9:6346
Remote-IP: 1.2.3.4
User-Agent: Shareaza 1.8.2.0
Content-Type: application/x-gnutella2
Accept: application/x-gnutella2
X-Ultrapeer: True
X-Ultrapeer-Needed: False
</nowiki>
</pre>

Finally, the initiator accepts the receiver's header block, and provides any final
information:

<pre>
<nowiki>
GNUTELLA/0.6 200 OK
Content-Type: application/x-gnutella2
X-Ultrapeer: False
</nowiki>
</pre>

The latter two stages may be replaced with an error condition if the connection is
being rejected. Appropriate error status codes are returned in this case, for example:

<pre>
<nowiki>
GNUTELLA/0.6 503 Too many connections
(more headers)
</nowiki>
</pre>

Note that only the HTTP-style error code should be interpreted by the software: any
descriptive text provided is for display purposes only and is not standardised.

== Headers ==

Important headers which are required or strongly recommended are detailed in the
following sections.

=== Addressing Headers ===

Two important headers to send on all connections are "Remote-IP" and "Listen-IP".
Both of these headers should be sent on the first transmission, meaning in the first
and second header blocks in the three block exchange.

The Remote-IP header contains the IP address from which the remote host is
connecting. This allows a remote host operating through some kind of network
address translation system to learn its effective external address.

The Listen-IP header contains the IP address and port number on which the local host is
listening for inbound TCP connections. It should be listening for UDP datagrams
on the same port. The format of this header is "IP:PORT", e.g. "1.2.3.4:6346".

=== Identification ===

The User-Agent header is used to identify the client software operating on the
sending node. It should be sent on the first transmission, meaning in the first and
second header blocks in the three block exchange. Note that this is a descriptive
string that often includes a version number, and is not a "vendor code" as described
elsewhere.

=== Content Type (Protocol) ===

The Accept and Content-Type header exchange is used to negotiate the data
protocol that will be used in the connection, in this case Gnutella2. The Gnutella2
content type is "application/x-gnutella2", and this exchange follows standard HTTP
rules for negotiating content type.

The first step is to advertise support for the content type (Gnutella2) in the first
header block with "Accept: application/x-gnutella2". The responding node will then
indicate that it will send Gnutella2 content with "Content-Type: application/x-gnutella2",
and that it also supports Gnutella2 with "Accept: application/x-gnutella2".
The initiating host then confirms that it will be sending Gnutella2 with
"Content-Type: application/x-gnutella2" in the third header block. For more information on the
Accept/Content-Type exchange, consult an HTTP reference.

Note that the content type negotiation process is designed to be a "one way"
process, i.e. a different content type can be negotiated for sending and receiving.
However, when the Gnutella2 protocol is negotiated, both channels must use the
same content type. This means that a receiving node must not accept Gnutella2 if
the initiator did not advertise support for it, and if at the end of the handshake
bidirectional Gnutella2 was not negotiated, the connection should be terminated.

=== Node State Negotiation ===

There are two node types in a Gnutella2 peer to peer network, a hub and a leaf as
described in [[Node Types and Responsibilities]]. During the initial handshake
the two parties must exchange their current node type and advise of their
capabilities, negotiating the node types they will adopt when the connection
completes, and indeed whether it should complete at all.

As the handshake sequence is compatible with Gnutella1, the headers involved in
negotiating node types are identical to those used to negotiate Gnutella1 "Ultrapeer"
states:

<pre>
<nowiki>
X-Ultrapeer: [True|False]
X-Ultrapeer-Needed: [True|False]
</nowiki>
</pre>

Both headers contain a Boolean value, "true" or "false", case insensitive.

The X-Ultrapeer header indicates whether the transmitting node is currently
operating as a hub. Hub nodes will send "X-Ultrapeer: True" while leaf nodes will
send "X-Ultrapeer: False".

The X-Ultrapeer-Needed header indicates whether the transmitting node would like
(and allow) the receiver to be a hub. A hub which sees no need for additional hubs in
its area of the network will send "X-Ultrapeer-Needed: False", indicating to the
connecting node that it must operate in leaf mode if it wishes to connect. A hub
which sees a need for additional hubs will send "X-Ultrapeer-Needed: True",
indicating that the receiving node should become a hub if it is capable of doing so. A
leaf may send "X-Ultrapeer-Needed: True" to indicate that it is seeking a connection
to a hub.

The X-Ultrapeer header should be sent on all three of the header blocks to indicate
the current intended state of the node, while the X-Ultrapeer-Needed header should
be sent in the first two header blocks only to indicate the desired status of the
receiver. If the nodes cannot agree on a satisfactory arrangement, the connection
will be terminated at or prior to the third header block with an appropriate message,
for example:

<pre>
<nowiki>
GNUTELLA/0.6 503 Too many hub connections
GNUTELLA/0.6 503 Too many leaves
GNUTELLA/0.6 503 I have leaves, can't downgrade to leaf mode
GNUTELLA/0.6 503 Leaf mode disabled
</nowiki>
</pre>

=== Hub Address Exchange ===

It is desirable for connecting nodes to exchange the node addresses of other hubs on
the network to facilitate rapid connection. The Gnutella2 protocol includes highly
efficient methods to share hub addresses with peers once connected, but for a node
trying to connect and encountering only "full" hubs, learning new hubs to try is
helpful.

The X-Try-Ultrapeers header was developed for this purpose, and like the rest of
the handshake is also semi-compatible with Gnutella1. It contains a comma
separated list of hub node addresses and ports, along with a timestamp recording
the time the hub was last seen. For example:

<pre>
<nowiki>
X-Try-Ultrapeers: 1.2.3.4:6346 2003-03-25T23:59Z, [..more..]
</nowiki>
</pre>

Hub addresses should not be sent unless the transmitter has reasonable knowledge
of the hub's existence and the timestamp is not too old. The content type negotiation
headers should be used to verify that the listed hub addresses are indeed Gnutella2
hubs.
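
The header format above could be parsed roughly as follows. This is a sketch under assumptions: the function name parse_try_ultrapeers is hypothetical, the timestamp is assumed to always use the minute-resolution form shown in the example, the trailing "Z" is read as UTC, and malformed entries are silently skipped.

```python
from datetime import datetime, timezone


def parse_try_ultrapeers(value):
    """Parse an X-Try-Ultrapeers value into (host, port, last_seen) tuples."""
    hubs = []
    for entry in value.split(","):
        parts = entry.split()
        if len(parts) != 2:
            continue  # expect exactly "address:port timestamp"
        address, stamp = parts
        host, sep, port = address.rpartition(":")
        if not sep or not host or not port.isdigit():
            continue
        try:
            # Assumed format, taken from the example: 2003-03-25T23:59Z
            seen = datetime.strptime(stamp, "%Y-%m-%dT%H:%MZ")
        except ValueError:
            continue
        hubs.append((host, int(port), seen.replace(tzinfo=timezone.utc)))
    return hubs
```

A caller would typically compare last_seen against the current time and discard entries it considers too old; the cut-off is left to the implementation.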

=== Compression Negotiation and Encoding ===

The Gnutella2 architecture makes widespread use of "deflate" compression, due to
its high availability and ease of integration. Support of compressed TCP links is not a
requirement in the Gnutella2 standard; however, it is strongly recommended.

Deflate compression of a TCP link is negotiated with the pair of standard HTTP headers
"Accept-Encoding" and "Content-Encoding". For example:

Header block one (initiator):

<pre>
<nowiki>
Accept-Encoding: deflate
</nowiki>
</pre>

Header block two (receiver):

<pre>
<nowiki>
Accept-Encoding: deflate
Content-Encoding: deflate
</nowiki>
</pre>

Header block three (initiator):

<pre>
<nowiki>
Content-Encoding: deflate
</nowiki>
</pre>

In this example the initiator advertises support for receiving a deflated connection.
The receiver then indicates that it too supports receiving a deflated connection, and
that it intends to transmit deflated data. Finally, the initiator upon noting that the
receiver supports deflate indicates that it too will transmit deflated data.

Note that unlike the content-type / protocol negotiation, deflated encoding can be
applied on either incoming, outgoing or both channels of a connection.
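
The per-channel outcome of the exchange above can be sketched as a small function. This is only an illustration: the names accepts_deflate, sends_deflate and negotiated_channels are hypothetical, and each header block is assumed to be already parsed into a dict keyed by lower-case header name.

```python
def accepts_deflate(headers):
    """True if the block's Accept-Encoding value lists deflate."""
    encodings = headers.get("accept-encoding", "")
    return "deflate" in [e.strip().lower() for e in encodings.split(",")]


def sends_deflate(headers):
    """True if the block declares Content-Encoding: deflate."""
    return headers.get("content-encoding", "").strip().lower() == "deflate"


def negotiated_channels(block_one, block_two, block_three):
    """Return (initiator_to_receiver, receiver_to_initiator) deflate flags.

    A channel is treated as deflated only when the side receiving on it
    advertised Accept-Encoding and the side sending on it declared
    Content-Encoding.
    """
    receiver_to_initiator = accepts_deflate(block_one) and sends_deflate(block_two)
    initiator_to_receiver = accepts_deflate(block_two) and sends_deflate(block_three)
    return initiator_to_receiver, receiver_to_initiator
```

For the three example blocks above this yields (True, True). If the initiator omits Content-Encoding from block three, only the receiver-to-initiator channel is deflated.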

For performance reasons, nodes should consider whether they can afford to support
additional deflated connections before advertising support for them, or agreeing to
provide a deflated data stream. In the Gnutella2 network topology, all links benefit
from compression except the leaf to hub channel of the leaf/hub link. Exempting this
channel from compression saves the leaf and more importantly the hub a
considerable CPU and RAM investment.

== Post Handshake Communication ==

After the third and final header block has been received by the initiator, subsequent
communication over the TCP stream occurs in the negotiated protocol, with the
negotiated encoding. This means that while the handshake sequence was backwards
compatible with Gnutella1, after Gnutella2 support has been negotiated all
subsequent communication occurs in the Gnutella2 common protocol - an entirely
new system not backwards compatible with any other protocol.

Node Types and Responsibilities

2005-03-27T14:09:14Z

Kath: /* Hub Responsibilities */

== Introduction ==

The Gnutella2 network is an ad-hoc, self-organising collection of interconnected
nodes cooperating to enable productive distributed activities.

Not all of the nodes participating in the system are equal: there are two primary
node types, "hubs" and "leaves". The goal is to maximise the number of leaves
and minimise the number of hubs; however, because resources are limited, there
is a maximum viable ratio of leaves to hubs. This quantity is known as
the "leaf density".

== Nodes ==

=== Leaf Nodes ===

Leaf nodes are the most common node type on the network - they have no
special responsibilities and do not form a working part of the network
infrastructure. Nodes with limited resources must operate as leaf nodes: this
includes limited bandwidth, CPU or RAM, low or unpredictable expected uptime,
and inability to accept inbound TCP or UDP.

=== Hub Nodes ===

Hub nodes on the other hand form an important and active part of the network
infrastructure, organising surrounding nodes, filtering and directing traffic over
several media types. Hub nodes devote substantial resources to the network, and
as a result their capacity to participate in higher level network functions is limited.
Only the most capable nodes are selected to act as hubs, based upon the criteria
in the following section.

== Hub Selection Criteria ==

Hubs are selected based on the following internal criteria:

* Suitable operating system (able to support > 100 sockets)
* Suitably high CPU and RAM available
* Long uptime (many hours, at least two), possibly considering historical uptime
* Adequate bandwidth, primarily inbound bandwidth
* Ability to accept inbound TCP and UDP

In addition to these internal factors, hubs must also consider the network's need
for additional hubs. Without a central point of authority, the need for additional
hubs cannot be determined with absolute certainty; however, it can be
approximated by examining the state of nearby nodes and specifically the state of
hubs in the local hub cluster.

== Hub Responsibilities ==
Nodes operating as Gnutella2 hubs have a set of responsibilities to meet. Hubs
are highly interconnected, forming a "hub network" or "hub web", with each hub
maintaining a connection to 5-30 other "neighbouring" hubs. The number of hub
interconnections must scale up with the overall size of the network.

Each hub also accepts connections from a large collection of leaf nodes, typically
200-300 depending on available resources. Leaf nodes are considered to be the
"edge" of the network. In practice, leaves simultaneously connect to two hubs;
however, from the point of view of the hubs, each leaf is considered a dead end.

The group of hubs within the hub network spanning the local hub and its
neighbours is termed the "hub cluster", and is an important grouping. Hub
clusters maintain constant communication with each other, sharing information
about network load and statistics, exchanging cache entries and filtering tables.
The hub cluster is also the smallest searchable unit of the network as far as a
search client is concerned.

Hub responsibilities include:

* Maintaining up-to-date information about other hubs in the cluster, and their neighbouring hubs, providing updates to neighbours
* Maintaining a node routing table mapping node GUIDs to shortest route local TCP connections and UDP endpoints
* Maintaining query hash tables for each connection, including both leaves and hubs so that queries can be executed intelligently
* Maintaining a superset query hash table including local content and every connected leaf's (not hub's) supplied tables, to supply to neighbouring hubs
* Monitoring the status of local connections and deciding whether to downgrade to leaf mode, and keeping distributed discovery services such as GWebCaches updated
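
The superset query hash table mentioned in the list above can be sketched as follows. Because a bit value of 1 means "empty" and 0 means "full" (see [[Query Hash Tables]]), combining tables amounts to a bitwise AND. The helper names and the bit order within each byte are assumptions made for this illustration, not part of the specification.

```python
def empty_table(bits):
    """Return an all-ones ("empty") table of 2**bits bits (bits >= 3)."""
    return bytearray(b"\xff" * ((1 << bits) // 8))


def mark_full(table, index):
    """Clear the bit for a word-hash value, marking it "full"."""
    table[index // 8] &= ~(1 << (index % 8)) & 0xFF


def is_full(table, index):
    """True if the bit for a word-hash value is zero."""
    return not table[index // 8] & (1 << (index % 8))


def merge_tables(tables):
    """AND equally sized tables together: a bit is full if any input has it full."""
    merged = bytearray(tables[0])
    for table in tables[1:]:
        if len(table) != len(merged):
            raise ValueError("tables must be the same size")
        for i, byte in enumerate(table):
            merged[i] &= byte
    return merged
```

A hub would pass its own local table plus each connected leaf's table to merge_tables, and leave out the tables received from neighbouring hubs.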

Node Types and Responsibilities

2005-03-27T13:57:00Z

Kath: /* Hub Selection Criteria */

== Introduction ==

The Gnutella2 network is an ad-hoc, self-organising collection of interconnected
nodes cooperating to enable productive distributed activities.

Not all of the nodes participating in the system are equal: there are two primary
node types, "hubs" and "leaves". The goal is to maximise the number of leaves
and minimise the number of hubs; however, because resources are limited, there
is a maximum viable ratio of leaves to hubs. This quantity is known as
the "leaf density".

== Nodes ==

=== Leaf Nodes ===

Leaf nodes are the most common node type on the network - they have no
special responsibilities and do not form a working part of the network
infrastructure. Nodes with limited resources must operate as leaf nodes: this
includes limited bandwidth, CPU or RAM, low or unpredictable expected uptime,
and inability to accept inbound TCP or UDP.

=== Hub Nodes ===

Hub nodes on the other hand form an important and active part of the network
infrastructure, organising surrounding nodes, filtering and directing traffic over
several media types. Hub nodes devote substantial resources to the network, and
as a result their capacity to participate in higher level network functions is limited.
Only the most capable nodes are selected to act as hubs, based upon the criteria
in the following section.

== Hub Selection Criteria ==

Hubs are selected based on the following internal criteria:

* Suitable operating system (able to support > 100 sockets)
* Suitably high CPU and RAM available
* Long uptime (many hours, at least two), possibly considering historical uptime
* Adequate bandwidth, primarily inbound bandwidth
* Ability to accept inbound TCP and UDP

In addition to these internal factors, hubs must also consider the network's need
for additional hubs. Without a central point of authority, the need for additional
hubs cannot be determined with absolute certainty; however, it can be
approximated by examining the state of nearby nodes and specifically the state of
hubs in the local hub cluster.

== Hub Responsibilities ==
Nodes operating as Gnutella2 hubs have a set of responsibilities to meet. Hubs
are highly interconnected, forming a "hub network" or "hub web", with each hub
maintaining a connection to 5-30 other "neighbouring" hubs. The number of hub
interconnections must scale up with the overall size of the network.

Each hub also accepts connections from a large collection of leaf nodes, typically
200-300 depending on available resources. Leaf nodes are considered to be the
"edge" of the network. In practice, leaves simultaneously connect to two hubs;
however, from the point of view of the hubs, each leaf is considered a dead end.

The group of hubs within the hub network spanning the local hub and its
neighbours is termed the "hub cluster", and is an important grouping. Hub
clusters maintain constant communication with each other, sharing information
about network load and statistics, exchanging cache entries and filtering tables.
The hub cluster is also the smallest searchable unit of the network as far as a
search client is concerned.

Hub responsibilities include:

* Maintaining up-to-date information about other hubs in the cluster, and their neighbouring hubs, providing updates to neighbours
* Maintaining a node routing table mapping node GUIDs to shortest route local TCP connections and UDP endpoints
* Maintaining query hash tables for each connection, including both leaves and hubs so that queries can be executed intelligently
* Maintaining a superset query hash table including local content and every connected leaf's (not hub's) supplied tables, to supply to neighbouring hubs
* Monitoring the status of local connections and deciding whether to downgrade to leaf mode, and keeping distributed discovery services such as GWebCaches updated

Node Types and Responsibilities

2005-03-27T13:53:21Z

Kath: /* Introduction */

== Introduction ==

The Gnutella2 network is an ad-hoc, self-organising collection of interconnected
nodes cooperating to enable productive distributed activities.

Not all of the nodes participating in the system are equal: there are two primary
node types, "hubs" and "leaves". The goal is to maximise the number of leaves
and minimise the number of hubs; however, because resources are limited, there
is a maximum viable ratio of leaves to hubs. This quantity is known as
the "leaf density".

== Nodes ==

=== Leaf Nodes ===

Leaf nodes are the most common node type on the network - they have no
special responsibilities and do not form a working part of the network
infrastructure. Nodes with limited resources must operate as leaf nodes: this
includes limited bandwidth, CPU or RAM, low or unpredictable expected uptime,
and inability to accept inbound TCP or UDP.

=== Hub Nodes ===

Hub nodes on the other hand form an important and active part of the network
infrastructure, organising surrounding nodes, filtering and directing traffic over
several media types. Hub nodes devote substantial resources to the network, and
as a result their capacity to participate in higher level network functions is limited.
Only the most capable nodes are selected to act as hubs, based upon the criteria
in the following section.

== Hub Selection Criteria ==

Hubs are selected based on the following internal criteria:

* Suitable operating system (able to support > 100 sockets)
* Suitably high CPU and RAM available
* Long uptime (many hours, at least two), possibly considering historical uptime
* Adequate bandwidth, primarily inbound bandwidth
* Ability to accept inbound TCP and UDP

In addition to these internal factors, hubs must also consider the network's need
for additional hubs. Without a central point of authority, the need for additional
hubs cannot be determined with absolute certainty; however, it can be
approximated by examining the state of nearby nodes and specifically the state of
hubs in the local hub cluster.

== Hub Responsibilities ==
Nodes operating as Gnutella2 hubs have a set of responsibilities to meet. Hubs
are highly interconnected, forming a "hub network" or "hub web", with each hub
maintaining a connection to 5-30 other "neighbouring" hubs. The number of hub
interconnections must scale up with the overall size of the network.

Each hub also accepts connections from a large collection of leaf nodes, typically
200-300 depending on available resources. Leaf nodes are considered to be the
"edge" of the network. In practice, leaves simultaneously connect to two hubs;
however, from the point of view of the hubs, each leaf is considered a dead end.

The group of hubs within the hub network spanning the local hub and its
neighbours is termed the "hub cluster", and is an important grouping. Hub
clusters maintain constant communication with each other, sharing information
about network load and statistics, exchanging cache entries and filtering tables.
The hub cluster is also the smallest searchable unit of the network as far as a
search client is concerned.

Hub responsibilities include:

* Maintaining up-to-date information about other hubs in the cluster, and their neighbouring hubs, providing updates to neighbours
* Maintaining a node routing table mapping node GUIDs to shortest route local TCP connections and UDP endpoints
* Maintaining query hash tables for each connection, including both leaves and hubs so that queries can be executed intelligently
* Maintaining a superset query hash table including local content and every connected leaf's (not hub's) supplied tables, to supply to neighbouring hubs
* Monitoring the status of local connections and deciding whether to downgrade to leaf mode, and keeping distributed discovery services such as GWebCaches updated

Simple Query Language and Metadata

2005-03-27T13:44:07Z

Kath: /* Generic Matching on Metadata */

== Introduction ==

An effective search system must provide a query language which is:

* Powerful
* Intuitive
* Natural

At the same time, it is desirable for the language to be reasonably easy to parse in
software.

Gnutella2 employs a simple query language that is familiar to users of web search
engines and allows most common criteria to be entered intuitively.

== Query Language Definition ==

* Every search string is considered a list of words.
* Words are identified as sequences of alphanumeric characters. Other symbols and white space are ignored.
* In the basic case, every word in the list must appear one or more times in a matching string.
* Words may be marked as negative words or excluded words by prefixing them with a dash (-).
* In this case, every positive word in the list must appear and every negative word must not appear in a matching string.
* Words can be grouped together with quotes. The negation operator (-) may not appear inside a quoted string, but it may prefix a quoted string in which case the negation is applied to the quoted string as a whole.
* The words in a quoted string must appear in the same order in a matching string. Conversely, the words in a negated quoted string must not appear or must not appear in the same order in a matching string.

== Examples ==

<pre>
<nowiki>
Cat Dog
(matches strings with "cat" and "dog", in any order)

-Cat Dog
(matches strings with "dog" but not "cat", in any order)

-Cat -Dog
(matches strings with neither "cat" nor "dog", illegal in an external search as there are no positive words)

"cat dog"
(matches strings with "cat" followed by "dog", "cat dog" matches, "dog cat" does not)

-"cat dog"
(matches strings without "cat dog", "cat dog" does not match, "dog cat" does)

"cat dog" -fish
(matches strings with "cat dog" and without "fish")
</nowiki>
</pre>
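
The rules and examples above can be sketched as a small parser and matcher. This is an illustration under stated assumptions, not a reference implementation: the names parse_query and matches are hypothetical, words are limited to ASCII letters and digits, matching ignores case, and the words of a quoted string are required to appear as a consecutive run (the text only says "in the same order"). The check that an external search contains at least one positive word is left to the caller.

```python
import re

WORD = re.compile(r"[a-z0-9]+")
# Either an optionally negated quoted string, or a bare token.
TERM = re.compile(r'(-?)"([^"]*)"|([^\s"]+)')


def words_of(text):
    """Split text into lower-case alphanumeric words."""
    return WORD.findall(text.lower())


def parse_query(query):
    """Return a list of (negative, words) terms; words is a phrase to find."""
    terms = []
    for m in TERM.finditer(query):
        if m.group(3) is None:  # quoted string: one phrase
            negative = m.group(1) == "-"
            phrases = [words_of(m.group(2))]
        else:  # bare token: each word is its own term
            negative = m.group(3).startswith("-")
            phrases = [[w] for w in words_of(m.group(3))]
        terms.extend((negative, p) for p in phrases if p)
    return terms


def contains_phrase(haystack, phrase):
    """True if phrase occurs in haystack as a consecutive run of words."""
    n = len(phrase)
    return any(haystack[i:i + n] == phrase for i in range(len(haystack) - n + 1))


def matches(query, text):
    """True if every positive term is present and every negative term is absent."""
    haystack = words_of(text)
    return all(contains_phrase(haystack, phrase) != negative
               for negative, phrase in parse_query(query))
```

Representing each term as a (negative, words) pair lets single words and quoted strings share one matching routine, since a single word is simply a phrase of length one.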

== Metadata Searching ==

Searching metadata involves a set of specific rules:

* If a metadata schema is specified as search criteria, matching objects that have metadata must share the same schema
* Metadata can only be compared if criteria and object share the same schema
* Each data member (attribute or element) of metadata is compared separately
* Where a member is specified in the criteria but not in the object, the match fails
* Where a member is specified in the object but not in the criteria, the member is ignored
* The search criteria for each text/string member is in the simple query language defined above
* The search criteria for numeric members is range based, defined in a subsequent section.

== Numeric Range Matching ==

When comparing numeric values, the match function is specified by the search
criteria:

* If a value is specified, the value must match exactly
* If a range (X-Y) is specified, the value must lie within that inclusive range
* If (X-) is specified, the value must be greater than or equal to X
* If (-X) is specified, the value must be less than or equal to X
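
The four cases above can be sketched as a pair of functions. The names parse_range and in_range are hypothetical, and the sketch assumes non-negative values, since a leading dash is read as the open-ended (-X) form rather than as a minus sign.

```python
def parse_range(criteria):
    """Return (low, high) bounds for a criteria string; None means unbounded."""
    text = criteria.strip()
    if "-" not in text:
        value = float(text)  # a single value must match exactly
        return value, value
    low, _, high = text.partition("-")
    return (float(low) if low.strip() else None,
            float(high) if high.strip() else None)


def in_range(criteria, value):
    """True if value satisfies the criteria; both bounds are inclusive."""
    low, high = parse_range(criteria)
    return (low is None or value >= low) and (high is None or value <= high)
```

For example, in_range("100-200", 200) holds because the range is inclusive, while in_range("100-", 99) does not.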

== Generic Matching on Metadata ==

Generic search criteria (/Q2/DN) can be matched against metadata fields; however,
care must be taken not to match against data members which are not directly
descriptive of the object. Client-side schema descriptors are a good solution, listing
the scope of a general search on each recognised schema.

Main Page

2005-03-27T13:19:21Z

Kath: /* What is Gnutella2? */

== What is Gnutella2? ==

Gnutella2 is a modern and efficient peer-to-peer network standard and architecture
designed to provide a solid foundation for distributed global services such as person
to person communication, data location & transfer, and other future services.

== Why is it needed? ==

Peer to peer technologies have become mainstream over recent years, and there are
already a significant number of P2P networks in various stages of development and
operation. How does yet another network help?

Gnutella2 is unique amongst the currently operating peer to peer networks in several
important ways:

* Many of the most successful networks are "closed", owned by a single entity with restrictions or fees constituting a barrier to participation. This is not a viable model for an open, general purpose network. Gnutella2 is an open architecture where anyone is welcome to participate and contribute. The network has been designed to allow such diversity without the need for messy hacks or compromises in integrity.
* The majority of networks are devoted to a single purpose, often the sharing of files. This is certainly a popular application for peer to peer technology, but it is by no means the only application. Gnutella2 is designed as a general purpose network which can be used as a solid foundation for any number of different peer to peer applications - vanilla file sharing, communications tools or other ideas which are yet to be conceived.
* Some peer to peer networks have been developed with similar general purpose goals; however, they have been unable to compete in the most popular application of the day, which is file sharing. For a general purpose network to succeed, it must be able to compete with purpose-specific networks in the most popular purpose. Gnutella2 is not only able to compete with the current popular file sharing specific networks, it outperforms them.

== What is the Scope of Gnutella2? ==

The single name "Gnutella2" really refers to two separate components: the [[Gnutella2 Standard]] and the [[Gnutella2 Network]].

The Gnutella2 Network is perhaps the most easily recognised component. It is a new
high-performance peer to peer network architecture upon which a variety of
distributed applications can be built, such as file sharing applications, communication
tools, etc.

The Gnutella2 Standard is a set of requirements for building applications which
operate on the Gnutella2 network in different capacities. It specifies the minimum
compliance level required to be recognised as a Gnutella2-compatible application.
Compliance with the Gnutella2 Standard ensures participating applications provide a
minimum acceptable level of service to other network participants.

== Indices ==

* [[Main index]]
* [[Root packets index]]