Packet Structure


Introduction

All Gnutella2 communications are represented with Gnutella2 lightweight tree packets. This applies everywhere from TCP stream communications to reliable UDP transmissions to HTTP packet exchanges (where protocol data has been negotiated). Each tree packet may contain meaningful payload data and/or one or more child packets, allowing complex document structures to be created and extended in a backward compatible manner.

The concept can be compared to an XML document tree. The "packets" are elements, which can in turn contain zero or more child elements (packets). The payload of a packet is like the attributes of an XML element. However, serializing XML has a lot of overhead due to all the naming, even in a compact binary form. The Gnutella2 packet structure makes a compromise: it names elements (packets), allowing them to be globally recognized and understood, without knowledge of their format - and stores attributes as binary payloads, requiring knowledge of their content to parse them.

Thus the element (packet or child packet) is the finite unit of comprehension. This system provides an excellent trade-off between format transparency and compactness.

Fictitious Visual Example


+ Query Hit Packet
|
|-+ Node ID (standard)
|
|-+ Server Status (standard)
| \-+ Shareaza Server Status (private extension)
|
|-+ Hit Object
| |-+ URN (standard)
| |-+ Descriptive name (standard)
| | \-+ Alternate name list (extension)
| |-+ URL (standard)
| |-+ Priority indicator (private extension)
| | \-+ Digital signature (private)
| |-+ Alternate source summary (standard)
| \-+ Available ranges (standard)
|   \-+ Estimated completion time (private extension)
|
|-+ Selective digital signature (private)
|
\-+ Routing tags

Contents

Each Gnutella2 packet contains:

  • Control flags
  • A type name meaningful in the namespace of the packet's parent or context
  • A length (or implied length)
  • Payload data of a format specific to the packet type name and namespace
  • Child packets existing in the namespace of this packet

Namespace Considerations

Each packet contains a relative type name of up to 8 bytes in length, which is case sensitive. The packet type name is meaningful only in the namespace of the packet's parent, or in the absence of a parent, the context of the packet (e.g. root level TCP stream).

This means that, for example, a packet "A" inside packet "X" is different from a packet "A" inside packet "Y". Packets are of the same type only if their fully qualified absolute type names are equal.

As a convention, when discussing packet type names, they will be noted in their absolute form with a URL style slash (/) separating each level. In the above example, the first packet is "/X/A" while the second is "/Y/A". It is clear now that the packets are of different types.

Packet type names can contain from 1 to 8 bytes inclusive, and none of these bytes may be a null (0). Community approved packets are by convention named with uppercase characters and digits, for example "PUSH". Private packet types are by convention named with lowercase characters and digits, prefixed with the vendor code of the owner, for example "RAZAclr2".
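
The naming rules above can be sketched in C as follows. This is a minimal illustration, not part of the specification: the function names g2_type_name_valid and g2_same_type are hypothetical, and absolute names are assumed to be held as ordinary null-terminated strings such as "/X/A".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if name[0..len) is a legal relative type name:
   1 to 8 bytes long, with no null (0) bytes. */
bool g2_type_name_valid(const unsigned char *name, size_t len)
{
    if (len < 1 || len > 8)
        return false;
    for (size_t i = 0; i < len; i++) {
        if (name[i] == 0)
            return false;
    }
    return true;
}

/* Two packets are of the same type only if their absolute names
   (e.g. "/X/A") are equal. The comparison is case sensitive. */
bool g2_same_type(const char *abs_a, const char *abs_b)
{
    return strcmp(abs_a, abs_b) == 0;
}
```

With these helpers, "/X/A" and "/Y/A" compare as different types, and so do "/X/A" and "/X/a".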

Framing

Packets are encoded with a single leading control byte, followed by zero to three bytes of packet length, followed by one to eight bytes of packet type name, followed by zero or more child packets (framed the same way), followed by zero or more bytes of payload:


| Control | Length_| Name___ | children and/or payload |

A packet can contain a payload only, children and a payload, children only, or nothing at all. The packet header (control byte, length and type name) is at least 2 bytes long (1 control byte, no length bytes, 1 name byte) and at most 12 bytes long (1 control byte, 3 length bytes, 8 name bytes).
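
As a rough illustration of this framing, the sketch below writes a little-endian packet that carries a payload and no children. The function name g2_write_simple is hypothetical, and the choice to use the smallest length field that fits is an assumption of this example. It relies on the control byte layout described under "The Control Byte", and it deliberately rejects empty payloads, because zero-length packets need extra care with the control byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes control byte, length, type name and payload into out.
   Returns the number of bytes written, or 0 if the arguments are
   invalid or the packet does not fit in cap bytes. */
size_t g2_write_simple(uint8_t *out, size_t cap,
                       const char *name, size_t name_len,
                       const uint8_t *payload, size_t payload_len)
{
    if (name_len < 1 || name_len > 8)
        return 0;
    if (payload_len < 1 || payload_len > 0xFFFFFF)
        return 0;   /* the length field holds at most 3 bytes */

    /* Smallest number of length bytes that can hold payload_len. */
    unsigned len_len = 0;
    for (size_t v = payload_len; v != 0; v >>= 8)
        len_len++;

    size_t total = 1 + len_len + name_len + payload_len;
    if (total > cap)
        return 0;

    size_t pos = 0;
    /* Control byte: Len_Len in bits 7-6, Name_Len - 1 in bits 5-3,
       no flags set (not compound, little-endian). */
    out[pos++] = (uint8_t)((len_len << 6) | ((name_len - 1) << 3));
    /* Length, least significant byte first. */
    for (unsigned i = 0; i < len_len; i++)
        out[pos++] = (uint8_t)(payload_len >> (8 * i));
    memcpy(out + pos, name, name_len);
    pos += name_len;
    memcpy(out + pos, payload, payload_len);
    return pos + payload_len;
}
```

For a packet named "A" with the two-byte payload AA BB, this produces the five bytes 40 02 41 AA BB: a 3-byte header followed by the payload.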

The Control Byte

The control byte is always non-zero. A zero control byte identifies the end of a stream of packets, and thus has special meaning. It should not be used in root packet streams (which do not end). Control bytes have the following format:


+----+----+----+----+----+----+----+----+
| 7    6  | 5    4    3  | 2  | 1  | 0  | Bit
+----+----+----+----+----+----+----+----+
| Len_Len | Name_Len - 1 | CF | BE | // |
+----+----+----+----+----+----+----+----+

  • Len_Len is the number of bytes in the length field of the packet, which immediately follows the control byte. There are two bits here which means the length field can be up to 3 bytes long. Len_Len can be zero if the packet has zero length (no children and no payload), in which case there is no need to encode the length.
  • Name_Len is the number of bytes in the packet name field MINUS ONE, which follows the packet length field. There are three bits here which means that packet names can be 1 to 8 bytes long inclusive. Because a 0 here equates to one byte of name, unnamed packets are not possible.
  • The three least significant bits of the control byte are reserved for flags. They have the following meanings:
      • CF (bit 2) is the compound packet flag. If this bit is set, the packet contains one or more child packets. If not set, the packet does not contain any child packets. If the packet is of zero length, this flag is ignored.
      • BE (bit 1) is the big-endian packet flag. If set, all multi-byte values encoded in the packet and its children are encoded in big-endian byte order, including the length in the packet header.
      • Bit 0 is reserved.
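
The bit layout above can be expressed as a pair of pack and unpack helpers. This is only a sketch: the g2_control structure and the function names are hypothetical, and name_len here holds the real name length (1 to 8), not the stored value minus one.

```c
#include <stdbool.h>
#include <stdint.h>

#define G2_FLAG_COMPOUND   0x04  /* CF, bit 2 */
#define G2_FLAG_BIG_ENDIAN 0x02  /* BE, bit 1 */

typedef struct {
    unsigned len_len;    /* bytes in the length field, 0 to 3 */
    unsigned name_len;   /* bytes in the type name, 1 to 8 */
    bool compound;       /* CF flag */
    bool big_endian;     /* BE flag */
} g2_control;

/* Builds a control byte from its fields. */
uint8_t g2_control_pack(g2_control c)
{
    return (uint8_t)(((c.len_len & 0x03u) << 6) |
                     (((c.name_len - 1u) & 0x07u) << 3) |
                     (c.compound ? G2_FLAG_COMPOUND : 0) |
                     (c.big_endian ? G2_FLAG_BIG_ENDIAN : 0));
}

/* Splits a control byte into its fields. */
g2_control g2_control_unpack(uint8_t b)
{
    g2_control c;
    c.len_len = (b >> 6) & 0x03u;
    c.name_len = ((b >> 3) & 0x07u) + 1u;
    c.compound = (b & G2_FLAG_COMPOUND) != 0;
    c.big_endian = (b & G2_FLAG_BIG_ENDIAN) != 0;
    return c;
}
```

For example, a packet with a one-byte length field and a four-byte name such as "PUSH", with no flags set, has the control byte 0x58 (binary 01 011 000).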

The Length Field

The length field immediately follows the control byte, and can be 0 to 3 bytes long. Length bytes are stored in the byte order of the packet.

The length value includes the payload of this packet AND any child packets in their entirety. This is obviously needed so that the entire packet can be detected and acquired from a stream. The length does not include the header (control byte, length, and name). The length field precedes the name field to allow it to be read faster from a stream when acquiring packets.

The length field is in the byte order of the root packet.
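
A length field of 0 to 3 bytes can be assembled one byte at a time, which keeps the result independent of the byte order of the host. The sketch below uses a hypothetical function name, g2_read_length, and leaves it to the caller to pass in the byte order that applies to the packet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reads a length field of len_len bytes (0 to 3) starting at p.
   A len_len of 0 yields a length of 0. */
uint32_t g2_read_length(const uint8_t *p, unsigned len_len, bool big_endian)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < len_len; i++) {
        if (big_endian)
            value = (value << 8) | p[i];         /* most significant byte first */
        else
            value |= (uint32_t)p[i] << (8 * i);  /* least significant byte first */
    }
    return value;
}
```

With at most three length bytes, the largest length that can be encoded is 0xFFFFFF (16,777,215) bytes.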

The Type Name Field

The type name field immediately follows the length bytes, and can be from 1 to 8 bytes long. Its format is detailed in the previous section entitled "Namespace Considerations".

Child Packets

Child packets are only present if the compound packet flag (CF) is set in the control byte. If it is set and the packet is not of zero length, one or more child packets immediately follow the end of the header. These child packets are included in the total length of their parent (along with the payload, which follows the child packets after a packet stream terminator).

Child packets are framed exactly the same way, with a control byte, length, name, children and/or payload. When the compound bit is set and the packet is not of zero length, the first child packet must exist. Subsequent child packets may also exist, and are read in sequentially in the same way that they are read from a root packet stream. The end of the child packet stream is signalled by the presence of a zero control byte, OR the end of the parent packet's length (in which case there is no payload). Including a terminating zero control byte when there is no payload is still valid, but unnecessary.

Payload

Payload may exist whenever the length field is non-zero. However, if the compound bit is set, one or more child packets must be read before the payload is reached. If no bytes of the packet remain after the end of the last child, there is no payload.
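
The rules for children and payload can be combined into a routine that steps over the children of a packet without recursing into them and reports where the payload begins. This is a sketch under stated assumptions: the function names are hypothetical, the packet body (everything after the header) is assumed to be fully buffered in memory, and a single byte order is passed in for all length fields.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Total framed size (header plus body) of the packet at p, or 0 if p
   holds a terminator byte or the packet does not fit in avail bytes. */
size_t g2_framed_size(const uint8_t *p, size_t avail, bool big_endian)
{
    if (avail == 0 || p[0] == 0)
        return 0;
    unsigned len_len = (p[0] >> 6) & 0x03u;
    unsigned name_len = ((p[0] >> 3) & 0x07u) + 1u;
    size_t header = 1u + len_len + name_len;
    if (header > avail)
        return 0;
    uint32_t body = 0;
    for (unsigned i = 0; i < len_len; i++) {
        if (big_endian)
            body = (body << 8) | p[1 + i];
        else
            body |= (uint32_t)p[1 + i] << (8 * i);
    }
    if (body > avail - header)
        return 0;
    return header + body;
}

/* Skips the children in a packet body. On success, stores the offset of
   the payload in *payload_off and the number of direct children in
   *child_count. Returns false if the body is malformed. */
bool g2_find_payload(const uint8_t *body, size_t body_len,
                     bool compound, bool big_endian,
                     size_t *payload_off, size_t *child_count)
{
    size_t pos = 0, count = 0;
    if (compound && body_len > 0) {
        for (;;) {
            if (pos == body_len)
                break;              /* children fill the body: no payload */
            if (body[pos] == 0) {
                pos++;              /* terminator: the payload follows */
                break;
            }
            size_t n = g2_framed_size(body + pos, body_len - pos, big_endian);
            if (n == 0)
                return false;       /* truncated child */
            pos += n;               /* skip the child and all its descendants */
            count++;
        }
        if (count == 0)
            return false;           /* the first child must exist */
    }
    *payload_off = pos;
    *child_count = count;
    return true;
}
```

The payload, if any, is then the body_len - *payload_off bytes starting at body + *payload_off. Each child is skipped using only its own header, so grandchildren are never parsed.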

Notes on the Control Byte

Note that there are a number of "marker packet types", which have no children or payload. It is desirable to encode these in as small a space as possible, which means omitting the length field and setting the len_len bits to zero in the control byte. This creates a potential conflict, as the control byte itself may be zero if the type name is one byte long - and as noted above, a zero control byte has special meaning (end of packet stream). This must be avoided; luckily it is perfectly legal to set the compound packet flag (CF) on zero length packets, thus producing a non-zero control byte and the most compact packet possible.
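
A marker packet encoder that avoids the zero control byte might look like the sketch below. The function name g2_write_marker is hypothetical, and setting CF only when the control byte would otherwise be zero is a choice made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes a zero-length (marker) packet: a control byte followed by the
   type name, with no length field. Returns the number of bytes written,
   or 0 if the name length is invalid or out is too small. */
size_t g2_write_marker(uint8_t *out, size_t cap,
                       const char *name, size_t name_len)
{
    if (name_len < 1 || name_len > 8 || cap < 1 + name_len)
        return 0;

    /* Len_Len = 0, Name_Len - 1 in bits 5-3. */
    uint8_t control = (uint8_t)((name_len - 1) << 3);

    /* A one-byte name would give a zero control byte, which means
       "end of packet stream". Setting CF makes it non-zero; the flag
       is ignored on zero-length packets. */
    if (control == 0)
        control |= 0x04;

    out[0] = control;
    memcpy(out + 1, name, name_len);
    return 1 + name_len;
}
```

A marker named "A" is therefore encoded as the two bytes 04 41, while a marker with a two-byte name needs no flag and starts with the control byte 08.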

The compound packet bit MUST be checked when decoding every packet. It should be done in low-level decoding code to avoid accidental omission. Do not assume that a packet will not have children - it might not now, but no packets are sterile. Anything could be augmented or extended in some unknown way in the future. If you are not interested in children, skip them (which is easy, you don't even need to recurse through their children).

Simple Packet Decoder in C


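/* Fragment of a decoder function. ReadNextByte() and ReadBytes() are
   assumed to pull bytes from the current packet stream. */
BYTE nInput = ReadNextByte();

/* A zero control byte ends the current stream of (child) packets. */
if ( nInput == 0 ) return S_NO_MORE_CHILDREN;

BYTE nLenLen = ( nInput & 0xC0 ) >> 6;   /* bits 7-6: bytes in the length field (0-3) */
BYTE nTypeLen = ( nInput & 0x38 ) >> 3;  /* bits 5-3: bytes in the type name, minus one */
BYTE nFlags = ( nInput & 0x07 );         /* bits 2-0: flags */

BOOL bBigEndian = ( nFlags & 0x02 ) ? TRUE : FALSE;
BOOL bIsCompound = ( nFlags & 0x04 ) ? TRUE : FALSE;

/* This simple decoder handles little-endian packets only. */
ASSERT( ! bBigEndian );

/* Assemble the little-endian length byte by byte, so the result does
   not depend on the byte order of the host. */
BYTE pLength[3] = { 0, 0, 0 };
ReadBytes( pLength, nLenLen );
DWORD nPacketLength = (DWORD)pLength[0] | ( (DWORD)pLength[1] << 8 ) | ( (DWORD)pLength[2] << 16 );

/* The type name is 1 to 8 bytes; terminate it for use as a C string. */
CHAR szType[9];
ReadBytes( (BYTE*)szType, nTypeLen + 1 );
szType[ nTypeLen + 1 ] = 0;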