Datatypes

From Gnutella2
Latest revision as of 16:19, 18 January 2014



Introduction

The format of a packet payload is defined by the packet type and can consist of any binary data; however, there are a number of conventions in place for serializing common datatypes.

Multi-Byte Integers

Multi-byte integers are serialized in the byte order of the topmost packet. Little-endian is the default byte order; however, big-endian byte order can be selected instead by those who prefer it.

Some values can also be serialized with their redundant high-order zero bytes stripped off; this is called variable-length encoding. It suits values that are usually small, because it avoids transmitting extra zero bytes over the network.

With variable-length encoding, values less than 256 require a single byte, values less than 65,536 require two bytes, and so on. For instance, the length of each G2 packet is serialized this way.

Network/Node Addresses

A network or node address consists of a host address (such as an IP address) and a port number, and is of variable length, depending on the address family. In IPv4, a network/node address is six bytes long: 4 bytes for the IP address followed by 2 bytes for the port number, as follows:


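#include <stdint.h>

/* IPv4 node address as serialized: 6 bytes in total. */
typedef struct
{
    uint8_t  ip[4];   /* IPv4 address, one byte per octet */
    uint16_t port;    /* port number; serialized in the packet's byte order */
} IPV4_ENDPOINT;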

Note that this is an array of four 8-bit integers (bytes) followed by a 16-bit integer (short). Byte order does not affect the individual bytes of the IP address, but it does affect the 16-bit port number.

IPv6 addresses are longer and are not yet defined within the scope of Gnutella2; however, applications should be aware that a node address that is not 6 bytes long belongs to a different address family.

GUIDs

Globally unique identifiers (GUIDs) are used to identify nodes on the network. GUIDs are serialized as an array of 16 bytes.

Strings

Strings are encoded as UTF-8 and serialized as a zero-terminated sequence of 8-bit integers.

A zero byte (0x00) marks the end of the string; however, if the string data reaches the end of the packet (or child packet) payload, the terminator is not required. This means that a packet whose payload consists of a string need not include a zero terminator, and its payload length will be exactly the byte length of the encoded string.

UTF-8 encoding is required for all strings in the packet payload. This means that 7-bit (ASCII) characters may be passed as-is, while extended characters are encoded as multi-byte sequences.

All applications must be able to parse UTF-8 encoded strings; however, it is up to the individual application whether to store the string in Unicode or convert it to the local code page for processing. In situations where a packet must be both processed and forwarded, the original packet must be forwarded rather than a regenerated version. This ensures that both locally unsupported encodings and packet extensions are preserved.

Applications should never send ANSI (local code page) strings directly if they contain extended characters, that is, bytes with the most significant bit (MSB) set. Such strings should be encoded as UTF-8 first. Otherwise, decoding may fail, and the packet will be discarded or will carry bogus information.