Datatypes
Revision as of 17:28, 19 March 2005
Introduction
The format of a packet payload is defined by the packet type and can consist of any binary data. There are, however, a number of conventions for serializing common datatypes.
Multi-Byte Integers
Multi-byte integers are serialized in the byte order of the topmost packet. Little-endian is the default byte order; big-endian byte order can be selected instead.
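As an illustration of the rule above, the following is a minimal sketch of a decoder that honours the selected byte order. The function name g2_read_uint and the big_endian parameter are hypothetical and not part of the specification; how the topmost packet signals its byte order is not covered here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode an unsigned integer of `len` bytes (1 to 8)
   starting at `p`. `big_endian` is nonzero when the topmost packet selected
   big-endian byte order; otherwise little-endian (the default) is used. */
static uint64_t g2_read_uint(const uint8_t *p, size_t len, int big_endian)
{
    uint64_t value = 0;
    for (size_t i = 0; i < len; i++) {
        /* Consume the most significant byte first in either byte order. */
        uint8_t byte = big_endian ? p[i] : p[len - 1 - i];
        value = (value << 8) | byte;
    }
    return value;
}
```

With the bytes 0x34 0x12 on the wire, this sketch yields 0x1234 for a little-endian packet and 0x3412 for a big-endian one.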
Network/Node Addresses
A network or node address consists of a physical address and a port number, and is of variable length depending on the address family. In IPv4, a network/node address is six bytes long: 4 bytes for the IP address followed by 2 bytes for the port number, as follows:
typedef struct { BYTE ip[4]; SHORT port; } IPV4_ENDPOINT;
Note that this is considered an array of 4 8-bit integers (bytes), followed by a 16-bit integer (short). Byte order does not affect bytes, but it will affect the 16-bit port number.
IPv6 addresses are longer and are not yet defined within the scope of Gnutella2. However, applications should be aware that a node address that is not 6 bytes long belongs to a different address family.
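The address rules above could be applied roughly as in the sketch below. The ipv4_endpoint type, the parse_ipv4_endpoint function and its big_endian flag are illustrative names, not part of the specification. The length check is what distinguishes IPv4 from other address families, which this sketch simply rejects.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative in-memory form of a 6-byte IPv4 network/node address. */
typedef struct {
    uint8_t  ip[4];   /* address bytes, copied in wire order */
    uint16_t port;    /* port number, decoded to a host integer */
} ipv4_endpoint;

/* Returns 0 on success, or -1 if the address is not 6 bytes long and so
   belongs to a different address family (not handled in this sketch). */
static int parse_ipv4_endpoint(const uint8_t *buf, size_t len,
                               int big_endian, ipv4_endpoint *out)
{
    if (len != 6)
        return -1;

    /* Single bytes are unaffected by byte order. */
    memcpy(out->ip, buf, 4);

    /* The 16-bit port follows the byte order of the topmost packet. */
    if (big_endian)
        out->port = (uint16_t)((buf[4] << 8) | buf[5]);
    else
        out->port = (uint16_t)(buf[4] | (buf[5] << 8));
    return 0;
}
```

Decoding byte by byte, rather than casting the buffer to a struct, avoids any dependence on the host's own byte order or on struct padding.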
GUIDs
Globally unique identifiers (GUIDs) are used to identify nodes on the network. GUIDs are serialized as an array of 16 bytes.
Strings
Strings are encoded with UTF-8 and serialized as a zero-terminated sequence of 8-bit integers.
A zero character (0x00) marks the end of the string. However, if the string data reaches the end of the packet (or child packet) payload, the terminator is not required. This means that packets whose payload consists of a single string do not need to include a zero terminator, and their payload length will be exactly the byte length of the encoded string.
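The optional terminator can be handled as in the following sketch, which treats either a 0x00 byte or the end of the payload as the end of the string. The function name g2_string_length and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the byte length of the UTF-8 string that
   starts at `payload`, not counting any terminator. `*consumed` receives
   the number of payload bytes used, including the 0x00 if one was present. */
static size_t g2_string_length(const uint8_t *payload, size_t payload_len,
                               size_t *consumed)
{
    size_t n = 0;

    /* Stop at the first zero byte or at the end of the payload. */
    while (n < payload_len && payload[n] != 0x00)
        n++;

    /* A terminator is consumed only if the string did not run to the end. */
    *consumed = (n < payload_len) ? n + 1 : n;
    return n;
}
```

A caller reading several fields in sequence would advance its read position by the consumed count rather than by the string length.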
UTF-8 encoding is required for all strings present in the packet payload. This means that 7-bit characters may be passed as-is, while extended characters are encoded with multi-byte sequences.
All applications must be able to parse UTF-8 encoded strings; however, it is up to the individual application whether to store the string in Unicode or convert it to a local code page for processing. In situations where a packet must be processed and forwarded, the original packet must be forwarded rather than a regenerated version. This ensures that both locally unsupported encodings and packet extensions are preserved.
Applications should never send ANSI strings directly if they contain extended characters (bytes with the most significant bit set). Such strings must first be encoded as UTF-8. If this is not done, the decoding process may fail and the packet will be discarded or contain bogus information.
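As a rough illustration of such a conversion, the sketch below re-encodes a single-byte string as UTF-8. It assumes the local code page is ISO-8859-1, where each byte value equals its Unicode code point; other code pages (Windows-1252, for example) would need a mapping table instead. The function name latin1_to_utf8 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: input is ISO-8859-1, so byte value == Unicode code point.
   Writes UTF-8 to `out` and returns the number of bytes written, or 0 if
   `out_cap` is too small (0 is also returned for empty input). */
static size_t latin1_to_utf8(const uint8_t *in, size_t in_len,
                             uint8_t *out, size_t out_cap)
{
    size_t w = 0;
    for (size_t i = 0; i < in_len; i++) {
        uint8_t c = in[i];
        if (c < 0x80) {
            /* 7-bit characters pass through unchanged. */
            if (w + 1 > out_cap)
                return 0;
            out[w++] = c;
        } else {
            /* Code points 0x80..0xFF become a two-byte sequence. */
            if (w + 2 > out_cap)
                return 0;
            out[w++] = (uint8_t)(0xC0 | (c >> 6));
            out[w++] = (uint8_t)(0x80 | (c & 0x3F));
        }
    }
    return w;
}
```

For example, the byte 0xE9 (é in ISO-8859-1) becomes the two-byte sequence 0xC3 0xA9.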