Data Packing

Data serialization for properties is performed using a lightweight data packing format that was loosely inspired by D-Bus. The format of a serialization is defined by a specially formatted string.

This packing format is used for notational convenience. While this string-based datatype format has been designed so that the strings may be used directly by a structured data parser, such a parser is not required to implement Spinel. Indeed, highly constrained applications may find such a parser too heavyweight.

Goals:

Be lightweight and favor direct representation of values.
Use an easily readable and memorable format string.
Support lists and structures.
Allow properties to be appended to structures while maintaining backward compatibility.

Each primitive datatype has an ASCII character associated with it. Structures can be represented as strings of these characters. For example:

C: A single unsigned byte.
C6U: A single unsigned byte, followed by a 128-bit IPv6 address, followed by a zero-terminated UTF8 string.
A(6): An array of concatenated IPv6 addresses.

In each case, the data is represented exactly as described. For example, an array of 10 IPv6 addresses is stored as 160 bytes.
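As a rough illustration of this "no hidden overhead" rule, the sketch below packs an A(6) array with Python's standard `ipaddress` module. The helper names `pack_ipv6_array` and `unpack_ipv6_array` are hypothetical and not part of any Spinel library.

```python
import ipaddress
from typing import Iterable, List


def pack_ipv6_array(addresses: Iterable[str]) -> bytes:
    """Pack A(6): 16-byte big-endian addresses back to back, no count or separators."""
    return b"".join(ipaddress.IPv6Address(a).packed for a in addresses)


def unpack_ipv6_array(buf: bytes) -> List[ipaddress.IPv6Address]:
    """Split an A(6) buffer; the number of items is implied by the buffer length."""
    if len(buf) % 16:
        raise ValueError("A(6) buffer length must be a multiple of 16")
    return [ipaddress.IPv6Address(buf[i:i + 16]) for i in range(0, len(buf), 16)]


# Ten example addresses: 2001:db8::1 through 2001:db8::a
addrs = [f"2001:db8::{n:x}" for n in range(1, 11)]
packed = pack_ipv6_array(addrs)
print(len(packed))  # 160: ten 16-byte addresses, nothing else
```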

Primitive Types

Char	Name	Description
`.`	DATATYPE_VOID	Empty data type. Used internally.
`b`	DATATYPE_BOOL	Boolean value. Encoded in 8 bits as either 0x00 or 0x01. All other values are illegal.
`C`	DATATYPE_UINT8	Unsigned 8-bit integer.
`c`	DATATYPE_INT8	Signed 8-bit integer.
`S`	DATATYPE_UINT16	Unsigned 16-bit integer.
`s`	DATATYPE_INT16	Signed 16-bit integer.
`L`	DATATYPE_UINT32	Unsigned 32-bit integer.
`l`	DATATYPE_INT32	Signed 32-bit integer.
`i`	DATATYPE_UINT_PACKED	Packed Unsigned Integer. See Packed Unsigned Integer.
`6`	DATATYPE_IPv6ADDR	IPv6 Address. (Big-endian)
`E`	DATATYPE_EUI64	EUI-64 Address. (Big-endian)
`e`	DATATYPE_EUI48	EUI-48 Address. (Big-endian)
`D`	DATATYPE_DATA	Arbitrary data. See Data Blobs.
`d`	DATATYPE_DATA_WLEN	Arbitrary data with prepended length. See Data Blobs.
`U`	DATATYPE_UTF8	Zero-terminated UTF8-encoded string.
`t(...)`	DATATYPE_STRUCT	Structured datatype with prepended length. See Structured Data.
`A(...)`	DATATYPE_ARRAY	Array of datatypes. Compound type. See Arrays.

All multi-byte values are little-endian unless explicitly stated otherwise.
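For the fixed-size scalar types, the little-endian rule maps directly onto Python's `struct` module, where a `<` prefix selects little-endian byte order with no padding. The sketch below assumes a hypothetical `SCALAR_CODES` table and helper names; it covers only the fixed-size scalars (not `i`, `6`, `E`, `e`, blobs, strings, structs, or arrays) and enforces the 0x00/0x01 rule for `b` on decode.

```python
import struct
from typing import Tuple

# Hypothetical mapping from fixed-size Spinel type characters to struct codes.
SCALAR_CODES = {
    "b": "B",  # bool: one octet, 0x00 or 0x01
    "C": "B",  # uint8
    "c": "b",  # int8
    "S": "H",  # uint16
    "s": "h",  # int16
    "L": "I",  # uint32
    "l": "i",  # int32
}


def _format(signature: str) -> str:
    """Build a little-endian, unpadded struct format for a scalar-only signature."""
    try:
        return "<" + "".join(SCALAR_CODES[ch] for ch in signature)
    except KeyError as exc:
        raise ValueError(f"not a fixed-size scalar type: {exc.args[0]!r}") from None


def pack_scalars(signature: str, *values) -> bytes:
    """Pack one value per type character; bools are written as 0x00 or 0x01."""
    if len(values) != len(signature):
        raise ValueError("one value is required per type character")
    args = [int(bool(v)) if ch == "b" else v for ch, v in zip(signature, values)]
    return struct.pack(_format(signature), *args)


def unpack_scalars(signature: str, buf: bytes) -> Tuple:
    """Unpack a scalar-only signature from the start of buf, rejecting illegal bools."""
    values = list(struct.unpack_from(_format(signature), buf, 0))
    for i, ch in enumerate(signature):
        if ch == "b":
            if values[i] not in (0, 1):
                raise ValueError("illegal bool value (must be 0x00 or 0x01)")
            values[i] = bool(values[i])
    return tuple(values)
```

For example, `pack_scalars("S", 0x1234)` yields the bytes `34 12`, and `pack_scalars("CLL", 1, 2, 3)` is 9 bytes long.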

Packed Unsigned Integer

Certain integers, such as command or property identifiers, usually have values on the wire that are less than 128. However, a fixed-width encoding that does not preclude values larger than 255 would need a second byte. That extra byte would be wasted in the majority of instances, which adds up in terms of bandwidth.

The packed unsigned integer format is based on the unsigned integer format in EXI, except that we limit the maximum value to the largest value that can be encoded into three bytes, that is, 21 bits of payload (2,097,151).

For all values less than 128, the packed form of the number is simply a single byte that directly represents the number. For values of 128 or larger, the following process is used to encode the value:

The unsigned integer is broken up into n 7-bit chunks, each placed into its own octet, leaving the most significant bit of each octet unused.
Order the octets from least significant to most significant (little-endian).
Clear the most significant bit of the most significant octet. Set the most significant bit on all other octets.

Here n is the smallest number of 7-bit chunks that can represent the given value.

Take the value 1337, for example:

1337 => 0x0539
     => [39 0A]
     => [B9 0A]
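The encoding steps above can be sketched in Python as follows. `encode_packed_uint` is a hypothetical helper name; the upper bound is the three-octet limit stated earlier.

```python
MAX_PACKED_UINT = (1 << 21) - 1  # 2,097,151: three 7-bit chunks


def encode_packed_uint(value: int) -> bytes:
    """Encode value as a packed unsigned integer, least significant chunk first."""
    if not 0 <= value <= MAX_PACKED_UINT:
        raise ValueError("value out of range for a packed unsigned integer")
    out = bytearray()
    while True:
        chunk = value & 0x7F  # take the least significant 7 bits
        value >>= 7
        if value:
            out.append(chunk | 0x80)  # more octets follow: set the MSB
        else:
            out.append(chunk)  # most significant octet: MSB clear
            return bytes(out)
```

Running `encode_packed_uint(1337).hex()` returns `'b90a'`, matching the worked example, while any value below 128 comes back as a single unchanged byte.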

To decode the value, collect the 7-bit chunks, least significant first, until you reach an octet whose most significant bit is clear.
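A matching decoder, again as a hypothetical sketch, reassembles the chunks and also reports where the next field begins so it can be used mid-buffer. It assumes decoders should reject encodings longer than three octets, in line with the 2,097,151 limit.

```python
from typing import Tuple


def decode_packed_uint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a packed unsigned integer at buf[offset].

    Returns (value, next_offset). At most three octets are accepted.
    """
    value = 0
    for i in range(3):
        pos = offset + i
        if pos >= len(buf):
            raise ValueError("truncated packed unsigned integer")
        octet = buf[pos]
        value |= (octet & 0x7F) << (7 * i)  # chunks arrive least significant first
        if not octet & 0x80:  # MSB clear: this was the final octet
            return value, pos + 1
    raise ValueError("packed unsigned integer longer than three octets")
```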

Data Blobs

There are two types for data blobs: d and D.

d has the length of the data (in bytes) prepended to the data (with the length encoded as type S). The size of the length field is not included in the length.
D does not have a prepended length: the length of the data is implied by the bytes remaining to be parsed. It is an error for D to be anything other than the last type in a type signature.

This dichotomy allows for more efficient encoding by eliminating redundancy. If the rest of the buffer is a data blob, encoding its length would be redundant because the number of bytes remaining in the buffer is already known.
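The difference between the two blob types can be sketched as a pair of writers and readers. The function names are hypothetical; the length field follows the description above (an `S`, so a little-endian uint16 that counts only the blob bytes).

```python
import struct
from typing import Tuple


def pack_d(blob: bytes) -> bytes:
    """d: blob preceded by its length as an S (uint16, little-endian)."""
    if len(blob) > 0xFFFF:
        raise ValueError("blob too long for a 16-bit length")
    return struct.pack("<H", len(blob)) + blob


def pack_D(blob: bytes) -> bytes:
    """D: the raw bytes only; valid solely as the last field of a signature."""
    return bytes(blob)


def unpack_d(buf: bytes, offset: int) -> Tuple[bytes, int]:
    """Read a d field; returns (blob, offset of the next field)."""
    if offset + 2 > len(buf):
        raise ValueError("truncated length field")
    (length,) = struct.unpack_from("<H", buf, offset)
    start = offset + 2
    end = start + length
    if end > len(buf):
        raise ValueError("truncated blob")
    return buf[start:end], end


def unpack_D(buf: bytes, offset: int) -> bytes:
    """Read a D field: everything remaining in the buffer."""
    return buf[offset:]
```

Note that `unpack_d` returns the offset of whatever follows, whereas `unpack_D` has no "next field" to report.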

In some cases we use d even when it is the last field in a type signature, so that additional fields can be appended to the type signature in the future if necessary. This is usually the case with embedded structs, such as those in the scan results.

For example, suppose a buffer is encoded with the datatype signature CLLD. In this case, it is easy to tell where the data blob starts and ends: it starts 9 bytes from the start of the buffer, and its length is the length of the buffer minus 9 (9 being the number of bytes taken up by one byte and two 32-bit longs).

The datatype signature CLLDU is illegal because the parser cannot determine where the last field (a zero-terminated UTF8 string) starts. The datatype signature CLLdU is legal, however, because the parser can determine the exact length of the data blob and therefore knows where the next field starts.
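The two signatures from this example can be parsed by hand as shown below. The function names and the `HEADER` constant are illustrative assumptions, not a Spinel API; `struct.Struct("<BII")` is exactly 9 bytes because the `<` prefix disables alignment padding.

```python
import struct
from typing import Tuple

HEADER = struct.Struct("<BII")  # C, L, L: 1 + 4 + 4 = 9 bytes, little-endian, unpadded


def parse_CLLD(buf: bytes) -> Tuple[int, int, int, bytes]:
    """CLLD: the D blob is simply whatever follows the 9-byte header."""
    c, l1, l2 = HEADER.unpack_from(buf, 0)
    return c, l1, l2, buf[HEADER.size:]


def parse_CLLdU(buf: bytes) -> Tuple[int, int, int, bytes, str]:
    """CLLdU: the d length tells the parser where the trailing U string begins."""
    c, l1, l2 = HEADER.unpack_from(buf, 0)
    offset = HEADER.size
    (length,) = struct.unpack_from("<H", buf, offset)
    offset += 2
    blob = buf[offset:offset + length]
    if len(blob) != length:
        raise ValueError("truncated d blob")
    offset += length
    end = buf.index(b"\x00", offset)  # raises ValueError if the terminator is missing
    return c, l1, l2, blob, buf[offset:end].decode("utf-8")
```

A hypothetical `parse_CLLDU` cannot be written the same way: with no length for the blob, the parser has no way to tell which trailing bytes belong to the string.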

Structured Data

The structured data type (t(...)) bundles several fields into a single structure. It can be thought of as a d type whose contents, instead of being opaque, are known fields. This is useful for things like scan results, which contain substructures defined by different layers.

For example, consider the type signature Lt(ES)t(6C). In this hypothetical case, the first struct is defined by the MAC layer, and the second struct is defined by the PHY layer. Because of the use of structures, we know exactly which part comes from which layer. Additionally, we can add fields to each structure without introducing backward compatibility problems: data encoded as Lt(ESU)t(6C) (notice the extra U) will decode just fine as Lt(ES)t(6C). And if we don't care about the MAC layer and only care about the PHY layer, we could parse as Lt()t(6C).

Note that data encoded as Lt(ES)t(6C) will also parse as Ldd, with the structures from both layers now being opaque data blobs.
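The sketch below hand-decodes one buffer in the three ways described above. A hypothetical newer writer emits Lt(ESU)t(6C), and older readers treat it as Lt(ES)t(6C), as Lt()t(6C), or as Ldd. All names and field values are illustrative assumptions; the point is that each struct's 2-byte length lets a reader ignore trailing fields it does not know about, or skip a struct entirely.

```python
import struct
from typing import Tuple


def pack_t(content: bytes) -> bytes:
    """t(...): content preceded by its length as an S, the same layout as d."""
    return struct.pack("<H", len(content)) + content


def read_t(buf: bytes, offset: int) -> Tuple[bytes, int]:
    """Return (struct content, offset just past the struct); also works for d."""
    (length,) = struct.unpack_from("<H", buf, offset)
    start, end = offset + 2, offset + 2 + length
    if end > len(buf):
        raise ValueError("truncated struct")
    return buf[start:end], end


def decode_as_Lt_ES_t_6C(buf: bytes):
    """Older reader: unknown trailing fields inside each struct are ignored."""
    (l,) = struct.unpack_from("<I", buf, 0)
    mac, offset = read_t(buf, 4)
    eui64, s = mac[:8], struct.unpack_from("<H", mac, 8)[0]
    phy, _ = read_t(buf, offset)
    return l, eui64, s, phy[:16], phy[16]


def decode_as_Lt_t_6C(buf: bytes):
    """Reader that only wants the second struct: skip the first by its length."""
    (l,) = struct.unpack_from("<I", buf, 0)
    _, offset = read_t(buf, 4)
    phy, _ = read_t(buf, offset)
    return l, phy[:16], phy[16]


# Hypothetical newer writer: Lt(ESU)t(6C), with an extra U in the first struct.
EUI64 = bytes.fromhex("0011223344556677")
ADDR = bytes(16)  # placeholder all-zero IPv6 address
newer = (struct.pack("<I", 42)
         + pack_t(EUI64 + struct.pack("<H", 0x1234) + b"extra\x00")
         + pack_t(ADDR + bytes([7])))
```

Because `read_t` is indistinguishable from a d reader, calling it twice after the leading L is exactly the Ldd view, with both structs returned as opaque blobs.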

Arrays

An array is simply a concatenated set of n data encodings. For example, the type A(6) is simply a list of IPv6 addresses, one after the other. The type A(6E) is likewise a concatenation of IPv6-address/EUI-64 pairs.

If an array contains many fields, the fields will often be wrapped in a structure (t(...)). This effectively prepends each item in the array with its length, which is useful for improving parsing performance or for allowing additional fields to be added in the future in a backward-compatible way. If there is high certainty that additional fields will never be added, the struct may be omitted, saving two bytes (the S length field) per item.
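To compare the two array layouts, the sketch below parses A(6E) and A(t(6E)) by hand. The function names are hypothetical. The unwrapped form relies on every item being exactly 24 bytes, while the wrapped form uses each item's length to step over trailing fields that a future writer might append.

```python
import struct
from typing import List, Tuple

ITEM = 16 + 8  # one 6E pair: IPv6 address + EUI-64


def parse_A_6E(buf: bytes) -> List[Tuple[bytes, bytes]]:
    """A(6E): fixed-size pairs back to back; the item count is implied by length."""
    if len(buf) % ITEM:
        raise ValueError("A(6E) length must be a multiple of 24")
    return [(buf[i:i + 16], buf[i + 16:i + ITEM]) for i in range(0, len(buf), ITEM)]


def parse_A_t6E(buf: bytes) -> List[Tuple[bytes, bytes]]:
    """A(t(6E)): each item carries a 2-byte length; unknown trailing bytes are skipped."""
    items, offset = [], 0
    while offset < len(buf):
        if offset + 2 > len(buf):
            raise ValueError("truncated item length")
        (length,) = struct.unpack_from("<H", buf, offset)
        start, end = offset + 2, offset + 2 + length
        if length < ITEM or end > len(buf):
            raise ValueError("malformed array item")
        items.append((buf[start:start + 16], buf[start + 16:start + ITEM]))
        offset = end
    return items
```

The wrapped form costs two extra bytes per item, in exchange for tolerating items longer than the reader expects.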

This specification does not define a way to embed an array as a field alongside other fields.