
 # Data Packing

 Data serialization for properties is performed using a light-weight
 data packing format which was loosely inspired by D-Bus. The format of
 a serialization is defined by a specially formatted string.

 This packing format is used for notational convenience. While this
 string-based datatype format has been designed so that the strings may
 be directly used by a structured data parser, such a thing is not
 required to implement Spinel. Indeed, highly constrained applications
 may find such a thing to be too heavyweight.

 Goals:

  *  Be lightweight and favor direct representation of values.
  *  Use an easily readable and memorable format string.
  *  Support lists and structures.
  *  Allow properties to be appended to structures while maintaining
     backward compatibility.

 Each primitive datatype has an ASCII character associated with it.
 Structures can be represented as strings of these characters. For
 example:

  *  `C`: A single unsigned byte.
  *  `C6U`: A single unsigned byte, followed by a 128-bit IPv6
     address, followed by a zero-terminated UTF8 string.
  *  `A(6)`: An array of concatenated IPv6 addresses.

 In each case, the data is represented exactly as described. For
 example, an array of 10 IPv6 addresses is stored as 160 bytes.
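
To make the direct representation concrete, here is a minimal Python sketch that packs the `C6U` and `A(6)` examples above using only the standard library. The helper names `pack_c6u` and `pack_a6` are hypothetical and are not part of any Spinel implementation.

```python
import ipaddress


def pack_c6u(byte_value: int, addr: str, text: str) -> bytes:
    """Hypothetical helper: pack `C6U` (uint8, IPv6 address, zero-terminated UTF-8)."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("C must fit in one unsigned byte")
    if "\x00" in text:
        raise ValueError("U cannot contain an embedded terminator")
    out = bytearray()
    out.append(byte_value)                        # C: one unsigned byte
    out += ipaddress.IPv6Address(addr).packed     # 6: 16 bytes, big-endian
    out += text.encode("utf-8") + b"\x00"         # U: UTF-8 bytes plus terminator
    return bytes(out)


def pack_a6(addrs: list[str]) -> bytes:
    """Hypothetical helper: pack `A(6)`, addresses concatenated with no framing."""
    return b"".join(ipaddress.IPv6Address(a).packed for a in addrs)


encoded = pack_c6u(7, "fe80::1", "hi")
print(len(encoded))                  # 1 + 16 + 3 = 20
print(len(pack_a6(["::1"] * 10)))    # 10 * 16 = 160
```

No length fields or separators are added anywhere, which is why ten addresses take exactly 160 bytes.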

 ## Primitive Types

 Char | Name                | Description
 -----|:--------------------|:------------------------------
  `.` | DATATYPE_VOID        | Empty data type. Used internally.
  `b` | DATATYPE_BOOL        | Boolean value. Encoded in 8-bits as either 0x00 or 0x01. All other values are illegal.
  `C` | DATATYPE_UINT8       | Unsigned 8-bit integer.
  `c` | DATATYPE_INT8        | Signed 8-bit integer.
  `S` | DATATYPE_UINT16      | Unsigned 16-bit integer.
  `s` | DATATYPE_INT16       | Signed 16-bit integer.
  `L` | DATATYPE_UINT32      | Unsigned 32-bit integer.
  `l` | DATATYPE_INT32       | Signed 32-bit integer.
  `i` | DATATYPE_UINT_PACKED | Packed Unsigned Integer. See (#packed-unsigned-integer).
  `6` | DATATYPE_IPv6ADDR    | IPv6 Address. (Big-endian)
  `E` | DATATYPE_EUI64       | EUI-64 Address. (Big-endian)
  `e` | DATATYPE_EUI48       | EUI-48 Address. (Big-endian)
  `D` | DATATYPE_DATA        | Arbitrary data. See (#data-blobs).
  `d` | DATATYPE_DATA_WLEN   | Arbitrary data with prepended length. See (#data-blobs).
  `U` | DATATYPE_UTF8        | Zero-terminated UTF8-encoded string.
  `t(...)` | DATATYPE_STRUCT | Structured datatype with prepended length. See (#structured-data).
  `A(...)` | DATATYPE_ARRAY  | Array of datatypes. Compound type. See (#arrays).

 All multi-byte values are little-endian unless explicitly stated
 otherwise.
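
One way to handle the fixed-width primitives is to map them onto Python `struct` format codes with the `<` prefix, which selects little-endian byte order and no padding. The address types are already big-endian byte strings, so they are copied verbatim. This is a sketch under those assumptions: `pack_primitive` is a hypothetical name, and the variable-length types (`i`, `D`, `d`, `U`, `t(...)`, `A(...)`) are deliberately left out.

```python
import struct

# Fixed-width integer primitives mapped to struct codes; "<" means
# little-endian with no alignment padding.
FIXED_FORMATS = {
    "b": "<B",  # bool, validated to 0x00/0x01 below
    "C": "<B",
    "c": "<b",
    "S": "<H",
    "s": "<h",
    "L": "<I",
    "l": "<i",
}

# Address types are opaque big-endian byte strings of a fixed length.
ADDRESS_LENGTHS = {"6": 16, "E": 8, "e": 6}


def pack_primitive(char: str, value) -> bytes:
    """Hypothetical helper: pack one fixed-width primitive by its type char."""
    if char == "b":
        if value not in (0, 1):
            raise ValueError("bool must be encoded as 0x00 or 0x01")
        value = int(value)
    if char in FIXED_FORMATS:
        return struct.pack(FIXED_FORMATS[char], value)
    if char in ADDRESS_LENGTHS:
        if not isinstance(value, (bytes, bytearray)) or len(value) != ADDRESS_LENGTHS[char]:
            raise ValueError(f"{char} expects {ADDRESS_LENGTHS[char]} bytes")
        return bytes(value)  # already in big-endian order; copied as-is
    raise ValueError(f"unsupported or variable-length type {char!r}")


print(pack_primitive("S", 0x1234).hex())  # 3412
print(pack_primitive("l", -2).hex())      # feffffff
```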

 ## Packed Unsigned Integer

 Certain types of integers, such as command or property identifiers,
 usually have a value on the wire that is less than 127. However, a
 fixed-width field that does not preclude values larger than 255 would
 need a second byte, and that extra byte would be wasted in the
 majority of instances, which adds up in terms of bandwidth.

 The packed unsigned integer format is based on the [unsigned integer
 format in EXI][EXI], except that we limit the maximum value to the
 largest value that can be encoded into three bytes (2,097,151).

 [EXI]: https://www.w3.org/TR/exi/#encodingUnsignedInteger

 For all values up to and including 127, the packed form of the number
 is simply a single byte which directly represents the number. For values
 larger than 127, the following process is used to encode the value:

 1.  The unsigned integer is broken up into *n* 7-bit chunks and placed
     into *n* octets, leaving the most significant bit of each octet
     unused.
 2.  Order the octets from least-significant to most-significant.
     (Little-endian)
 3.  Clear the most significant bit of the most significant octet. Set
     the most significant bit on all other octets.

 Here, *n* is the smallest number of 7-bit chunks that can represent
 the given value.

 Take the value 1337, for example:

     1337 => 0x0539
          => [39 0A]
          => [B9 0A]
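
The encoding steps can be sketched in Python as below. `encode_packed_uint` is a hypothetical helper; it emits the least-significant chunk first, sets the most significant bit on every octet except the last, and rejects values above the three-byte limit of 2,097,151.

```python
MAX_PACKED_UINT = (1 << 21) - 1  # largest value that fits in three 7-bit chunks


def encode_packed_uint(value: int) -> bytes:
    """Hypothetical helper: encode `value` as a packed unsigned integer (type `i`)."""
    if not 0 <= value <= MAX_PACKED_UINT:
        raise ValueError("value out of range for a packed unsigned integer")
    out = bytearray()
    while True:
        chunk = value & 0x7F          # take the least-significant 7 bits
        value >>= 7
        if value:
            out.append(chunk | 0x80)  # more chunks follow: set the MSB
        else:
            out.append(chunk)         # most significant chunk: MSB stays clear
            return bytes(out)


print(encode_packed_uint(1337).hex(" "))  # b9 0a
```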

 To decode the value, collect the 7-bit chunks, least significant first,
 until you reach an octet whose most significant bit is clear.
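
A matching decoder sketch follows, again with a hypothetical name (`decode_packed_uint`). It returns the number of octets consumed so a caller can continue parsing, and, as an assumption based on the three-byte limit above, it treats a fourth octet as an error.

```python
def decode_packed_uint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Hypothetical helper: decode a packed unsigned integer at buf[offset].

    Returns (value, octets_consumed).
    """
    value = 0
    for i in range(3):
        if offset + i >= len(buf):
            raise ValueError("truncated packed unsigned integer")
        octet = buf[offset + i]
        value |= (octet & 0x7F) << (7 * i)  # chunks arrive least-significant first
        if not octet & 0x80:                # MSB clear marks the final chunk
            return value, i + 1
    raise ValueError("packed unsigned integer longer than three octets")


print(decode_packed_uint(bytes([0xB9, 0x0A, 0xFF])))  # (1337, 2)
```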

 ## Data Blobs

 There are two types of data blobs: `d` and `D`.

 *   `d` has the length of the data (in bytes) prepended to the data
     (with the length encoded as type `S`). The size of the length
     field is not included in the length.
 *   `D` does not have a prepended length: the length of the data is
     implied by the bytes remaining to be parsed. It is an error for
     `D` to be anything other than the last type in a type signature.
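
The following minimal Python sketch contrasts the two blob types. It assumes the `d` length is an `S` (16-bit little-endian) value, as described above; the function names are hypothetical.

```python
import struct


def pack_d(blob: bytes) -> bytes:
    """Hypothetical helper: `d` is a 16-bit LE length, then the data."""
    if len(blob) > 0xFFFF:
        raise ValueError("blob too long for a 16-bit length")
    return struct.pack("<H", len(blob)) + blob  # length excludes its own 2 bytes


def unpack_d(buf: bytes, offset: int = 0) -> tuple[bytes, int]:
    """Hypothetical helper: return (blob, offset_after_blob) for a `d` field."""
    (length,) = struct.unpack_from("<H", buf, offset)
    start = offset + 2
    if start + length > len(buf):
        raise ValueError("truncated `d` field")
    return bytes(buf[start:start + length]), start + length


def unpack_D(buf: bytes, offset: int = 0) -> bytes:
    """Hypothetical helper: `D` has no length; it is every remaining byte."""
    return bytes(buf[offset:])


print(pack_d(b"abc").hex(" "))  # 03 00 61 62 63
```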

 This dichotomy allows for more efficient encoding by eliminating
 redundancy. If the rest of the buffer is a data blob, encoding the
 length would be redundant because we already know how many bytes are
 in the rest of the buffer.

 In some cases we use `d` even if it is the last field in a type signature.
 We do this so that additional fields can be appended to the type
 signature in the future if necessary. This is usually the
 case with embedded structs, like in the scan results.

 For example, let's say we have a buffer that is encoded with the
 datatype signature of `CLLD`. In this case, it is easy to tell
 where the data blob starts and ends: it starts 9 bytes from
 the start of the buffer, and its length is the length of the buffer
 minus 9 (9 is the number of bytes taken up by one byte and two
 32-bit integers).
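
Here is a sketch of parsing `CLLD` with Python's `struct` module (`parse_clld` is a hypothetical name). The format `<BII` covers the byte and the two 32-bit integers in exactly 9 bytes, and the `D` blob is whatever follows.

```python
import struct


def parse_clld(buf: bytes):
    """Hypothetical helper: parse `CLLD` into (uint8, uint32, uint32, blob)."""
    c, l1, l2 = struct.unpack_from("<BII", buf, 0)
    fixed = struct.calcsize("<BII")        # 1 + 4 + 4 = 9 bytes
    return c, l1, l2, bytes(buf[fixed:])   # `D` takes the rest of the buffer


buf = bytes([0x01]) + (2).to_bytes(4, "little") + (3).to_bytes(4, "little") + b"payload"
print(parse_clld(buf))  # (1, 2, 3, b'payload')
```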

 The datatype signature `CLLDU` is illegal because we can't determine
 where the last field (a zero-terminated UTF8 string) starts. But the
 datatype `CLLdU` *is* legal, because the parser can determine the
 exact length of the data blob, allowing it to know where the start
 of the next field would be.
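
Parsing `CLLdU` can be sketched the same way (`parse_clldu` is a hypothetical name). The `d` length prefix is what lets the parser skip over the blob, even when the blob itself contains zero bytes, and find the start of the `U` string.

```python
import struct


def parse_clldu(buf: bytes):
    """Hypothetical helper: parse `CLLdU` into (uint8, uint32, uint32, blob, text)."""
    c, l1, l2 = struct.unpack_from("<BII", buf, 0)
    offset = struct.calcsize("<BII")
    # `d`: the 16-bit length says exactly where the blob ends...
    (blob_len,) = struct.unpack_from("<H", buf, offset)
    offset += 2
    blob = bytes(buf[offset:offset + blob_len])
    if len(blob) != blob_len:
        raise ValueError("truncated `d` field")
    offset += blob_len
    # ...so the zero-terminated `U` string can be located after it.
    end = buf.index(0, offset)  # raises ValueError if the terminator is missing
    return c, l1, l2, blob, bytes(buf[offset:end]).decode("utf-8")


buf = (bytes([0x01]) + (2).to_bytes(4, "little") + (3).to_bytes(4, "little")
       + b"\x02\x00" + b"\x00\x01"   # d: length 2, blob contains a zero byte
       + b"ok\x00")                  # U
print(parse_clldu(buf))  # (1, 2, 3, b'\x00\x01', 'ok')
```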

 ## Structured Data

 The structure data type (`t(...)`) is a way of bundling together
 several fields into a single structure. It can be thought of as a
 `d` type except that instead of being opaque, the fields in the
 content are known. This is useful for things like scan results where
 you have substructures which are defined by different layers.

 For example, consider the type signature `Lt(ES)t(6C)`. In this
 hypothetical case, the first struct is defined by the MAC layer, and
 the second struct is defined by the network layer. Because of the use
 of structures, we know exactly which part comes from which layer.
 Additionally, we can add fields to each structure without introducing
 backward compatibility problems: data encoded as `Lt(ESU)t(6C)`
 (notice the extra `U`) will decode just fine as `Lt(ES)t(6C)`. And if
 we don't care about the MAC layer and only care about the network
 layer, we could parse as `Lt()t(6C)`.
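
The forward-compatibility behavior can be sketched in Python. This assumes, consistent with the comparison to `d` above, that a `t(...)` length prefix is a 16-bit little-endian `S` value; `pack_struct` and `read_struct` are hypothetical helpers. A reader that expects `Lt(ES)t(6C)` consumes only the fields it knows and uses each struct's length to jump to the next one, so the extra `U` is skipped.

```python
import struct


def pack_struct(body: bytes) -> bytes:
    """Hypothetical helper: `t(...)` is a 16-bit LE length, then the fields."""
    return struct.pack("<H", len(body)) + body


def read_struct(buf: bytes, offset: int) -> tuple[bytes, int]:
    """Hypothetical helper: return (struct_body, offset_after_struct)."""
    (length,) = struct.unpack_from("<H", buf, offset)
    start = offset + 2
    end = start + length
    if end > len(buf):
        raise ValueError("truncated struct")
    return bytes(buf[start:end]), end


# Newer encoder: Lt(ESU)t(6C), with an extra `U` in the first struct.
eui64 = bytes(range(8))
ipv6 = bytes(16)
encoded = (
    struct.pack("<I", 42)
    + pack_struct(eui64 + struct.pack("<H", 0x1234) + b"new field\x00")
    + pack_struct(ipv6 + struct.pack("<B", 7))
)

# Older decoder expecting Lt(ES)t(6C): read the known fields, then skip
# to the end of each struct using its length prefix.
(l_value,) = struct.unpack_from("<I", encoded, 0)
first, offset = read_struct(encoded, 4)
eui, s_value = first[:8], struct.unpack_from("<H", first, 8)[0]
second, offset = read_struct(encoded, offset)
addr, c_value = second[:16], second[16]
print(l_value, hex(s_value), c_value)  # 42 0x1234 7
```

Parsing as `Lt()t(6C)` works the same way: the reader calls `read_struct` for the first struct and simply ignores its body.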

 Note that data encoded as `Lt(ES)t(6C)` will also parse as `Ldd`,
 with the structures from both layers now being opaque data blobs.
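
Because a struct and a `d` blob share the same length-prefixed layout, a generic `d` reader can split the same bytes without knowing what the structs contain. The sketch below (hypothetical `read_d` and `parse_ldd`, assuming a 16-bit little-endian length prefix) builds an `Lt(ES)t(6C)` buffer by hand and parses it as `Ldd`.

```python
import struct


def read_d(buf: bytes, offset: int) -> tuple[bytes, int]:
    """Hypothetical helper: read a `d` field (16-bit LE length, then data)."""
    (length,) = struct.unpack_from("<H", buf, offset)
    start = offset + 2
    end = start + length
    if end > len(buf):
        raise ValueError("truncated `d` field")
    return bytes(buf[start:end]), end


def parse_ldd(buf: bytes):
    """Hypothetical helper: parse as `Ldd`, keeping each struct opaque."""
    (l_value,) = struct.unpack_from("<I", buf, 0)
    blob1, offset = read_d(buf, 4)
    blob2, offset = read_d(buf, offset)
    return l_value, blob1, blob2


# Bytes for Lt(ES)t(6C): L=1, then a 10-byte struct (E + S),
# then a 17-byte struct (6 + C, with C = 9).
data = (
    struct.pack("<I", 1)
    + struct.pack("<H", 10) + bytes(8) + struct.pack("<H", 2)
    + struct.pack("<H", 17) + bytes(16) + bytes([9])
)
l_value, first_blob, second_blob = parse_ldd(data)
print(l_value, len(first_blob), len(second_blob))  # 1 10 17
```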

 ## Arrays

 An array is simply a concatenated set of *n* data encodings. For example,
 the type `A(6)` is simply a list of IPv6 addresses, one after the other.
 The type `A(6E)` is likewise a concatenation of IPv6-address/EUI-64 pairs.
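
A sketch of parsing the two array examples in Python follows. It assumes the array occupies the rest of the buffer, since no way is defined to embed an array alongside other fields, so the item count is the remaining length divided by the item size (16 bytes for `6`, 24 for `6E`). The function names are hypothetical.

```python
import ipaddress


def parse_array_6(buf: bytes) -> list[ipaddress.IPv6Address]:
    """Hypothetical helper: `A(6)` is consecutive 16-byte IPv6 addresses."""
    if len(buf) % 16:
        raise ValueError("A(6) length must be a multiple of 16")
    return [ipaddress.IPv6Address(bytes(buf[i:i + 16])) for i in range(0, len(buf), 16)]


def parse_array_6e(buf: bytes) -> list[tuple[ipaddress.IPv6Address, bytes]]:
    """Hypothetical helper: `A(6E)` is 24-byte items, IPv6 address then EUI-64."""
    if len(buf) % 24:
        raise ValueError("A(6E) length must be a multiple of 24")
    return [
        (ipaddress.IPv6Address(bytes(buf[i:i + 16])), bytes(buf[i + 16:i + 24]))
        for i in range(0, len(buf), 24)
    ]


packed = ipaddress.IPv6Address("fe80::1").packed + ipaddress.IPv6Address("::1").packed
print([str(a) for a in parse_array_6(packed)])  # ['fe80::1', '::1']
```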

 If an array contains many fields, the fields will often be surrounded
 by a structure (`t(...)`). This effectively prepends each item in the
 array with its length. This is useful for improving parsing performance
 and for allowing additional fields to be added in the future in a
 backward-compatible way. If there is a high certainty that additional
 fields will never be added, the struct may be omitted (saving two bytes
 per item).
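
To illustrate the per-item length prefix, here is a sketch that splits an `A(t(6C))` buffer into struct bodies. It assumes each item's prefix is a 16-bit little-endian length, which accounts for the two bytes per item mentioned above; `parse_array_of_structs` is a hypothetical name. The second item carries an extra trailing byte, as a newer encoder might produce, and the reader still finds both `C` fields.

```python
import struct


def parse_array_of_structs(buf: bytes) -> list[bytes]:
    """Hypothetical helper: split `A(t(...))` using each item's 16-bit LE length."""
    items, offset = [], 0
    while offset < len(buf):
        (length,) = struct.unpack_from("<H", buf, offset)
        start, end = offset + 2, offset + 2 + length
        if end > len(buf):
            raise ValueError("truncated array item")
        items.append(bytes(buf[start:end]))
        offset = end
    return items


# Two A(t(6C)) items; the second has an extra trailing byte the reader ignores.
item1 = bytes(16) + bytes([1])
item2 = bytes(16) + bytes([2]) + b"\xee"
buf = b"".join(struct.pack("<H", len(item)) + item for item in (item1, item2))
for body in parse_array_of_structs(buf):
    print(body[16])  # the `C` field of each item: 1, then 2
```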

 This specification does not define a way to embed an array as a field
 alongside other fields.