# Data Packing
Data serialization for properties is performed using a light-weight
data packing format which was loosely inspired by D-Bus. The format of
a serialization is defined by a specially formatted string.
This packing format is used for notational convenience. While this
string-based datatype format has been designed so that the strings may
be directly used by a structured data parser, such a thing is not
required to implement Spinel. Indeed, highly constrained applications
may find such a thing to be too heavyweight.
Goals:
* Be lightweight and favor direct representation of values.
* Use an easily readable and memorable format string.
* Support lists and structures.
* Allow properties to be appended to structures while maintaining
backward compatibility.
Each primitive datatype has an ASCII character associated with it.
Structures can be represented as strings of these characters. For
example:
* `C`: A single unsigned byte.
* `C6U`: A single unsigned byte, followed by a 128-bit IPv6
address, followed by a zero-terminated UTF8 string.
* `A(6)`: An array of concatenated IPv6 addresses.
In each case, the data is represented exactly as described. For
example, an array of 10 IPv6 addresses is stored as 160 bytes.
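
As an illustration of how literally a signature maps to bytes, the following minimal Python sketch packs a `C6U` buffer and an `A(6)` array. The helper names `pack_c6u` and `pack_a6` are hypothetical and are not part of Spinel; only the byte layout comes from the text above.

```python
import ipaddress


def pack_c6u(value, addr, text):
    """Pack one buffer with the signature `C6U` (hypothetical helper)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("C must fit in one unsigned byte")
    encoded = text.encode("utf-8")
    if b"\x00" in encoded:
        raise ValueError("U cannot contain an embedded zero byte")
    # One byte, then the 16-byte address, then the zero-terminated string.
    return bytes([value]) + ipaddress.IPv6Address(addr).packed + encoded + b"\x00"


def pack_a6(addrs):
    """Pack `A(6)`: the addresses are simply concatenated."""
    return b"".join(ipaddress.IPv6Address(a).packed for a in addrs)


buf = pack_c6u(7, "fe80::1", "hi")
array = pack_a6(["fe80::%x" % i for i in range(1, 11)])
print(len(buf), len(array))
```

The array carries no count or per-item length: ten addresses occupy exactly ten times 16 bytes.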
## Primitive Types
Char | Name | Description
-----|:--------------------|:------------------------------
`.` | DATATYPE_VOID | Empty data type. Used internally.
`b` | DATATYPE_BOOL | Boolean value. Encoded in 8-bits as either 0x00 or 0x01. All other values are illegal.
`C` | DATATYPE_UINT8 | Unsigned 8-bit integer.
`c` | DATATYPE_INT8 | Signed 8-bit integer.
`S` | DATATYPE_UINT16 | Unsigned 16-bit integer.
`s` | DATATYPE_INT16 | Signed 16-bit integer.
`L` | DATATYPE_UINT32 | Unsigned 32-bit integer.
`l` | DATATYPE_INT32 | Signed 32-bit integer.
`i` | DATATYPE_UINT_PACKED | Packed Unsigned Integer. See (#packed-unsigned-integer).
`6` | DATATYPE_IPv6ADDR | IPv6 Address. (Big-endian)
`E` | DATATYPE_EUI64 | EUI-64 Address. (Big-endian)
`e` | DATATYPE_EUI48 | EUI-48 Address. (Big-endian)
`D` | DATATYPE_DATA | Arbitrary data. See (#data-blobs).
`d` | DATATYPE_DATA_WLEN | Arbitrary data with prepended length. See (#data-blobs).
`U` | DATATYPE_UTF8 | Zero-terminated UTF8-encoded string.
`t(...)` | DATATYPE_STRUCT | Structured datatype with prepended length. See (#structured-data).
`A(...)` | DATATYPE_ARRAY | Array of datatypes. Compound type. See (#arrays).
All multi-byte values are little-endian unless explicitly stated
otherwise.
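
The fixed-width integer types can be sketched with Python's standard `struct` module, where the `<` prefix selects little-endian byte order with no padding. The `FIXED_WIDTH` table and the two helpers below are illustrative names, not part of any Spinel implementation.

```python
import struct

# Spinel type character -> struct format (little-endian, no padding).
FIXED_WIDTH = {
    "b": "<?", "C": "<B", "c": "<b",
    "S": "<H", "s": "<h", "L": "<I", "l": "<i",
}


def pack_fixed(type_char, value):
    """Encode one fixed-width value."""
    return struct.pack(FIXED_WIDTH[type_char], value)


def unpack_fixed(type_char, data, offset=0):
    """Decode one fixed-width value; returns (value, next_offset)."""
    fmt = FIXED_WIDTH[type_char]
    # `b` allows only 0x00 and 0x01; struct alone would accept any non-zero byte.
    if type_char == "b" and data[offset:offset + 1] not in (b"\x00", b"\x01"):
        raise ValueError("illegal boolean encoding")
    (value,) = struct.unpack_from(fmt, data, offset)
    return value, offset + struct.calcsize(fmt)


print(pack_fixed("S", 0x1234).hex())
```

The address types (`6`, `E`, `e`) are the exceptions to the little-endian rule and are copied as raw big-endian bytes.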
## Packed Unsigned Integer
Certain types of integers, such as command or property identifiers,
usually have a value on the wire that is less than 128. However, a
fixed single-byte encoding would preclude the use of values larger than
255, and a fixed two-byte encoding would add an extra byte to the
majority of instances, which can add up in terms of bandwidth.
The packed unsigned integer format is based on the [unsigned integer
format in EXI][EXI], except that we limit the maximum value to the
largest value that can be encoded into three bytes (2,097,151).
[EXI]: https://www.w3.org/TR/exi/#encodingUnsignedInteger
For all values less than 128, the packed form of the number is simply
a single byte which directly represents the number. For values of 128
or larger, the following process is used to encode the value:
1. The unsigned integer is broken up into *n* 7-bit chunks and placed
into *n* octets, leaving the most significant bit of each octet
unused.
2. Order the octets from least-significant to most-significant.
(Little-endian)
3. Clear the most significant bit of the most significant octet. Set
the most significant bit on all other octets.
Where *n* is the smallest number of 7-bit chunks you can use to
represent the given value.
Take the value 1337, for example:

    1337 => 0x0539
         => [39 0A]
         => [B9 0A]

To decode the value, you collect the 7-bit chunks until you find an
octet with the most significant bit clear.
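
A minimal Python sketch of the encoding and decoding described in this section follows. The function names and the `MAX_PACKED_UINT` constant are hypothetical; the most significant bit of each octet is treated as the continuation flag, which matches the 1337 example.

```python
MAX_PACKED_UINT = 2_097_151  # largest value that fits in three 7-bit chunks


def encode_packed_uint(value):
    """Encode `value` as 1 to 3 octets, least-significant chunk first."""
    if not 0 <= value <= MAX_PACKED_UINT:
        raise ValueError("value out of range for a packed unsigned integer")
    out = bytearray()
    while True:
        chunk = value & 0x7F
        value >>= 7
        if value:
            out.append(chunk | 0x80)  # more octets follow
        else:
            out.append(chunk)         # last octet: top bit clear
            return bytes(out)


def decode_packed_uint(data, offset=0):
    """Decode one packed integer; returns (value, next_offset)."""
    value = 0
    for i in range(3):
        octet = data[offset + i]  # IndexError here means truncated input
        value |= (octet & 0x7F) << (7 * i)
        if not octet & 0x80:
            return value, offset + i + 1
    raise ValueError("packed unsigned integer longer than three bytes")


print(encode_packed_uint(1337).hex())  # b90a
```

Returning the next offset from the decoder lets a caller continue parsing the fields that follow the packed integer.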
## Data Blobs
There are two types for data blobs: `d` and `D`.
* `d` has the length of the data (in bytes) prepended to the data
(with the length encoded as type `S`). The size of the length
field is not included in the length.
* `D` does not have a prepended length: the length of the data is
implied by the bytes remaining to be parsed. It is an error for
`D` to not be the last type in a type signature.
This dichotomy allows for more efficient encoding by eliminating
redundancy. If the rest of the buffer is a data blob, encoding the
length would be redundant because we already know how many bytes are
in the rest of the buffer.
In some cases we use `d` even if it is the last field in a type signature.
We do this so that additional fields can be appended to the type
signature in the future. This is usually the case with embedded structs,
like in the scan results.
For example, let's say we have a buffer that is encoded with the
datatype signature of `CLLD`. In this case, it is pretty easy to tell
where the start and end of the data blob is: the start is 9 bytes from
the start of the buffer, and its length is the length of the buffer
minus 9. (9 is the number of bytes taken up by a byte and two longs)
The datatype signature `CLLDU` is illegal because we can't determine
where the last field (a zero-terminated UTF8 string) starts. But the
datatype `CLLdU` *is* legal, because the parser can determine the
exact length of the data blob, allowing it to know where the start
of the next field would be.
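
The difference between the two blob types can be sketched in Python as follows. The parser names `parse_clld` and `parse_clldu` are hypothetical, written only for the two signatures discussed above.

```python
import struct


def parse_clld(buf):
    """Parse `CLLD`: the blob is whatever remains after the first 9 bytes."""
    c, l1, l2 = struct.unpack_from("<BII", buf, 0)
    return c, l1, l2, buf[9:]


def parse_clldu(buf):
    """Parse `CLLdU`: the 16-bit length tells us where the string starts."""
    c, l1, l2 = struct.unpack_from("<BII", buf, 0)
    (length,) = struct.unpack_from("<H", buf, 9)  # length encoded as type `S`
    start = 11
    end = start + length
    if end > len(buf):
        raise ValueError("data blob runs past the end of the buffer")
    blob = buf[start:end]
    terminator = buf.index(b"\x00", end)  # raises ValueError if unterminated
    text = buf[end:terminator].decode("utf-8")
    return c, l1, l2, blob, text


header = struct.pack("<BII", 1, 2, 3)
print(parse_clld(header + b"\xaa\xbb"))
print(parse_clldu(header + struct.pack("<H", 2) + b"\xaa\xbb" + b"ok\x00"))
```

With `D`, the parser needs no length field because the buffer length supplies it. With `d`, the two extra bytes make it possible to locate the field that follows.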
## Structured Data
The structure data type (`t(...)`) is a way of bundling together
several fields into a single structure. It can be thought of as a
`d` type except that instead of being opaque, the fields in the
content are known. This is useful for things like scan results where
you have substructures which are defined by different layers.
For example, consider the type signature `Lt(ES)t(6C)`. In this
hypothetical case, the first struct is defined by the MAC layer, and
the second struct is defined by the network layer. Because of the use of
structures, we know exactly which part comes from which layer.
Additionally, we can add fields to each structure without introducing
backward compatibility problems: data encoded as `Lt(ESU)t(6C)` (notice
the extra `U`) will decode just fine as `Lt(ES)t(6C)`. Additionally, if
we don't care about the MAC layer and only care about the network layer,
we could parse as `Lt()t(6C)`.
Note that data encoded as `Lt(ES)t(6C)` will also parse as `Ldd`,
with the structures from both layers now being opaque data blobs.
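
The following Python sketch shows why appended fields do not break older parsers. It assumes the struct length prefix is a 16-bit little-endian value, as for `d`; the helper names are hypothetical.

```python
import ipaddress
import struct


def pack_struct(payload):
    """Wrap already-encoded fields as `t(...)` (assumed 16-bit length prefix)."""
    return struct.pack("<H", len(payload)) + payload


def split_struct(buf, offset):
    """Return (payload, next_offset) for the struct starting at `offset`."""
    (length,) = struct.unpack_from("<H", buf, offset)
    start = offset + 2
    end = start + length
    if end > len(buf):
        raise ValueError("struct runs past the end of the buffer")
    return buf[start:end], end


def parse_lt_es_t_6c(buf):
    """Parse `Lt(ES)t(6C)`, ignoring any fields appended to either struct."""
    (first,) = struct.unpack_from("<I", buf, 0)
    first_struct, offset = split_struct(buf, 4)
    second_struct, offset = split_struct(buf, offset)
    eui64 = first_struct[:8]                       # `E`: raw big-endian bytes
    (short,) = struct.unpack_from("<H", first_struct, 8)
    addr = ipaddress.IPv6Address(second_struct[:16])
    return first, eui64, short, addr, second_struct[16]


eui = bytes(range(8))
ip = ipaddress.IPv6Address("fe80::1")
# Encoded by a newer sender as `Lt(ESU)t(6C)`.
newer = (struct.pack("<I", 42)
         + pack_struct(eui + struct.pack("<H", 0x1234) + b"new\x00")
         + pack_struct(ip.packed + b"\x05"))
print(parse_lt_es_t_6c(newer))
```

Because each struct announces its own length, the parser skips the unknown trailing `U` and still finds the start of the second struct.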
## Arrays
An array is simply a concatenated set of *n* data encodings. For example,
the type `A(6)` is simply a list of IPv6 addresses, one after the other.
The type `A(6E)` is likewise a concatenation of IPv6-address/EUI-64 pairs.
If an array contains many fields, the fields will often be surrounded
by a structure (`t(...)`). This effectively prepends each item in the
array with its length. This is useful for improving parsing performance
or to allow additional fields to be added in the future in a backward
compatible way. If there is a high certainty that additional
fields will never be added, the struct may be omitted (saving two bytes
per item).
This specification does not define a way to embed an array as a field
alongside other fields.
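
As a sketch of both array styles, the Python below parses a bare `A(6)` and an array whose items are wrapped in a struct, here the hypothetical signature `A(t(6C))`. It assumes the array occupies the entire buffer and that each struct carries a 16-bit little-endian length; the function names are illustrative only.

```python
import ipaddress
import struct


def parse_a6(buf):
    """Parse `A(6)`: fixed 16-byte items, one after the other."""
    if len(buf) % 16:
        raise ValueError("A(6) payload must be a multiple of 16 bytes")
    return [ipaddress.IPv6Address(buf[i:i + 16]) for i in range(0, len(buf), 16)]


def parse_a_t6c(buf):
    """Parse `A(t(6C))`: each item is prefixed with its own length."""
    items = []
    offset = 0
    while offset < len(buf):
        (length,) = struct.unpack_from("<H", buf, offset)
        start = offset + 2
        end = start + length
        if length < 17 or end > len(buf):
            raise ValueError("malformed array item")
        item = buf[start:end]
        # Bytes after the known `6C` fields are skipped, not rejected.
        items.append((ipaddress.IPv6Address(item[:16]), item[16]))
        offset = end
    return items


a = ipaddress.IPv6Address("fe80::1")
b = ipaddress.IPv6Address("fe80::2")
print(parse_a6(a.packed + b.packed))
```

The struct-wrapped form costs two bytes per item but lets the parser step over items that carry fields it does not know about.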