diff --git a/tutorial.html b/tutorial.html new file mode 100644 index 0000000..9166c41 --- /dev/null +++ b/tutorial.html @@ -0,0 +1,399 @@ +--- +layout: default +title: Tutorial +--- + + +
+ We're going to look at some CBOR-encoded information a byte at a time. +
+a3 | 68 | 73 | 74 | 61 | 6e | 64 | 61 | 72 | 64 | f5 | 63 | 52 | 46 | 43 | 19 |
1b | 89 | 69 | 70 | 75 | 62 | 6c | 69 | 73 | 68 | 65 | 64 | c1 | 1a | 52 | 68 | 45 | 17 | + |
Consider the following CBOR data stream, where each pair of hex digits represents one byte:
+ +That's a little scary, but it just corresponds to the following JavaScript object:
+ +{ + "standard": true, + "RFC": 7049, + "published": new Date(1382565143000) +}+ +
There is nothing special about JavaScipt with respect to CBOR. It should be usable in almost any programming language. JavaScript is merely convenient to describe the encoded objects in a more human-readable way.
+ +We'll walk you through how decoding works. First, read a single byte from the input, and look at the most sigificant three bits. They tell you what the "Major Type" is of the data item we're reading. These three bits can signal one of eight Major Types:
+ +Top 3 Bits | +Major Type | +Meaning | +Examples | +
---|---|---|---|
000 | 0 | Positive integer | 123 |
001 | 1 | Negative integer | -124 |
010 | 2 | Block of bytes | ![]() |
011 | 3 | String | "Hello!" |
100 | 4 | Array | [1,2] |
101 | 5 | Map | ["foo": 6] |
110 | 6 | Tag | new Date("2013-10-23T21:52:23Z") |
111 | 7 | Constant or floating point | null, 1.234 | +
The lower 5 bits are "Additional Information". The Additional Information either encodes small integer value (if it is less than 24), or tells us to read more bytes (if it is 24 or higher). Let's look at the first byte in the stream we're decoding:
+ +a3 | 68 | 73 | 74 | 61 | 6e | 64 | 61 | 72 | 64 | f5 | 63 | 52 | 46 | 43 | 19 | .hstandard.cRFC. | |
1b | 89 | 69 | 70 | 75 | 62 | 6c | 69 | 73 | 68 | 65 | 64 | c1 | 1a | 52 | 68 | ..ipublished..Rh | 45 | 17 | E. | +
The first byte says that what follows is a Map of name/value pairs. For the Map major type, the additional information tells us how many pairs of items there will be in the map. In this case, there will be 3 pairs, so we're going to have to read 6 more items: a name, a value, a name, a value, a name, and a value.
+ +Let's examine the next byte:
+ +a3 | 68 | 73 | 74 | 61 | 6e | 64 | 61 | 72 | 64 | f5 | 63 | 52 | 46 | 43 | 19 | .hstandard.cRFC. | |
1b | 89 | 69 | 70 | 75 | 62 | 6c | 69 | 73 | 68 | 65 | 64 | c1 | 1a | 52 | 68 | ..ipublished..Rh | 45 | 17 | E. | +
a3 | 68 | 73 | 74 | 61 | 6e | 64 | 61 | 72 | 64 | f5 | 63 | 52 | 46 | 43 | 19 | .hstandard.cRFC. | |
1b | 89 | 69 | 70 | 75 | 62 | 6c | 69 | 73 | 68 | 65 | 64 | c1 | 1a | 52 | 68 | ..ipublished..Rh | 45 | 17 | E. | +
We see that this is an eight character long string; the next eight bytes in the input are that string:
+ +Since the string is length-counted, we didn't have to perform any further escape decoding. The string standard
is the first name in the map we're reading.
a3 | 68 | 73 | 74 | 61 | 6e | 64 | 61 | 72 | 64 | f5 | 63 | 52 | 46 | 43 | 19 | .hstandard.cRFC. | |
1b | 89 | 69 | 70 | 75 | 62 | 6c | 69 | 73 | 68 | 65 | 64 | c1 | 1a | 52 | 68 | ..ipublished..Rh | 45 | 17 | E. | +
Let's keep going by reading another item, starting with its first byte:
+ +If a Major Type 7 has additional information less than 256, it's a constant. Here are the currently-defined constants (or, as CBOR calls them "simple values"):
+ +CBOR encoded (hex) | +CBOR encoded (binary) | +Additional Information | +Meaning | +
---|---|---|---|
f4 | 1111 0100 | 20 | False |
f5 | 1111 0101 | 21 | True |
f6 | 1111 0110 | 22 | Null |
f7 | 1111 0111 | 23 | Undefined |
Other values might be defined in the future. If you receive one you don't understand, feel free thow an error, ignore it, turn it into an integer, or whatever rule works best for your use case.
+ +Here, the value in the first name/value pair in the map we're reading has the value true
.
a3 | 68 | 73 | 74 | 61 | 6e | 64 | 61 | 72 | 64 | f5 | 63 | 52 | 46 | 43 | 19 | .hstandard.cRFC. | |
1b | 89 | 69 | 70 | 75 | 62 | 6c | 69 | 73 | 68 | 65 | 64 | c1 | 1a | 52 | 68 | ..ipublished..Rh | 45 | 17 | E. | +
On to the next item! This should be getting a little more familiar now.
+ +a3 | 68 | 73 | 74 | 61 | 6e | 64 | 61 | 72 | 64 | f5 | 63 | 52 | 46 | 43 | 19 | .hstandard.cRFC. | |
1b | 89 | 69 | 70 | 75 | 62 | 6c | 69 | 73 | 68 | 65 | 64 | c1 | 1a | 52 | 68 | ..ipublished..Rh | 45 | 17 | E. | +
We're going to need to read three more bytes, and treat them as a UTF-8 string. The string RFC
is the name for the next name/value pair in the map.
a3 | 68 | 73 | 74 | 61 | 6e | 64 | 61 | 72 | 64 | f5 | 63 | 52 | 46 | 43 | 19 | .hstandard.cRFC. | |
1b | 89 | 69 | 70 | 75 | 62 | 6c | 69 | 73 | 68 | 65 | 64 | c1 | 1a | 52 | 68 | ..ipublished..Rh | 45 | 17 | E. | +
What's the value associated with the RFC
name in the map? It's a positive integer, with additional information of 25:
This is the first time we've seen an additional information that is greater than 23.
+ +Additional Information | Meaning |
---|---|
0..23 | The corresponding number, 0-23 |
24 | Read one more byte, use the value of that byte |
25 | Read two more bytes, treat them as a network-order 16-bit integer |
26 | Read four more bytes, treat them as a network-order 32-bit integer |
27 | Read eight more bytes, treat them as a network-order 64-bit integer |
28 | RESERVED: throw an error |
29 | RESERVED: throw an error |
30 | RESERVED: throw an error |
31 | Indeterminite length |
a3 | 68 | 73 | 74 | 61 | 6e | 64 | 61 | 72 | 64 | f5 | 63 | 52 | 46 | 43 | 19 | .hstandard.cRFC. | |
1b | 89 | 69 | 70 | 75 | 62 | 6c | 69 | 73 | 68 | 65 | 64 | c1 | 1a | 52 | 68 | ..ipublished..Rh | 45 | 17 | E. | +
Since the additional information is 25, we read two more bytes: 1b 89
. When we interpret them in network byte order as an integer, they decode to 7049
.
+
+
a3 | 68 | 73 | 74 | 61 | 6e | 64 | 61 | 72 | 64 | f5 | 63 | 52 | 46 | 43 | 19 | .hstandard.cRFC. | |
1b | 89 | 69 | 70 | 75 | 62 | 6c | 69 | 73 | 68 | 65 | 64 | c1 | 1a | 52 | 68 | ..ipublished..Rh | 45 | 17 | E. | +
The last name in the map is a 9-byte string.
+ +a3 | 68 | 73 | 74 | 61 | 6e | 64 | 61 | 72 | 64 | f5 | 63 | 52 | 46 | 43 | 19 | .hstandard.cRFC. | |
1b | 89 | 69 | 70 | 75 | 62 | 6c | 69 | 73 | 68 | 65 | 64 | c1 | 1a | 52 | 68 | ..ipublished..Rh | 45 | 17 | E. | +
Here we see the name is published
:
a3 | 68 | 73 | 74 | 61 | 6e | 64 | 61 | 72 | 64 | f5 | 63 | 52 | 46 | 43 | 19 | .hstandard.cRFC. | |
1b | 89 | 69 | 70 | 75 | 62 | 6c | 69 | 73 | 68 | 65 | 64 | c1 | 1a | 52 | 68 | ..ipublished..Rh | 45 | 17 | E. | +
The last value is identified by a "tag". Tags give semantic meaning to the following item. If your code doesn't support a particular tag number, it can safely parse the entire input byte stream if it chooses.
+ +Tag 1
corresponds to a Date. The item after the tag is an integer or floating point number of seconds since the epoch. Some other tag values that have been defined include:
Tag | +Item Types | +Meaning | +
---|---|---|
0 | UTF-8 string | Date/Time as string |
1 | float, integer | Date/Time from epoch |
32 | UTF-8 string | URI |
35 | UTF-8 String | Regular expression |
a3 | 68 | 73 | 74 | 61 | 6e | 64 | 61 | 72 | 64 | f5 | 63 | 52 | 46 | 43 | 19 | .hstandard.cRFC. | |
1b | 89 | 69 | 70 | 75 | 62 | 6c | 69 | 73 | 68 | 65 | 64 | c1 | 1a | 52 | 68 | ..ipublished..Rh | 45 | 17 | E. | +
Let's read the tagged item, which starts with the byte 1a
:
a3 | 68 | 73 | 74 | 61 | 6e | 64 | 61 | 72 | 64 | f5 | 63 | 52 | 46 | 43 | 19 | .hstandard.cRFC. | |
1b | 89 | 69 | 70 | 75 | 62 | 6c | 69 | 73 | 68 | 65 | 64 | c1 | 1a | 52 | 68 | ..ipublished..Rh | 45 | 17 | E. | +
The tagged item is the four-byte integer 0x52684517
, which with the Date/Time tag indicates the point in time 1382565143 seconds since the epoch, or Wed, 23 Oct 2013 21:52:23 GMT
, the time when RFC 7049 was announced.</p> + +
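<p>In JavaScript, the conversion from the tagged integer to a date might look like the sketch below. The only assumption is that <code>Date</code> expects milliseconds, so the seconds value is multiplied by 1000.</p>

```javascript
// The four bytes that follow 1a, read as a network-order 32-bit integer.
const seconds = 0x52684517; // 1382565143

// Tag 1 counts seconds since the epoch; Date counts milliseconds.
const published = new Date(seconds * 1000);

console.log(published.toISOString()); // 2013-10-23T21:52:23.000Z
console.log(published.toUTCString()); // Wed, 23 Oct 2013 21:52:23 GMT
```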
We have now successfully read 3 pairs of items that followed the original map code. Let's look at it all at once, in the diagnostic text mode defined for CBOR:
+ +{"standard": true, "RFC": 7049, "published": 1(1382565143)}+ +
CBOR also supports arrays (Major Type 4, followed by Additional Information number of items), floating point numbers (Major Type 7 followed by 2, 4, or 8 bytes of network-order IEEE754 additional information), and byte strings (Major Type 2 followed by Additional Information number of bytes), all of which should be straightforward for you to figure out now.