<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629-xhtml.ent">
<rfc xmlns:xi="http://www.w3.org/2001/XInclude"
ipr="trust200902"
docName="draft-ietf-cbor-7049bis-16"
number="8949"
submissionType="IETF"
category="std"
consensus="yes"
obsoletes="7049"
updates=""
xml:lang="en"
tocInclude="true"
tocDepth="4"
sortRefs="true"
symRefs="true"
version="3">
<!-- xml2rfc v2v3 conversion 3.2.1 -->
<front>
<title abbrev="CBOR">Concise Binary Object Representation (CBOR)</title>
<seriesInfo name="RFC" value="8949"/>
<seriesInfo name="STD" value="94"/>
<author initials="C." surname="Bormann" fullname="Carsten Bormann">
<organization>Universität Bremen TZI</organization>
<address>
<postal>
<street>Postfach 330440</street>
<city>Bremen</city>
<code>D-28359</code>
<country>Germany</country>
</postal>
<phone>+49-421-218-63921</phone>
<email>cabo@tzi.org</email>
</address>
</author>
<author initials="P." surname="Hoffman" fullname="Paul Hoffman">
<organization>ICANN</organization>
<address>
<email>paul.hoffman@icann.org</email>
</address>
</author>
<date year="2020" month="December"/>
<area>Internet</area>
<keyword>parser</keyword>
<keyword>decoder</keyword>
<keyword>encoder</keyword>
<keyword>binary format</keyword>
<keyword>data interchange format</keyword>
<keyword>JSON</keyword>
<abstract>
<t>The Concise Binary Object Representation (CBOR) is a data format whose design
goals include the
possibility of extremely small code size, fairly small message size, and
extensibility without the
need for version negotiation. These design goals make it different from earlier
binary
serializations such as ASN.1 and MessagePack.</t>
<t> This document obsoletes RFC 7049, providing editorial improvements, new
details, and errata fixes while keeping full compatibility with
the interchange format of RFC 7049. It does not create a new version
of the format. </t>
</abstract>
</front>
<middle>
<section anchor="introduction" toc="default">
<name>Introduction</name>
<t>There are hundreds of standardized formats for binary representation
of structured data (also known as binary serialization formats). Of
those, some are for specific domains of information, while others are
generalized for arbitrary data. In the IETF, probably the best-known
formats in the latter category are ASN.1's BER and DER <xref target="ASN.1" format="default"/>.</t>
<t>The format defined here follows some specific design goals that are
not well met by current formats. The underlying data model is an
extended version of the JSON data model <xref target="RFC8259" format="default"/>. Note
that this document does not propose extending the grammar in RFC 8259
in general, since doing so would cause a significant
backward incompatibility with already deployed JSON
documents. Instead, this document simply defines its own data model
that starts from JSON.</t>
<t><xref target="comparison-app" format="default"/> lists some existing binary formats and discusses
how well they do or do not fit the design objectives of the Concise
Binary Object Representation (CBOR).</t>
<t> This document obsoletes <xref target="RFC7049" format="default"/>, providing editorial improvements, new
details, and errata fixes while keeping full compatibility with
the interchange format of RFC 7049. It does not create a new version
of the format. </t>
<section anchor="objectives" toc="default">
<name>Objectives</name>
<t>The objectives of CBOR, roughly in decreasing order of importance,
are:</t>
<ol spacing="normal" type="1"><li>
<t>The representation must be able to unambiguously encode most common
data formats used in Internet standards. </t>
<ul spacing="normal">
<li>It must represent a reasonable set of basic data types and
structures using binary encoding. "Reasonable" here is largely
influenced by the capabilities of JSON, with the major addition
of binary byte strings. The structures supported are limited to
arrays and trees; loops and lattice-style graphs are not
supported.</li>
<li>There is no requirement that all data formats be uniquely
encoded; that is, it is acceptable that the number "7" might be
encoded in multiple different ways.</li>
</ul>
</li>
<li>
<t>The code for an encoder or decoder must be able to be compact in
order to support systems with very limited memory, processor power,
and instruction sets. </t>
<ul spacing="normal">
<li>An encoder and a decoder need to be implementable in a very
small amount of code (for example, in class 1 constrained nodes
as defined in <xref target="RFC7228" format="default"/>).</li>
<li>The format should use contemporary machine representations of
data (for example, not requiring binary-to-decimal conversion).</li>
</ul>
</li>
<li>
<t>Data must be able to be decoded without a schema description. </t>
<ul spacing="normal">
<li>Similar to JSON, encoded data should be self-describing so that
a generic decoder can be written.</li>
</ul>
</li>
<li>
<t>The serialization must be reasonably compact, but data compactness
is secondary to code compactness for the encoder and decoder. </t>
<ul spacing="normal">
<li>"Reasonable" here is bounded by JSON as an upper bound in
size and by the implementation complexity, which limits the
amount of effort that can go into achieving that compactness.
Using either general compression schemes or extensive
bit-fiddling violates the complexity goals.</li>
</ul>
</li>
<li>
<t>The format must be applicable to both constrained nodes and
high-volume applications. </t>
<ul spacing="normal">
<li>This means it must be reasonably frugal in CPU usage for both
encoding and decoding. This is relevant both for constrained
nodes and for potential usage in applications with a very high
volume of data.</li>
</ul>
</li>
<li>
<t>The format must support all JSON data types for conversion to and
from JSON. </t>
<ul spacing="normal">
<li>It must support a reasonable level of conversion as long as the
data represented is within the capabilities of JSON. It must be
possible to define a unidirectional mapping towards JSON for all
types of data.</li>
</ul>
</li>
<li>
<t>The format must be extensible, and the extended data must be
decodable by earlier decoders. </t>
<ul spacing="normal">
<li>The format is designed for decades of use.</li>
<li>The format must support a form of extensibility that allows
fallback so that a decoder that does not understand an extension
can still decode the message.</li>
<li>The format must be able to be extended in the future by later
IETF standards.</li>
</ul>
</li>
</ol>
</section>
<section anchor="terminology" toc="default">
<name>Terminology</name>
<t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>",
"<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>",
"<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
"<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
described in BCP 14 <xref target="RFC2119" format="default"/> <xref target="RFC8174" format="default"/>
when, and only when, they appear in all capitals, as shown here.
</t>
<t>The term "byte" is used in its now-customary sense as a synonym for
"octet". All multi-byte values are encoded in network byte order (that
is, most significant byte first, also known as "big-endian").</t>
<t>This specification makes use of the following terminology:</t>
<dl newline="false" spacing="normal">
<dt>Data item:</dt>
<dd>
A single piece of CBOR data. The structure of a data item may
contain zero, one, or more nested data items. The term is used both
for the data item in representation format and for the abstract idea
that can be derived from that by a decoder; the former can be
addressed specifically by using the term "encoded data item".</dd>
<dt>Decoder:</dt>
<dd>
A process that decodes a well-formed encoded CBOR data item and makes it available to an
application. Formally speaking, a decoder contains a parser to
break up the input using the syntax rules of CBOR, as well as a
semantic processor to prepare the data in a form suitable to the
application.</dd>
<dt>Encoder:</dt>
<dd>
A process that generates the (well-formed) representation format of a CBOR data
item from application information.</dd>
<dt>Data Stream:</dt>
<dd>
A sequence of zero or more data items, not further assembled into a
larger containing data item (see <xref target="RFC8742" format="default"/> for one application).
The independent data items that make
up a data stream are sometimes also referred to as "top-level data
items".</dd>
<dt>Well-formed:</dt>
<dd>
A data item that follows the syntactic structure of CBOR. A
well-formed data item uses the initial bytes and the byte strings
and/or data items that are implied by their values as defined in
CBOR and does not include following extraneous data. CBOR decoders
by definition only return contents from well-formed data items.</dd>
<dt>Valid:</dt>
<dd>
A data item that is well-formed and also follows the semantic
restrictions that apply to CBOR data items (<xref target="semantic-errors" format="default"/>).</dd>
<dt>Expected:</dt>
<dd>
Besides its normal English meaning, the term "expected" is used to
describe requirements beyond CBOR validity that an application has
on its input data. Well-formed (processable at all), valid (checked
by a validity-checking generic decoder), and expected (checked by the
application) form a hierarchy of layers of acceptability.</dd>
<dt>Stream decoder:</dt>
<dd>
A process that decodes a data stream and makes each of the data
items in the sequence available to an application as they are
received.</dd>
</dl>
<t>Terms and concepts for floating-point values such as Infinity, NaN
(not a number), negative zero, and subnormal are defined in <xref target="IEEE754" format="default"/>.</t>
<!-- [rfced] [Cplusplus20] points to a draft pdf version of the
specification. There is an HTML version, which will update when
the specification is finalized. This version is paywalled:
https://www.iso.org/standard/79358.html
Current:
Examples and pseudocode assume that signed integers use two's
complement representation and that right shifts of signed integers
perform sign extension; these assumptions are also specified in
Sections 6.8.1 (basic.fundamental) and 7.6.7 (expr.shift) of the 2020
version of C++ (currently available as a final draft, [Cplusplus20]).
-->
<t>Where bit arithmetic or data types are explained, this document uses
the notation familiar from the programming language C <xref target="C" format="default"/>, except that
".." denotes a range that includes both ends given, and superscript
notation denotes exponentiation. For example, 2 to the power of 64 is
notated: 2<sup>64</sup>.
In the plain-text version of this specification, superscript notation
is not available and therefore is rendered by a surrogate notation.
That notation is not optimized for this RFC; it is unfortunately
ambiguous with C's exclusive-or (which is only used in the appendices,
which in turn do not use exponentiation) and requires circumspection
from the reader of the plain-text version.
</t>
<t>Examples and pseudocode
assume that signed integers use two's complement representation and
that right shifts of signed integers perform sign extension; these
assumptions are also specified in Sections 6.8.1 (basic.fundamental)
and 7.6.7 (expr.shift) of the 2020 version of C++ (currently available
as a final draft, <xref target="Cplusplus20" format="default"/>).</t>
<t>Similar to the "0x" notation for
hexadecimal numbers, numbers in binary notation are prefixed with
"0b". Underscores can be added to a number solely for
readability, so 0b00100001 (0x21) might be written 0b001_00001 to
emphasize the desired interpretation of the bits in the byte; in this
case, it is split into three bits and five bits. Encoded CBOR data
items are sometimes given in the "0x" or "0b" notation; these values
are first interpreted as numbers as in C and are then interpreted as
byte strings in network byte order, including any leading zero bytes
expressed in the notation.</t>
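<t>As a non-normative illustration of this notation, the following Python
sketch relies on the fact that Python integer literals accept the same "0b"
prefix and underscore separators. The variable names are chosen here for
illustration only and do not appear elsewhere in this document.</t>
<sourcecode type="python"><![CDATA[
```python
# The underscore is purely visual: both literals denote the same number.
value = 0b001_00001
assert value == 0b00100001 == 0x21

# Split the byte into its high-order 3 bits and its low-order 5 bits.
high3 = value >> 5
low5 = value & 0b11111

# An encoded item written as "0x01f4" is first read as a number and then
# as a byte string in network byte order, keeping the leading zero byte.
notation = "0x01f4"
num_bytes = (len(notation) - 2) // 2
encoded = int(notation, 16).to_bytes(num_bytes, "big")
```
]]></sourcecode>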
<t>Words may be <em>italicized</em> for emphasis; in the plain text
form of this specification, this is indicated by surrounding words
with underscore characters. Verbatim text (e.g., names from a
programming language) may be set in <tt>monospace</tt> type; in plain
text, this is approximated somewhat ambiguously by surrounding the
text in double quotes (which also retain their usual meaning).</t>
</section>
</section>
<section anchor="cbor-data-models" toc="default">
<name>CBOR Data Models</name>
<t>CBOR is explicit about its generic data model, which defines the set
of all data items that can be represented in CBOR. Its basic generic
data model is extensible by the registration of "simple values" and
tags. Applications can then create a subset of the resulting extended generic
data model to build their specific data models.</t>
<t>Within environments that can represent the data items in the generic
data model, generic CBOR encoders and decoders can be implemented
(which usually involves defining additional implementation data types
for those data items that do not already have a natural representation
in the environment). The ability to provide generic encoders and
decoders is an explicit design goal of CBOR; however, many applications
will provide their own application-specific encoders and/or decoders.</t>
<t>In the basic (unextended) generic data model defined in
<xref target="encoding" format="default"/>, a data item is one of
the following:</t>
<ul spacing="normal">
<li>an integer in the range -2<sup>64</sup>..2<sup>64</sup>-1 inclusive</li>
<li>a simple value, identified by a number
between 0 and 255, but distinct from that number itself</li>
<li>a floating-point value, distinct from an integer, out of the set
representable by IEEE 754 binary64 (including non-finites) <xref target="IEEE754" format="default"/></li>
<li>a sequence of zero or more bytes ("byte string")</li>
<li>a sequence of zero or more Unicode code points ("text string")</li>
<li>a sequence of zero or more data items ("array")</li>
<li>a mapping (mathematical function) from zero or more data items
("keys") each to a data item ("values"), ("map")</li>
<li>a tagged data item ("tag"), comprising a tag number (an integer in
the range 0..2<sup>64</sup>-1) and the tag content (a data item)</li>
</ul>
<t>Note that integer and floating-point values are distinct in this
model, even if they have the same numeric value.</t>
<t>Also note that serialization variants are not visible at the generic
data model level. This deliberate absence of visibility includes the number of bytes of the encoded
floating-point value. It also includes the choice of encoding for an "argument" (see
<xref target="encoding"/>) such as the encoding for an
integer, the encoding for the length of a text or byte string, the encoding for the number of elements
in an array or pairs in a map, or the encoding for a tag number.</t>
<section anchor="extended-generic-data-models" toc="default">
<name>Extended Generic Data Models</name>
<t>This basic generic data model has been extended in this document by the registration
of a number of simple values and tag numbers, such as:</t>
<ul spacing="normal">
<li>
<tt>false</tt>, <tt>true</tt>, <tt>null</tt>, and <tt>undefined</tt>
(simple values identified by 20..23, <xref target="fpnocont" format="default"/>)</li>
<li>integer and floating-point values with a larger range and precision
than the above (tag numbers 2 to 5, <xref target="tags" format="default"/>)</li>
<li>application data types such as a point in time or
date/time string defined in RFC 3339 (tag numbers 1 and 0, <xref target="tags" format="default"/>)</li>
</ul>
<t>Additional elements of the extended generic data model can be (and have
been) defined via the IANA registries created for CBOR. Even if such
an extension is unknown to a generic encoder or decoder, data items
using that extension can be passed to or from the application by
representing them at the application interface within the basic
generic data model, i.e., as generic simple values or
generic tags.</t>
<t>In other words, the basic generic data model is stable as defined in
this document, while the extended generic data model expands by the
registration of new simple values or tag numbers, but never shrinks.</t>
<t>While there is a strong expectation that generic encoders and decoders
can represent <tt>false</tt>, <tt>true</tt>, and <tt>null</tt> (<tt>undefined</tt> is intentionally
omitted) in the form appropriate for their programming environment,
the implementation of the data model extensions created by tags is truly
optional and a matter of implementation quality.</t>
</section>
<section anchor="specific-data-models" toc="default">
<name>Specific Data Models</name>
<t>The specific data model for a CBOR-based protocol usually takes a subset of the
extended generic data model and assigns application semantics to the
data items within this subset and its components.
When documenting such specific data models and specifying the types
of data items, it is preferable to identify the types by their
generic data model names ("negative integer", "array") instead of
referring to aspects of their CBOR representation ("major type 1",
"major type 4").</t>
<t> Specific data models can also specify value equivalency (including
values of different types) for the purposes of map keys and encoder freedom. For
example, in the generic data model, a valid map <bcp14>MAY</bcp14> have both <tt>0</tt> and
<tt>0.0</tt> as keys, and an encoder <bcp14>MUST NOT</bcp14> encode <tt>0.0</tt> as an integer
(major type 0, <xref target="majortypes" format="default"/>). However, if a specific data model
declares that floating-point and integer representations of integral
values are equivalent, a map that uses both <tt>0</tt> and <tt>0.0</tt> as keys
contains duplicate keys, even though the two keys are encoded with different major types,
and so is invalid; under such a model, an encoder could also encode integral-valued
floats as integers or vice versa, perhaps to save encoded bytes.</t>
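<t>The difference between the two views of map keys can be sketched in
Python as follows. This is a minimal, non-normative sketch that handles only
integer and floating-point keys; the function name and the
"ints_equal_floats" switch are hypothetical and merely stand in for a
specific data model that declares the equivalence described above.</t>
<sourcecode type="python"><![CDATA[
```python
def has_duplicate_keys(keys, ints_equal_floats=False):
    """Check a list of decoded map keys for duplicates.

    ints_equal_floats is a hypothetical switch: when set, integral
    floating-point keys are treated as equal to the matching integer.
    """
    seen = set()
    for key in keys:
        if ints_equal_floats and isinstance(key, float) and key.is_integer():
            key = int(key)
        # Pair each key with its type name so that 0 and 0.0 stay
        # distinct unless they were normalized above.
        marker = (type(key).__name__, key)
        if marker in seen:
            return True
        seen.add(marker)
    return False
```
]]></sourcecode>
<t>With the switch left off, the keys 0 and 0.0 are distinct, as in the
generic data model; with the switch on, the same pair is reported as a
duplicate.</t>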
</section>
</section>
<section anchor="encoding" toc="default">
<name>Specification of the CBOR Encoding</name>
<t>A CBOR data item (<xref target="cbor-data-models" format="default"/>) is encoded to or decoded from
a byte string carrying a well-formed encoded data item as described in this section. The encoding is
summarized in <xref target="jumptable" format="default"/> in <xref target="jump-table" format="default"/>, indexed by the initial byte. An encoder <bcp14>MUST</bcp14> produce only well-formed
encoded data items. A decoder <bcp14>MUST NOT</bcp14> return a decoded data item when it
encounters input that is not a well-formed encoded CBOR data item (this does
not detract from the usefulness of diagnostic and recovery tools that
might make available some information from a damaged encoded CBOR data item).</t>
<t>The initial byte of each encoded data item contains both information
about the major type (the high-order 3 bits, described in
<xref target="majortypes" format="default"/>) and additional information (the low-order 5 bits).
With a few exceptions, the additional information's value
describes how to load an unsigned integer "argument":</t>
<dl newline="false" spacing="normal">
<dt>Less than 24:</dt>
<dd>
The argument's value is the value of the additional information.</dd>
<dt>24, 25, 26, or 27:</dt>
<dd>
The argument's value is held in the following 1, 2, 4, or 8 bytes,
respectively, in network byte order. For major type 7 and
additional information value 25, 26, 27, these bytes are not used as
an integer argument, but as a floating-point value (see
<xref target="fpnocont" format="default"/>).</dd>
<dt>28, 29, 30:</dt>
<dd>
These values are reserved for future additions to the CBOR format.
In the present version of CBOR, the encoded item is not well-formed.</dd>
<dt>31:</dt>
<dd>
No argument value is derived.
If the major type is 0, 1, or 6, the encoded item is not
well-formed. For major types 2 to 5, the item's length is
indefinite, and for major type 7, the byte does not constitute a data
item at all but terminates an indefinite-length item; all are
described in <xref target="indefinite" format="default"/>.</dd>
</dl>
<t>The initial byte and any additional bytes consumed to construct the
argument are collectively referred to as the <em>head</em> of the data item.</t>
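<t>The construction of a head can be sketched in Python as follows. This is
a minimal, non-normative sketch limited to major types 0 to 6 (major type 7
uses the bytes after the initial byte differently); the function name
"encode_head" is an assumption made for this illustration. The sketch always
picks the shortest argument encoding, which is one choice among several
well-formed ones.</t>
<sourcecode type="python"><![CDATA[
```python
def encode_head(major_type, argument):
    """Return the head (initial byte plus argument bytes) of a data item."""
    if not 0 <= major_type <= 6:
        raise ValueError("this sketch covers major types 0..6 only")
    if not 0 <= argument < 2**64:
        raise ValueError("argument must be in 0..2**64-1")
    if argument < 24:
        # The argument fits into the additional information itself.
        return bytes([major_type << 5 | argument])
    # Additional information 24, 25, 26, 27: argument in 1, 2, 4, 8 bytes.
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 2 ** (8 * size):
            initial = major_type << 5 | additional_info
            return bytes([initial]) + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: argument was range-checked above")
```
]]></sourcecode>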
<t>The meaning of this argument depends on the major type.
For example, in major type 0, the argument is the value of the data
item itself (and in major type 1, the value of the data item is
computed from the argument); in major type 2 and 3, it gives the length
of the string data in bytes that follow; and in major types 4 and 5, it is used to
determine the number of data items enclosed.</t>
<t>If the encoded sequence of bytes ends before the end of a data item,
that item is not well-formed. If the encoded
sequence of bytes still has bytes remaining
after the outermost encoded item is decoded, that encoding is not a
single well-formed CBOR item. Depending on the application, the decoder may either
treat the encoding as not well-formed or just identify the start of
the remaining bytes to the application.</t>
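<t>The rules above for loading the argument and for input that ends too
early can be sketched as a head decoder. This is a minimal, non-normative
Python sketch; the names "NotWellFormed" and "decode_head" are assumptions
made for this illustration. For major type 7 with additional information
25 to 27, the returned number is just the raw bits of the floating-point
value.</t>
<sourcecode type="python"><![CDATA[
```python
class NotWellFormed(ValueError):
    """Raised when the input is not a well-formed encoded data item."""


def decode_head(data, offset=0):
    """Return (major_type, additional_info, argument, next_offset).

    argument is None for additional information 31.
    """
    if offset >= len(data):
        raise NotWellFormed("input ends before the initial byte")
    initial = data[offset]
    major_type, additional_info = initial >> 5, initial & 0x1F
    offset += 1
    if additional_info < 24:
        return major_type, additional_info, additional_info, offset
    if additional_info <= 27:
        size = 1 << (additional_info - 24)  # 1, 2, 4, or 8 bytes
        if offset + size > len(data):
            raise NotWellFormed("input ends inside the argument")
        argument = int.from_bytes(data[offset:offset + size], "big")
        return major_type, additional_info, argument, offset + size
    if additional_info < 31:
        raise NotWellFormed("additional information 28..30 is reserved")
    if major_type in (0, 1, 6):
        raise NotWellFormed("additional information 31 is not allowed here")
    return major_type, additional_info, None, offset


# The integer 500 followed by one extra byte: the decoder can hand the
# remaining bytes to the application instead of rejecting the input.
data = bytes.fromhex("1901f40a")
major_type, _, value, end = decode_head(data)
remaining = data[end:]
```
]]></sourcecode>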
<t>A CBOR decoder implementation can be based on a jump table with all
256 defined values for the initial byte (<xref target="jumptable" format="default"/>). A decoder in
a constrained implementation can instead use the structure of the
initial byte and following bytes for more compact code (see
<xref target="pseudocode" format="default"/> for a rough impression of how this could look).</t>
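<t>The two approaches can be related in a few lines of Python: a function
that uses the structure of the initial byte can also be used to fill a
256-entry table. This is a non-normative sketch; the function name, the
table name, and the label strings are assumptions made for this
illustration, and the labels only name the category of each initial byte.</t>
<sourcecode type="python"><![CDATA[
```python
KINDS = ("unsigned integer", "negative integer", "byte string",
         "text string", "array", "map", "tag", "simple/float")


def classify_initial_byte(initial):
    """Name the category of an initial byte from its two bit fields."""
    major_type, additional_info = initial >> 5, initial & 0x1F
    if 28 <= additional_info <= 30:
        return "reserved (not well-formed)"
    if additional_info == 31:
        if major_type in (0, 1, 6):
            return "not well-formed"
        if major_type == 7:
            return "break"
        return "indefinite-length " + KINDS[major_type]
    return KINDS[major_type]


# One entry per possible initial byte, indexed by the byte value.
JUMP_TABLE = [classify_initial_byte(b) for b in range(256)]
```
]]></sourcecode>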
<section anchor="majortypes" toc="default">
<name>Major Types</name>
<t>The following lists the major types and the additional information and
other bytes associated with the type.</t>
<dl newline="true" spacing="normal">
<dt>Major type 0:</dt>
<dd>
An unsigned integer in the range 0..2<sup>64</sup>-1 inclusive. The value of the
encoded item is the argument itself. For example, the
integer 10 is denoted as the one byte 0b000_01010 (major type 0,
additional information 10). The integer 500 would be 0b000_11001
(major type 0, additional information 25) followed by the two bytes
0x01f4, which is 500 in decimal.</dd>
<dt>Major type 1:</dt>
<dd>
A negative integer in the range -2<sup>64</sup>..-1 inclusive. The value of
the item is -1 minus the argument. For example, the integer
-500 would be 0b001_11001 (major type 1, additional information 25)
followed by the two bytes 0x01f3, which is 499 in decimal.</dd>
<dt>Major type 2:</dt>
<dd>
A byte string. The number of bytes in the string is equal to the
argument. For example, a byte
string whose length is 5 would have an initial byte of 0b010_00101
(major type 2, additional information 5 for the length), followed by
5 bytes of binary content. A byte string whose length is 500 would
have a 3-byte head, consisting of the initial byte 0b010_11001 (major type 2, additional
information 25 to indicate a two-byte length) followed by the two
bytes 0x01f4 for a length of 500; this head is followed by 500 bytes of binary
content.</dd>
<dt>Major type 3:</dt>
<dd>
A text string (<xref target="cbor-data-models" format="default"/>) encoded as UTF-8
<xref target="RFC3629" format="default"/>. The number of bytes in the string is equal to the
argument. A string containing an invalid UTF-8 sequence is
well-formed but invalid (<xref target="terminology" format="default"/>). This type is provided for
systems that need to interpret or display human-readable text, and
allows the differentiation between unstructured bytes and text that
has a specified repertoire (that of Unicode) and encoding (UTF-8). In contrast to formats
such as JSON, the Unicode characters in this type are never
escaped. Thus, a newline character (U+000A) is always represented in
a string as the byte 0x0a, and never as the bytes 0x5c6e (the
characters "\" and "n") nor as 0x5c7530303061 (the characters "\",
"u", "0", "0", "0", and "a").</dd>
<dt>Major type 4:</dt>
<dd>
An array of data items. In other formats, arrays are also called lists, sequences, or
tuples (a "CBOR sequence" is something slightly different, though <xref target="RFC8742" format="default"/>).
The argument is the number of data items in the
array. Items in an
array do not need to all be of the same type. For example, an array
that contains 10 items of any type would have an initial byte of
0b100_01010 (major type 4, additional information 10 for the
length) followed by the 10 remaining items.</dd>
<dt>Major type 5:</dt>
<dd>
A map of pairs of data items. Maps are also called tables,
dictionaries, hashes, or objects (in JSON). A map is comprised of
pairs of data items, each pair consisting of a key that is
immediately followed by a value. The argument is the number
of <em>pairs</em> of data items in the map. For
example, a map that contains 9 pairs would have an initial byte of
0b101_01001 (major type 5, additional information 9 for the
number of pairs) followed by the 18 remaining items. The first item
is the first key, the second item is the first value, the third item
is the second key, and so on. Because items in a map come in pairs,
their total number is always even: a map that contains an odd
number of items (no value data present after the last key data item) is not well-formed.
A map that has duplicate keys may be
well-formed, but it is not valid, and thus it causes indeterminate
decoding; see also <xref target="map-keys" format="default"/>.</dd>
<dt>Major type 6:</dt>
<dd>
A tagged data item ("tag") whose tag number, an integer in the range
0..2<sup>64</sup>-1 inclusive, is the argument and
whose enclosed data item (<em>tag content</em>) is the single encoded data item that follows the head.
See <xref target="tags" format="default"/>.</dd>
<dt>Major type 7:</dt>
<dd>
Floating-point numbers and simple values, as well as the "break"
stop code. See <xref target="fpnocont" format="default"/>.</dd>
</dl>
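<t>The examples given for major types 0 to 3 can be reproduced with a short
Python sketch. It is non-normative and covers only integers, byte strings,
and text strings; the helper names ("head", "encode_int", "encode_bytes",
"encode_text") are assumptions made for this illustration.</t>
<sourcecode type="python"><![CDATA[
```python
def head(major_type, argument):
    """Shortest head for major types 0 to 6 (helper for this sketch)."""
    if argument < 24:
        return bytes([major_type << 5 | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 2 ** (8 * size):
            initial = major_type << 5 | additional_info
            return bytes([initial]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_int(value):
    # Major type 0 carries the value itself; for major type 1 the
    # argument is -1 minus the value.
    return head(0, value) if value >= 0 else head(1, -1 - value)


def encode_bytes(value):
    return head(2, len(value)) + value


def encode_text(value):
    utf8 = value.encode("utf-8")
    # The argument counts UTF-8 bytes, not characters; nothing is escaped.
    return head(3, len(utf8)) + utf8
```
]]></sourcecode>
<t>For instance, under these assumptions, encode_int(-500) yields the three
bytes 0x3901f3, and a text string holding a single newline character is
encoded as 0x610a.</t>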
<t>These eight major types lead to a simple table showing which of the
256 possible values for the initial byte of a data item are used
(<xref target="jumptable" format="default"/>).</t>
<t>In major types 6 and 7, many of the possible values are reserved for
future specification. See <xref target="ianacons" format="default"/> for more information on these
values.</t>
<t><xref target="major-type-table" format="default"/> summarizes the major types defined by CBOR,
ignoring <xref target="indefinite" format="default"/> for now. The number N in this table stands
for the argument.</t>
<table anchor="major-type-table" align="center">
<name>Overview of the Definite-Length Use of CBOR Major Types (N = Argument)</name>
<thead>
<tr>
<th align="left">Major Type</th>
<th align="left">Meaning</th>
<th align="left">Content</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">0</td>
<td align="left">unsigned integer N</td>
<td align="left">-</td>
</tr>
<tr>
<td align="left">1</td>
<td align="left">negative integer -1-N</td>
<td align="left">-</td>
</tr>
<tr>
<td align="left">2</td>
<td align="left">byte string</td>
<td align="left">N bytes</td>
</tr>
<tr>
<td align="left">3</td>
<td align="left">text string</td>
<td align="left">N bytes (UTF-8 text)</td>
</tr>
<tr>
<td align="left">4</td>
<td align="left">array</td>
<td align="left">N data items (elements)</td>
</tr>
<tr>
<td align="left">5</td>
<td align="left">map</td>
<td align="left">2N data items (key/value pairs)</td>
</tr>
<tr>
<td align="left">6</td>
<td align="left">tag of number N</td>
<td align="left">1 data item</td>
</tr>
<tr>
<td align="left">7</td>
<td align="left">simple/float</td>
<td align="left">-</td>
</tr>
</tbody>
</table>
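<t>The "Content" column of the table is enough to find where a
definite-length item ends without interpreting it. The following Python
sketch is non-normative; the function name "item_length" is an assumption
made for this illustration, and indefinite-length items are deliberately
left out.</t>
<sourcecode type="python"><![CDATA[
```python
def item_length(data, offset=0):
    """Return the number of bytes occupied by the item at offset."""
    if offset >= len(data):
        raise ValueError("input ends before the item starts")
    initial = data[offset]
    major_type, additional_info = initial >> 5, initial & 0x1F
    if additional_info < 24:
        argument, end = additional_info, offset + 1
    elif additional_info <= 27:
        end = offset + 1 + (1 << (additional_info - 24))
        argument = int.from_bytes(data[offset + 1:end], "big")
    else:
        raise ValueError("reserved or indefinite length: not covered here")
    if major_type in (2, 3):
        end += argument  # N bytes of content
    else:
        # N items for arrays, 2N for maps, 1 for tags, none otherwise.
        nested = {4: argument, 5: 2 * argument, 6: 1}.get(major_type, 0)
        for _ in range(nested):
            end += item_length(data, end)
    if end > len(data):
        raise ValueError("input ends inside the item")
    return end - offset
```
]]></sourcecode>
<t>For a map with one pair, such as the three bytes 0xa10102, the sketch
walks over two nested items (the key and the value) and returns 3.</t>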
</section>
<section anchor="indefinite" toc="default">
<name>Indefinite Lengths for Some Major Types</name>
<t>Four kinds of CBOR data items (arrays, maps, byte strings, and text strings) can be
encoded with an indefinite length using additional information
value 31. This is useful if the encoding of the item needs to begin
before the number of items inside the array or map, or the total
length of the string, is known. (The ability to start sending a data
item before all of it is known is often
referred to as "streaming" within that data item.)</t>
<t>Indefinite-length arrays and maps are dealt with differently than
indefinite-length strings (byte strings and text strings).</t>
<section anchor="break" toc="default">
<name>The "break" Stop Code</name>
<t>The "break" stop code is encoded with major type 7 and additional
information value 31 (0b111_11111). It is not itself a data item: it
is just a syntactic feature to close an indefinite-length item.</t>
<t>If the "break" stop code appears where a data item is expected,
other than directly inside an indefinite-length string, array, or
map -- for example, directly inside a definite-length array or map
-- the enclosing item is not well-formed.</t>
</section>
<section anchor="indef" toc="default">
<name>Indefinite-Length Arrays and Maps</name>
<t>Indefinite-length arrays and maps are represented using their major
type with the additional information value of 31, followed by an
arbitrary-length sequence of zero or more items for an array or key/value pairs for
a map, followed by the "break" stop code (<xref target="break" format="default"/>). In other words, indefinite-length
arrays and maps look identical to other arrays and maps except for
beginning with the additional information value of 31 and ending with the
"break" stop code.</t>
<t>If the "break" stop code appears after a key in a map, in place of that
key's value, the map is not well-formed.</t>
<t>There is no restriction against nesting indefinite-length
array or map items. A "break" only terminates a single item, so
nested indefinite-length items need exactly as many "break" stop codes
as there are type bytes starting an indefinite-length item.</t>
<t>For example, assume an encoder wants to represent the abstract array
[1, [2, 3], [4, 5]]. The definite-length encoding would be
0x8301820203820405:</t>
<artwork type="hex-dump"><![CDATA[
83 -- Array of length 3
01 -- 1
82 -- Array of length 2
02 -- 2
03 -- 3
82 -- Array of length 2
04 -- 4
05 -- 5
]]></artwork>
<t>Indefinite-length encoding could be applied independently to each of
the three arrays encoded in this data item, as required, leading to
representations such as:</t>
<artwork type="hex-dump"><![CDATA[
0x9f018202039f0405ffff
9F -- Start indefinite-length array
01 -- 1
82 -- Array of length 2
02 -- 2
03 -- 3
9F -- Start indefinite-length array
04 -- 4
05 -- 5
FF -- "break" (inner array)
FF -- "break" (outer array)
]]></artwork>
<artwork type="hex-dump"><![CDATA[
0x9f01820203820405ff
9F -- Start indefinite-length array
01 -- 1
82 -- Array of length 2
02 -- 2
03 -- 3
82 -- Array of length 2
04 -- 4
05 -- 5
FF -- "break"
]]></artwork>
<artwork type="hex-dump"><![CDATA[
0x83018202039f0405ff
83 -- Array of length 3
01 -- 1
82 -- Array of length 2
02 -- 2
03 -- 3
9F -- Start indefinite-length array
04 -- 4
05 -- 5
FF -- "break"
]]></artwork>
<artwork type="hex-dump"><![CDATA[
0x83019f0203ff820405
83 -- Array of length 3
01 -- 1
9F -- Start indefinite-length array
02 -- 2
03 -- 3
FF -- "break"
82 -- Array of length 2
04 -- 4
05 -- 5
]]></artwork>
<t>An example of an indefinite-length map (that happens to have two
key/value pairs) might be:</t>
<artwork type="hex-dump"><![CDATA[
0xbf6346756ef563416d7421ff
BF -- Start indefinite-length map
63 -- First key, UTF-8 string length 3
46756e -- "Fun"
F5 -- First value, true
63 -- Second key, UTF-8 string length 3
416d74 -- "Amt"
21 -- Second value, -2
FF -- "break"
]]></artwork>
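<t>As a non-normative illustration, the following Python sketch decodes the
array and map examples of this section. It handles only the subset of CBOR
those examples need (integers, definite-length text strings, arrays, maps, and
the simple values false and true), assumes complete input, and assumes that
map keys are hashable. The names <tt>decode_item</tt> and <tt>BREAK</tt> are
illustrative, not part of this specification.</t>
<sourcecode type="python"><![CDATA[
```python
BREAK = object()  # sentinel: the "break" stop code is not itself a data item


def decode_item(data, pos=0, allow_break=False):
    """Decode one data item at data[pos]; return (value, next position)."""
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if initial == 0xFF:
        if not allow_break:
            raise ValueError('"break" where a data item is expected')
        return BREAK, pos
    if info == 31:
        if major not in (4, 5):
            raise ValueError("indefinite length not handled in this sketch")
        n = None                       # length unknown until "break"
    elif info < 24:
        n = info
    elif info <= 27:
        size = 1 << (info - 24)
        n = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError("reserved additional information")
    if major == 0:
        return n, pos
    if major == 1:
        return -1 - n, pos
    if major == 3:
        return data[pos:pos + n].decode("utf-8"), pos + n
    if major == 4:
        items = []
        while n is None or len(items) < n:
            # "break" is only acceptable directly inside an indefinite array.
            item, pos = decode_item(data, pos, allow_break=n is None)
            if item is BREAK:
                break
            items.append(item)
        return items, pos
    if major == 5:
        pairs, count = {}, 0
        while n is None or count < n:
            key, pos = decode_item(data, pos, allow_break=n is None)
            if key is BREAK:
                break
            # A "break" in place of the value is rejected (allow_break=False).
            value, pos = decode_item(data, pos)
            pairs[key] = value
            count += 1
        return pairs, pos
    if major == 7 and info in (20, 21):
        return info == 21, pos
    raise ValueError("major type or value not handled in this sketch")
```
]]></sourcecode>
<t>With this sketch, all five encodings of [1, [2, 3], [4, 5]] shown above
decode to the same nested list, and a "break" stop code that appears in place
of a map value or inside a definite-length array raises an error.</t>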
</section>
<section anchor="indefinite-length-byte-strings-and-text-strings" toc="default">
<name>Indefinite-Length Byte Strings and Text Strings</name>
<t>Indefinite-length strings are represented by a byte containing the
major type for byte string or text string with an additional
information value of 31, followed by a series of zero or more strings
of the specified type ("chunks") that have definite lengths, and
finished by the "break" stop code (<xref target="break" format="default"/>). The data item
represented by the indefinite-length string is the concatenation of
the chunks. If no chunks are present, the data item is an empty
string of the specified type. Zero-length chunks, while not
particularly useful, are permitted.</t>
<t>If any item between the indefinite-length string indicator
(0b010_11111 or 0b011_11111) and the "break" stop code is not a definite-length
string item of the same major type, the string is not well-formed.</t>
<t>The design does not allow nesting
indefinite-length strings as chunks into indefinite-length strings.
If it were allowed, it would require decoder implementations to keep a stack, or at
least a count, of nesting levels. It is unnecessary on the
encoder side because the inner indefinite-length string would consist of
chunks, and these could instead be put directly into the outer indefinite-length
string.</t>
<t>If any definite-length text string inside an indefinite-length text
string is invalid, the indefinite-length text string is invalid. Note
that this implies that the UTF-8 bytes of a single Unicode code point
(scalar value) cannot be spread between chunks: a new chunk of a text
string can only be started at a code point boundary.</t>
<t>For example, assume an encoded data item consisting of the bytes:</t>
<artwork type="hex-dump"><![CDATA[
0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111
5F -- Start indefinite-length byte string
44 -- Byte string of length 4
aabbccdd -- Bytes content
43 -- Byte string of length 3
eeff99 -- Bytes content
FF -- "break"
]]></artwork>
<t>After decoding, this results in a single byte string with seven bytes:
0xaabbccddeeff99.</t>
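<t>As a non-normative illustration, the chunk rules of this section can be
sketched in Python as follows. The function name
<tt>decode_indefinite_string</tt> is an illustrative assumption, and the sketch
only accepts an indefinite-length byte string or text string at the start of
its input.</t>
<sourcecode type="python"><![CDATA[
```python
def decode_indefinite_string(data):
    """Decode an indefinite-length string; return (value, next position)."""
    major, info = data[0] >> 5, data[0] & 0x1F
    if major not in (2, 3) or info != 31:
        raise ValueError("not an indefinite-length string")
    pos, chunks = 1, []
    while True:
        if pos >= len(data):
            raise ValueError('missing "break" stop code')
        head = data[pos]
        pos += 1
        if head == 0xFF:
            break
        chunk_major, chunk_info = head >> 5, head & 0x1F
        # Each chunk must be a definite-length string of the same major type;
        # this also rules out nested indefinite-length strings.
        if chunk_major != major or chunk_info > 27:
            raise ValueError("chunk is not a definite-length string of this type")
        if chunk_info < 24:
            length = chunk_info
        else:
            size = 1 << (chunk_info - 24)
            if pos + size > len(data):
                raise ValueError("truncated chunk head")
            length = int.from_bytes(data[pos:pos + size], "big")
            pos += size
        chunk = data[pos:pos + length]
        if len(chunk) != length:
            raise ValueError("truncated chunk")
        pos += length
        # Decoding each text chunk on its own enforces that chunks start and
        # end at code point boundaries (UnicodeDecodeError is a ValueError).
        chunks.append(chunk.decode("utf-8") if major == 3 else chunk)
    value = "".join(chunks) if major == 3 else b"".join(chunks)
    return value, pos
```
]]></sourcecode>
<t>Applied to the example above, the sketch returns the seven bytes
0xaabbccddeeff99. The 0xff byte inside the second chunk is not mistaken for a
"break" stop code, because chunk content is skipped by its declared length.</t>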
</section>
<section anchor="summary-of-indefinite-length-use-of-major-types" toc="default">
<name>Summary of Indefinite-Length Use of Major Types</name>
<t><xref target="major-type-indef-table" format="default"/> summarizes the major types defined by CBOR as
used for indefinite-length encoding (with additional information set
to 31).</t>
<table anchor="major-type-indef-table" align="center">
<name>Overview of the Indefinite-Length Use of CBOR Major Types (Additional Information = 31)</name>
<thead>
<tr>
<th align="left">Major Type</th>
<th align="left">Meaning</th>
<th align="left">Enclosed up to "break" Stop Code</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">0</td>
<td align="left">(not well-formed)</td>
<td align="left">-</td>
</tr>
<tr>
<td align="left">1</td>
<td align="left">(not well-formed)</td>
<td align="left">-</td>
</tr>
<tr>
<td align="left">2</td>
<td align="left">byte string</td>
<td align="left">definite-length byte strings</td>
</tr>
<tr>
<td align="left">3</td>
<td align="left">text string</td>
<td align="left">definite-length text strings</td>
</tr>
<tr>
<td align="left">4</td>
<td align="left">array</td>
<td align="left">data items (elements)</td>
</tr>
<tr>
<td align="left">5</td>
<td align="left">map</td>
<td align="left">data items (key/value pairs)</td>
</tr>
<tr>
<td align="left">6</td>
<td align="left">(not well-formed)</td>
<td align="left">-</td>
</tr>
<tr>
<td align="left">7</td>
<td align="left">"break" stop code</td>
<td align="left">-</td>
</tr>
</tbody>
</table>
</section>
</section>
<section anchor="fpnocont" toc="default">
<name>Floating-Point Numbers and Values with No Content</name>
<t>Major type 7 is for two types of data: floating-point numbers and
"simple values" that do not need any content. Each value of the 5-bit
additional information in the initial byte has its own separate
meaning, as defined in <xref target="fpnoconttbl" format="default"/>. Like the major types for
integers, items of this major type do not carry content data; all the
information is in the initial bytes (the head).</t>
<table anchor="fpnoconttbl" align="center">
<name>Values for Additional Information in Major Type 7</name>
<thead>
<tr>
<th align="left">5-Bit Value</th>
<th align="left">Semantics</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">0..23</td>
<td align="left">Simple value (value 0..23)</td>
</tr>
<tr>
<td align="left">24</td>
<td align="left">Simple value (value 32..255 in following byte)</td>
</tr>
<tr>
<td align="left">25</td>
<td align="left">IEEE 754 Half-Precision Float (16 bits follow)</td>
</tr>
<tr>
<td align="left">26</td>
<td align="left">IEEE 754 Single-Precision Float (32 bits follow)</td>
</tr>
<tr>
<td align="left">27</td>
<td align="left">IEEE 754 Double-Precision Float (64 bits follow)</td>
</tr>
<tr>
<td align="left">28-30</td>
<td align="left">Reserved, not well-formed in the present document</td>
</tr>
<tr>
<td align="left">31</td>
<td align="left">"break" stop code for indefinite-length items (<xref target="break" format="default"/>)</td>
</tr>
</tbody>
</table>
<t>As with all other major types, the 5-bit value 24 signifies a
single-byte extension: it is followed by an additional byte to
represent the simple value. (To minimize confusion, only the values 32
to 255 are used.) This maintains the structure of the initial bytes:
as for the other major types, the length of these always depends on
the additional information in the first byte. <xref target="fpnoconttbl2" format="default"/> lists
the numeric values assigned and available for simple values.</t>
<table anchor="fpnoconttbl2" align="center">
<name>Simple Values</name>
<thead>
<tr>
<th align="left">Value</th>
<th align="left">Semantics</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">0..19</td>
<td align="left">(unassigned)</td>
</tr>
<tr>
<td align="left">20</td>
<td align="left">false</td>
</tr>
<tr>
<td align="left">21</td>
<td align="left">true</td>
</tr>
<tr>
<td align="left">22</td>
<td align="left">null</td>
</tr>
<tr>
<td align="left">23</td>
<td align="left">undefined</td>
</tr>
<tr>
<td align="left">24..31</td>
<td align="left">(reserved)</td>
</tr>
<tr>
<td align="left">32..255</td>
<td align="left">(unassigned)</td>
</tr>
</tbody>
</table>
<t>An encoder <bcp14>MUST NOT</bcp14> issue two-byte sequences that
start with 0xf8 (major type 7, additional information 24) and continue
with a byte less than 0x20 (32 decimal). Such sequences are not
well-formed. (This implies that an encoder cannot encode <tt>false</tt>, <tt>true</tt>,
<tt>null</tt>, or <tt>undefined</tt> in two-byte sequences and that only the one-byte
variants of these are well-formed; more generally speaking, each
simple value only has a single representation variant.)</t>
<t>The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit
IEEE 754 binary floating-point values <xref target="IEEE754" format="default"/>. These floating-point values
are encoded in the additional bytes of the appropriate size. (See
<xref target="half-precision" format="default"/> for some information about 16-bit floating-point numbers.)</t>
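<t>As a non-normative illustration, the following Python sketch decodes a
major type 7 item according to the two tables above. The function name
<tt>decode_major7</tt> and the returned tuple layout are illustrative
assumptions. The sketch relies on the <tt>struct</tt> module of the Python
standard library, whose <tt>e</tt>, <tt>f</tt>, and <tt>d</tt> format
characters unpack IEEE 754 half-, single-, and double-precision values.</t>
<sourcecode type="python"><![CDATA[
```python
import struct

FLOAT_FORMATS = {25: (">e", 2), 26: (">f", 4), 27: (">d", 8)}


def decode_major7(data):
    """Decode a major type 7 item at data[0]; return ((kind, value), length)."""
    initial = data[0]
    if initial >> 5 != 7:
        raise ValueError("not major type 7")
    info = initial & 0x1F
    if info < 24:
        return ("simple", info), 1
    if info == 24:
        value = data[1]
        if value < 32:
            # Two-byte encodings of simple values 0..31 are not well-formed.
            raise ValueError("simple value below 32 in two-byte form")
        return ("simple", value), 2
    if info in FLOAT_FORMATS:
        fmt, size = FLOAT_FORMATS[info]
        (number,) = struct.unpack(fmt, data[1:1 + size])
        return ("float", number), 1 + size
    if info == 31:
        return ("break", None), 1
    raise ValueError("additional information 28..30 is reserved")
```
]]></sourcecode>
<t>Under these assumptions, 0xf5 yields simple value 21 (true), 0xf8ff yields
simple value 255, and 0xf814 is rejected because simple value 20 (false) only
has the one-byte representation 0xf4.</t>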
</section>
<section anchor="tags" toc="default">
<name>Tagging of Items</name>
<t>In CBOR, a data item can be enclosed by a tag to give it some
additional semantics, as uniquely identified by a <em>tag number</em>.
A tag is of major type 6; its argument (<xref target="encoding" format="default"/>) indicates the tag
number, and the tag contains a single enclosed data item, the
<em>tag content</em>.
(If a tag requires further structure to its content, this structure is
provided by the enclosed data item.)
We use the term <em>tag</em> for the entire data item consisting of both a
tag number and the tag content: the tag content is the data item that
is being tagged.</t>
<t>For example, assume that a byte string of length 12 is marked with a
tag of number 2 to indicate it is an unsigned <em>bignum</em> (<xref target="bignums" format="default"/>).
The encoded data item would start with a byte 0b110_00010 (major type
6, additional information 2 for the tag number) followed by the
encoded tag content: 0b010_01100 (major type 2, additional information
12 for the length) followed by the 12 bytes of the bignum.</t>
<t>In the extended generic data model, a tag number's
definition describes the additional semantics
conveyed with the tag number.
These semantics may include equivalence of some tagged data
items with other data items, including some that can be
represented in the basic generic data model. For instance, 0xc24101,
a bignum the tag content of which is the byte string with the single
byte 0x01, is equivalent to an integer 1, which could also be encoded
as 0x01, 0x1801, or 0x190001.
The tag definition may specify a preferred
serialization (<xref target="preferred" format="default"/>) that is recommended for generic
encoders; this may prefer basic generic data model representations
over ones that employ a tag.</t>
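<t>As a non-normative illustration of the two examples above, the following
Python sketch builds the head bytes of a tag number 2 bignum and shows that
0xc24101, 0x01, 0x1801, and 0x190001 all stand for the integer 1. The function
names are illustrative assumptions, and <tt>integer_value</tt> handles only
unsigned integers and tag number 2.</t>
<sourcecode type="python"><![CDATA[
```python
def encode_head(major_type, argument):
    """Encode a head using the shortest form for the argument."""
    if argument < 24:
        return bytes([major_type << 5 | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            head = bytes([major_type << 5 | info])
            return head + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def decode_head(data, pos):
    """Return (major type, argument, next position) for a definite head."""
    initial = data[pos]
    major_type, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major_type, info, pos
    if info > 27:
        raise ValueError("additional information 28..31 not handled here")
    size = 1 << (info - 24)
    return major_type, int.from_bytes(data[pos:pos + size], "big"), pos + size


def tag_unsigned_bignum(content):
    """Wrap a byte string in tag number 2 (unsigned bignum)."""
    return encode_head(6, 2) + encode_head(2, len(content)) + content


def integer_value(data):
    """Return the integer for a major type 0 item or a tag 2 bignum."""
    major_type, argument, pos = decode_head(data, 0)
    if major_type == 0:
        return argument
    if major_type == 6 and argument == 2:
        content_type, length, pos = decode_head(data, pos)
        if content_type != 2:
            raise ValueError("tag 2 requires a byte string as tag content")
        return int.from_bytes(data[pos:pos + length], "big")
    raise ValueError("not handled in this sketch")
```
]]></sourcecode>
<t>In this sketch, <tt>tag_unsigned_bignum</tt> applied to a 12-byte string
produces an encoding that starts with 0b110_00010 and 0b010_01100, as described
above. A generic encoder following a preferred serialization would emit 0x01
rather than 0xc24101 for the integer 1.</t>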
<t>The tag definition usually defines which nested data items are
valid for such tags. Tag definitions may restrict their content to a
very specific syntactic structure, as the tags defined in this
document do, or they may define their content more semantically. An
example of the latter is how tags 40 and 1040 accept multiple ways to
represent arrays <xref target="RFC8746" format="default"/>.
</t>
<t>As a matter of convention, many tags do not accept <tt>null</tt> or <tt>undefined</tt>
values as tag content; instead, the expectation is that a <tt>null</tt> or
<tt>undefined</tt> value can be used in place of the entire tag; <xref target="epochdatetimesect" format="default"/>
provides some further considerations for one specific tag about the
handling of this convention in application protocols and in mapping
to platform types.</t>
<t>Decoders do not need to understand tags of every tag number, and tags may be of
little value in applications where the implementation creating a
particular CBOR data item and the implementation decoding that stream
know the semantic meaning of each item in the data flow. The primary
purpose of tags in this specification is to define common data types such as
dates. A secondary purpose is to provide conversion hints when it is
foreseen that the CBOR data item needs to be translated into a
different format, requiring hints about the content of items.
Understanding the semantics of tags is
optional for a decoder; it can simply present both the tag number and
the tag content to the application, without interpreting the additional
semantics of the tag.</t>
<t>A tag applies semantics to the data item it encloses.
Tags can nest: if tag A encloses tag B, which encloses data item C,
tag A applies to the result of applying tag B on data item C.</t>
<t>IANA maintains a registry of tag numbers as described in <xref target="ianatags" format="default"/>.
<xref target="tagvalues" format="default"/> provides a list of tag numbers
that were defined in <xref target="RFC7049" format="default"/> with definitions in
the rest of this section.
(Tag number 35 was also defined in <xref target="RFC7049" format="default"/>; a discussion of this
tag number follows in <xref target="encodedtext" format="default"/>.)
Note that many other tag numbers have been defined since the publication of <xref target="RFC7049" format="default"/>;
see the registry described at <xref target="ianatags" format="default"/> for the complete list.</t>
<table anchor="tagvalues" align="center">
<name>Tag Numbers Defined in RFC 7049</name>
<thead>
<tr>
<th align="left">Tag</th>
<th align="left">Data Item</th>
<th align="left">Semantics</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">0</td>
<td align="left">text string</td>
<td align="left">Standard date/time string; see <xref target="stringdatetimesect" format="default"/></td>
</tr>
<tr>
<td align="left">1</td>
<td align="left">integer or float</td>
<td align="left">Epoch-based date/time; see <xref target="epochdatetimesect" format="default"/></td>
</tr>
<tr>
<td align="left">2</td>
<td align="left">byte string</td>
<td align="left">Unsigned bignum; see <xref target="bignums" format="default"/></td>
</tr>
<tr>
<td align="left">3</td>
<td align="left">byte string</td>
<td align="left">Negative bignum; see <xref target="bignums" format="default"/></td>
</tr>
<tr>
<td align="left">4</td>