-
Notifications
You must be signed in to change notification settings - Fork 5
/
draft-faltstrom-base45-06.xml
307 lines (305 loc) · 11.6 KB
/
draft-faltstrom-base45-06.xml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE RFC SYSTEM "rfc2629.dtd" [
<!ENTITY RFC4648 SYSTEM
"http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4648.xml">
<!ENTITY RFC2119 SYSTEM
"http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml">
]>
<?xml-stylesheet type='text/xsl' href='RFC2629.xslt' ?>
<?rfc compact="yes"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<!-- Expand crefs and put them inline -->
<?rfc comments='yes' ?>
<?rfc inline='yes' ?>
<rfc docName="draft-faltstrom-base45-06" ipr="trust200902" category="std">
<front>
<title abbrev="Base45">
The Base45 Data Encoding
</title>
<author fullname="Patrik Faltstrom" initials="P." surname="Faltstrom">
<organization abbrev="Netnod">Netnod</organization>
<address>
<email>paf@netnod.se</email>
</address>
</author>
<author fullname="Fredrik Ljunggren" initials="F." surname="Ljunggren">
<organization abbrev="Kirei">Kirei</organization>
<address>
<email>fredrik@kirei.se</email>
</address>
</author>
<author fullname="Dirk-Willem van Gulik" initials="D." surname="van Gulik">
<organization abbrev="Webweaving">Webweaving</organization>
<address>
<email>dirkx@webweaving.org</email>
</address>
</author>
<date month="June" year="2021" day="14"/>
<area>Operations</area>
<keyword>BASE45</keyword>
<abstract>
<t>
This document describes the Base45 encoding scheme which is
built upon the Base64, Base32 and Base16 encoding schemes.
</t>
</abstract>
</front>
<middle>
<section anchor="intro" title="Introduction">
<t>
A QR-code is used to encode text as a graphical
image. Depending on the characters used in the text various
encoding options for a QR-code exists, e.g. Numeric,
Alphanumeric and Byte mode. Even in Byte mode a typical
QR-code reader tries to interpret a byte sequence as an UTF-8
or ISO/IEC 8859-1 encoded text. Thus QR-codes cannot be used
to encode arbitrary binary data directly. Such data has to be
converted into an appropriate text before that text could be
encoded as a QR-code. Compared to already established Base64,
Base32 and Base16 encoding schemes, that are described in
<xref target="RFC4648">RFC 4648</xref>, the Base45 scheme
described in this document offer a more compact QR-code
encoding.
</t>
<t>
One important difference from those and Base45 is the key
table and that the padding with '=' is not required.
</t>
</section>
<section title="Conventions Used in This Document">
<t>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described
in <xref target="RFC2119">RFC 2119</xref>.
</t>
</section>
<section title="Interpretation of Encoded Data">
<t>
Encoded data is to be interpreted as described in <xref
target="RFC4648">RFC 4648</xref> with the exception that a
different alphabet is selected.
</t>
</section>
<section title="The Base45 Encoding">
<t>
A 45-character subset of US-ASCII is used; the 45 characters
usable in a QR code in Alphanumeric mode. Base45
encodes 2 bytes in 3 characters, compared to Base64,
which encodes 3 bytes in 4 characters.
</t>
<t>
For encoding two bytes [a, b] MUST be interpreted as a number
n in base 256, i.e. as an unsigned integer over 16 bits so
that the number n = (a*256) + b.
</t>
<t>
This number n is converted to base 45 [c, d, e] so that n = c
+ (d*45) + (e*45*45). Note the order of c, d an e which are
chosen so that the left-most [c] is the least significant.
</t>
<t>
The values c, d and e are then looked up in Table 1 to produce
a three character string. The process is reversed when
decoding.
</t>
<t>
For encoding a single byte [a], it MUST be interpreted as a
base 256 number, i.e. as an unsigned integer over 8 bit. That
integer MUST be converted to base 45 [c d] so that a = c
+ (45*d). The values c and d are then looked up in Table 1 to
produce a two character string.
</t>
<t>
A byte string [a b c d ... x y z] with arbitrary content and
arbitrary length MUST be encoded as follows: From left to
right pairs of bytes are encoded as described above. If the
number of bytes is even, then the encoded form is a string
with a length which is evenly divisible by 3. If the number of
bytes is odd, then the last (rightmost) byte is encoded on two
characters as described above.
</t>
<t>
For decoding a Base45 encoded string the inverse operations
are performed.
</t>
<section title="When to use Base45">
<t>
If binary data is to be stored in a QR-Code one possible way
is to use the Alphanumeric mode that uses 11 bits for 2
characters as defined in section 7.3.4 in <xref
target="ISO18004">ISO/IEC 18004:2015</xref>. The
ECI mode indicator for this encoding is 0010.
</t>
<t>
If the data is to be sent via some other transport, a
transport encoding suitable for that transport should be
used instead of Base45. It is not recommended to first
encode data in Base45 and then encode the resulting string
in for example Base64 if the data is to be sent via
email. Instead the Base45 encoding should be removed, and
the data itself should be encoded in Base64.
</t>
</section>
<section title="The alphabet used in Base45">
<t>
The Alphanumeric mode is defined to use 45 characters as specified
in this alphabet.
</t>
<t>
<figure><artwork>
Table 1: The Base45 Alphabet
Value Encoding Value Encoding Value Encoding Value Encoding
00 0 12 C 24 O 36 Space
01 1 13 D 25 P 37 $
02 2 14 E 26 Q 38 %
03 3 15 F 27 R 39 *
04 4 16 G 28 S 40 +
05 5 17 H 29 T 41 -
06 6 18 I 30 U 42 .
07 7 19 J 31 V 43 /
08 8 20 K 32 W 44 :
09 9 21 L 33 X
10 A 22 M 34 Y
11 B 23 N 35 Z
</artwork></figure>
</t>
</section>
<section title="Encoding examples">
<t>
It should be noted that although the examples are all text,
Base45 is an encoding for binary data where each octet can
have any value 0-255.
</t>
<t>
Encoding example 1: The string "AB" is the byte sequence [65
66]. The 16 bit value is 65 * 256 + 66 = 16706. 16706 equals
11 + 45 * 11 + 45 * 45 * 8 so the sequence in base 45 is [11
11 8]. By looking up these values in the Table 1 we get the
encoded string "BB8".
</t>
<t>
Encoding example 2: The string "Hello!!" as ASCII is the
byte sequence [72 101 108 108 111 33 33]. If we look at each
16 bit value, it is [18533 27756 28449 33]. Note the 33 for
the last byte. When looking at the values modulo 45, we get
[[38 6 9] [36 31 13] [9 2 14] [33 0]] where the last byte is
represented by two. By looking up these values in the Table
1 we get the encoded string "%69 VD92EX0".
</t>
<t>
Encoding example 3: The string "base-45" as ASCII is the
byte sequence [98 97 115 101 45 52 53]. If we look at each
16 bit value, it is [25185 29541 11572 53]. Note the 53 for
the last byte. When looking at the values modulo 45, we get
[[30 19 12] [21 26 14] [7 32 5] [8 1]] where the last byte
is represented by two. By looking up these values in the
Table 1 we get the encoded string "UJCLQE7W581".
</t>
</section>
<section title="Decoding examples">
<t>
Decoding example 1: The string "QED8WEX0" represents, when
looked up in Table 1, the values [26 14 13 8 32 14 33 0]. We
arrange at the numbers in chunks of three, except last which
can be two, and get [[26 14 13] [8 32 14] [33 0]]. In base
45 we get [26981 29798 33] where the bytes are [[105 101]
[116 102] [33]]. If we look at the ascii values we get the
string "ietf!".
</t>
</section>
</section>
<section title="IANA Considerations">
<t>
There are no considerations for IANA in this document.
</t>
</section>
<section title="Security Considerations">
<t>
When implementing encoding and decoding it is important to be
very careful so that buffer overflow or similar does not
occur. This of course includes the calculations for modulo 45
and lookup in the table of characters (Table 1). A decoder
also must be robust regarding input, including proper handling
of any octet value 0-255, including the NUL character (ASCII
0).
</t>
<t>
It should be noted that Base64 and some other encodings pad
the string so that the encoding starts with an aligned number
of characters, Base45 specifically avoids padding. Because of
this, special care has to be taken when odd number of octets
are to be encoded, which results not in N*3 characters, but
(N-1)*3+2 characters in the encoded string and similarly, at
decoding, when the number of encoded characters are not evenly
divisible by 3.
</t>
<t>
Base encodings use a specific, reduced alphabet to encode
binary data. Non-alphabet characters could exist within
base-encoded data, caused by data corruption or by
design. Non-alphabet characters may be exploited as a "covert
channel", where non-protocol data can be sent for nefarious
purposes. Non-alphabet characters might also be sent in order
to exploit implementation errors leading to, e.g., buffer
overflow attacks.
</t>
<t>
Implementations MUST reject the encoded data if it contains
characters outside the base alphabet (in Table 1) when
interpreting base-encoded data.
</t>
<t>
Even though a Base45 encoded string contains only characters
from the alphabet in Table 1 the following case has to be
considered: The string "FGW" represents 65535 (FFFF in base 16),
which is a valid encoding. The string "GGW" would represent
65536 (10000 in base 16), which is represented by more than 16 bit.
</t>
<t>
Implementations MUST reject the encoded data if it contains a
triplet of characters which, when decoded, result in an
unsigned integer which is greater then 65535 (ffff in base 16).
</t>
<t>
It should be noted that the resulting string after encoding to
Base45 might include non-URL-safe characters so if the URL
including the Base45 encoded data have to be URL safe, one
have to use %-encoding.
</t>
</section>
<section title="Acknowledgements">
<t>
The authors thank Alan Barrett, Alfred Fiedler, Tomas
Harreveld, Joakim Jardenberg, Christian Landgren, Anders
Lowinger, Mans Nilsson, Jakob Schlyter, Peter Teufl and Gaby
Whitehead for the feedback. Also everyone that have been
working with Base64 over a long period of years and have
proven the implementions are stable.
</t>
</section>
</middle>
<back>
<references title='Normative References'>
&RFC4648;
&RFC2119;
<reference anchor="ISO18004">
<front>
<title>
ISO/IEC 18004:2015 Information technology - Automatic
identification and data capture techniques - QR Code bar
code symbology specification
</title>
<author>
<organization>ISO/IEC JTC 1/SC 31</organization>
</author>
<date month="February" year="2015" />
</front>
<seriesInfo name="ISO/IEC 18004:2015"
value="https://www.iso.org/standard/62021.html" />
</reference>
</references>
</back>
</rfc>