Network Working Group P. Faltstrom
Internet-Draft Netnod
Intended status: Informational F. Ljunggren
Expires: 17 December 2022 Kirei
D. van Gulik
Webweaving
15 June 2022
The Base45 Data Encoding
draft-faltstrom-base45-11
Abstract
This document describes the Base45 encoding scheme which is built
upon the Base64, Base32 and Base16 encoding schemes.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 17 December 2022.
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Faltstrom, et al. Expires 17 December 2022 [Page 1]
Internet-Draft Base45 June 2022
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Conventions Used in This Document . . . . . . . . . . . . . . 2
3. Interpretation of Encoded Data . . . . . . . . . . . . . . . 2
4. The Base45 Encoding . . . . . . . . . . . . . . . . . . . . . 3
4.1. When to, and not to, use Base45 . . . . . . . . . . . . . 4
4.2. The alphabet used in Base45 . . . . . . . . . . . . . . . 4
4.3. Encoding examples . . . . . . . . . . . . . . . . . . . . 4
4.4. Decoding examples . . . . . . . . . . . . . . . . . . . . 5
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 5
6. Security Considerations . . . . . . . . . . . . . . . . . . . 5
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 6
8. Normative References . . . . . . . . . . . . . . . . . . . . 6
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 7
1. Introduction
A QR-code is used to encode text as a graphical image. Depending on
the characters used in the text, various encoding options for a
QR-code exist, e.g. Numeric, Alphanumeric and Byte mode. Even in Byte
mode, a typical QR-code reader tries to interpret a byte sequence as
UTF-8 or ISO/IEC 8859-1 encoded text. Thus, QR-codes cannot be used
to encode arbitrary binary data directly. Such data has to be
converted into an appropriate text before that text can be encoded
as a QR-code. Compared to the already established Base64, Base32 and
Base16 encoding schemes, which are described in RFC 4648 [RFC4648],
the Base45 scheme described in this document offers a more compact
QR-code encoding.
The important differences between those schemes and Base45 are the
key table and the fact that padding with '=' is not required.
2. Conventions Used in This Document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
3. Interpretation of Encoded Data
Encoded data is to be interpreted as described in RFC 4648 [RFC4648]
with the exception that a different alphabet is selected.
4. The Base45 Encoding
QR codes have a limited ability to store binary data. In practice,
binary data has to be encoded in characters according to one of the
modes already defined in the standard for QR codes. The easiest mode
to use is called Alphanumeric mode (see section 7.3.4 and Table 2 of
ISO/IEC 18004:2015 [ISO18004]). Unfortunately, Alphanumeric mode uses
45 different characters, which implies that neither Base32 nor Base64
is a very effective encoding.
Base45 uses a 45-character subset of US-ASCII: the 45 characters
usable in a QR code in Alphanumeric mode (see section 7.3.4 and
Table 2 of ISO/IEC 18004:2015 [ISO18004]). Base45 encodes 2 bytes in
3 characters, compared to Base64, which encodes 3 bytes in 4
characters.
For encoding, two bytes [a, b] MUST be interpreted as a number n in
base 256, i.e. as an unsigned integer over 16 bits so that the number
n = (a*256) + b.
This number n is converted to base 45 [c, d, e] so that n = c +
(d*45) + (e*45*45). Note the order of c, d and e, which is chosen so
that the left-most value [c] is the least significant.
The values c, d and e are then looked up in Table 1 to produce a
three character string. The process is reversed when decoding.
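The two-byte step described above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation; the names ALPHABET and encode_pair are chosen here for illustration and are not defined by this document.

```python
# Table 1 written as a string: the index of each character is its value.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def encode_pair(a: int, b: int) -> str:
    """Encode two byte values (0-255 each) as three Base45 characters."""
    n = a * 256 + b               # interpret [a, b] as a 16-bit unsigned integer
    e, rest = divmod(n, 45 * 45)  # most significant base 45 digit
    d, c = divmod(rest, 45)       # so that n = c + d*45 + e*45*45
    # The least significant digit c comes first in the output.
    return ALPHABET[c] + ALPHABET[d] + ALPHABET[e]


print(encode_pair(65, 66))  # the bytes of "AB"
```

Called with the bytes of "AB" (65 and 66), the sketch yields "BB8", which matches encoding example 1 in Section 4.3.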
For encoding a single byte [a], it MUST be interpreted as a base 256
number, i.e. as an unsigned integer over 8 bits. That integer MUST
be converted to base 45 [c d] so that a = c + (45*d). The values c
and d are then looked up in Table 1 to produce a two character
string.
A byte string [a b c d ... x y z] with arbitrary content and
arbitrary length MUST be encoded as follows: From left to right pairs
of bytes MUST be encoded as described above. If the number of bytes
is even, then the encoded form is a string with a length which is
evenly divisible by 3. If the number of bytes is odd, then the last
(rightmost) byte MUST be encoded on two characters as described
above.
For decoding a Base45 encoded string the inverse operations are
performed.
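As a hedged sketch of the complete encoding procedure, the following Python function processes pairs of bytes from left to right and handles a trailing single byte. The function name b45encode is an assumption made for this example.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def b45encode(data: bytes) -> str:
    """Encode an arbitrary byte string as Base45 text."""
    out = []
    # Each full pair of bytes becomes three characters.
    for i in range(0, len(data) - 1, 2):
        n = data[i] * 256 + data[i + 1]
        e, rest = divmod(n, 45 * 45)
        d, c = divmod(rest, 45)
        out.append(ALPHABET[c] + ALPHABET[d] + ALPHABET[e])
    # A trailing single byte becomes two characters; no padding is added.
    if len(data) % 2 == 1:
        d, c = divmod(data[-1], 45)
        out.append(ALPHABET[c] + ALPHABET[d])
    return "".join(out)


print(b45encode(b"Hello!!"))
```

For the input "Hello!!" the sketch produces "%69 VD92EX0", the same string as encoding example 2 in Section 4.3.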
4.1. When to, and not to, use Base45
If binary data is to be stored in a QR-Code, the suggested mechanism
is to use the Alphanumeric mode that uses 11 bits for 2 characters as
defined in section 7.3.4 in ISO/IEC 18004:2015 [ISO18004]. The ECI
mode indicator for this encoding is 0010.
On the other hand if the data is to be sent via some other transport,
a transport encoding suitable for that transport should be used
instead of Base45. For example, it is not recommended to first
encode data in Base45 and then encode the resulting string in Base64
if the data is to be sent via email. Instead, the Base45 encoding
should be removed, and the data itself should be encoded in Base64.
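The density argument for Alphanumeric mode made at the start of this section can be checked with a little arithmetic. The sketch below uses the figure of 11 bits per 2 characters given above. It additionally assumes that Base64 text has to be stored in Byte mode at 8 bits per character, because the Base64 alphabet contains lowercase letters. Mode indicator and character count overhead are ignored.

```python
from fractions import Fraction

ALNUM_BITS_PER_CHAR = Fraction(11, 2)  # Alphanumeric mode: 11 bits per 2 characters
BYTE_BITS_PER_CHAR = 8                 # assumption: Byte mode, 8 bits per character

# Base45: 3 characters carry 2 bytes (16 bits of payload).
base45_bits = 3 * ALNUM_BITS_PER_CHAR
# Base64: 4 characters carry 3 bytes (24 bits of payload).
base64_bits = 4 * BYTE_BITS_PER_CHAR

base45_efficiency = Fraction(16) / base45_bits
base64_efficiency = Fraction(24) / base64_bits

print(f"Base45 in Alphanumeric mode: {float(base45_efficiency):.1%}")
print(f"Base64 in Byte mode:         {float(base64_efficiency):.1%}")
```

Under these assumptions, Base45 uses 32/33 of the stored bits for payload, while Base64 uses 3/4.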
4.2. The alphabet used in Base45
The Alphanumeric mode is defined to use 45 characters as specified in
this alphabet.
Table 1: The Base45 Alphabet

Value Encoding  Value Encoding  Value Encoding  Value Encoding
   00 0            12 C            24 O            36 Space
   01 1            13 D            25 P            37 $
   02 2            14 E            26 Q            38 %
   03 3            15 F            27 R            39 *
   04 4            16 G            28 S            40 +
   05 5            17 H            29 T            41 -
   06 6            18 I            30 U            42 .
   07 7            19 J            31 V            43 /
   08 8            20 K            32 W            44 :
   09 9            21 L            33 X
   10 A            22 M            34 Y
   11 B            23 N            35 Z
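In code, Table 1 is conveniently held as a single string whose index is the value, together with a reverse mapping for decoding. The following Python fragment is a sketch; the names ALPHABET and VALUE_OF are assumptions made for this example.

```python
# Table 1 as a string: position i holds the character for value i.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

# Reverse mapping from character to value, used when decoding.
VALUE_OF = {ch: value for value, ch in enumerate(ALPHABET)}

# 45 distinct characters, so no value is assigned twice.
assert len(ALPHABET) == 45 and len(VALUE_OF) == 45

print(VALUE_OF["A"], VALUE_OF[" "], VALUE_OF[":"])  # 10 36 44
```

Lowercase letters are not part of the alphabet, so a lookup such as VALUE_OF["a"] fails with a KeyError.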
4.3. Encoding examples
It should be noted that although the examples are all text, Base45 is
an encoding for binary data where each octet can have any value
0-255.
Encoding example 1: The string "AB" is the byte sequence [65 66].
The 16 bit value is 65 * 256 + 66 = 16706. 16706 equals 11 + 45 * 11
+ 45 * 45 * 8, so the sequence in base 45 is [11 11 8]. By looking
up these values in Table 1 we get the encoded string "BB8".
Encoding example 2: The string "Hello!!" as ASCII is the byte
sequence [72 101 108 108 111 33 33]. If we look at each 16 bit
value, it is [18533 27756 28449 33]. Note the 33 for the last byte.
When converting each value to base 45, we get [[38 6 9] [36 31 13] [9
2 14] [33 0]] where the last byte is represented by two values. The
resulting string "%69 VD92EX0" is created by looking up these values
in Table 1. Note that it includes a space.
Encoding example 3: The string "base-45" as ASCII is the byte
sequence [98 97 115 101 45 52 53]. If we look at each 16 bit value,
it is [25185 29541 11572 53]. Note the 53 for the last byte. When
converting each value to base 45, we get [[30 19 12] [21 26 14] [7 32
5] [8 1]] where the last byte is represented by two values. By
looking up these values in Table 1 we get the encoded string
"UJCLQE7W581".
4.4. Decoding examples
Decoding example 1: The string "QED8WEX0" represents, when looked up
in Table 1, the values [26 14 13 8 32 14 33 0]. We arrange the
numbers in chunks of three, except for the last one which can be two,
and get [[26 14 13] [8 32 14] [33 0]]. Interpreting each chunk as a
base 45 number, we get [26981 29798 33] where the bytes are [[105 101]
[116 102] [33]]. If we look at the ASCII values we get the string
"ietf!".
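The steps of decoding example 1 can be traced in Python as shown below. This is only a walk-through of the arithmetic: it performs no validation of its input, and the variable names are chosen for illustration.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

text = "QED8WEX0"
# Look up each character in Table 1.
values = [ALPHABET.index(ch) for ch in text]
# Split into chunks of three values; the last chunk may hold two.
chunks = [values[i:i + 3] for i in range(0, len(values), 3)]
# Each chunk is a base 45 number with the least significant digit first.
numbers = [sum(v * 45 ** k for k, v in enumerate(chunk)) for chunk in chunks]
# Three values carry two bytes, two values carry one byte.
decoded = b"".join(
    n.to_bytes(2 if len(chunk) == 3 else 1, "big")
    for n, chunk in zip(numbers, chunks)
)

print(values)
print(chunks)
print(numbers)
print(decoded)
```

The printed intermediate results correspond to the value list, the chunks, the numbers [26981 29798 33] and the final string "ietf!" given in the example.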
5. IANA Considerations
There are no considerations for IANA in this document.
6. Security Considerations
When implementing encoding and decoding it is important to be very
careful so that buffer overflow or similar does not occur. This of
course includes the calculations for modulo 45 and lookup in the
table of characters (Table 1). A decoder must also be robust
regarding input, including proper handling of any octet value 0-255,
including the NUL character (ASCII 0).
It should be noted that Base64 and some other encodings pad the
string so that the encoding starts with an aligned number of
characters while Base45 specifically avoids padding. Because of
this, special care has to be taken when an odd number of octets is to
be encoded. Similarly, care must be taken if the number of
characters to decode is not evenly divisible by 3.
Base encodings use a specific, reduced alphabet to encode binary
data. Non-alphabet characters could exist within base-encoded data,
caused by data corruption or by design. Non-alphabet characters may
be exploited as a "covert channel", where non-protocol data can be
sent for nefarious purposes. Non-alphabet characters might also be
sent in order to exploit implementation errors leading to, e.g.,
buffer overflow attacks.
Implementations MUST reject any input that is not a valid encoding.
For example, it MUST reject the input (encoded data) if it contains
characters outside the base alphabet (in Table 1) when interpreting
base-encoded data.
Even though a Base45 encoded string contains only characters from the
alphabet in Table 1, cases like the following have to be considered:
The string "FGW" represents 65535 (FFFF in base 16), which is a valid
encoding of 16 bits. A slightly different encoded string of the same
length, "GGW", would represent 65536 (10000 in base 16), which is
more than 16 bits. Implementations MUST also reject the encoded data
if it contains a triplet of characters which, when decoded, results
in an unsigned integer which is greater than 65535 (FFFF in base 16).
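A decoder that applies the rejection rules above might look like the following Python sketch. The function name b45decode and the use of ValueError are choices made for this example. Two checks go beyond what the text above states explicitly and are labeled as assumptions in the comments: rejecting a lone trailing character, and rejecting a two-character chunk whose value exceeds 255.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"
VALUE_OF = {ch: v for v, ch in enumerate(ALPHABET)}


def b45decode(text: str) -> bytes:
    """Decode Base45 text, raising ValueError on any invalid input."""
    try:
        values = [VALUE_OF[ch] for ch in text]
    except KeyError as exc:
        raise ValueError(f"character outside the alphabet: {exc.args[0]!r}") from None
    # Assumption: a lone trailing character cannot encode anything.
    if len(values) % 3 == 1:
        raise ValueError("length leaves a single trailing character")
    out = bytearray()
    for i in range(0, len(values), 3):
        chunk = values[i:i + 3]
        n = sum(v * 45 ** k for k, v in enumerate(chunk))
        if len(chunk) == 3:
            if n > 0xFFFF:
                raise ValueError("triplet decodes to more than 16 bits")
            out += n.to_bytes(2, "big")
        else:
            # Assumption: a two-character chunk must fit in one byte.
            if n > 0xFF:
                raise ValueError("pair decodes to more than 8 bits")
            out += n.to_bytes(1, "big")
    return bytes(out)


print(b45decode("FGW"))  # accepted: 65535
try:
    b45decode("GGW")     # rejected: 65536 does not fit in 16 bits
except ValueError as err:
    print("rejected:", err)
```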
It should be noted that the resulting string after encoding to Base45
might include non-URL-safe characters, so if a URL that includes
Base45 encoded data has to be URL safe, %-encoding has to be applied.
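As an illustration of the %-encoding step, the sketch below uses the quote and unquote functions from Python's standard urllib.parse module on the encoded string from encoding example 2. The choice of safe="" is an assumption made here so that "/" is escaped as well.

```python
from urllib.parse import quote, unquote

encoded = "%69 VD92EX0"  # Base45 form of "Hello!!" from Section 4.3

# safe="" percent-encodes every reserved character, including "/".
url_safe = quote(encoded, safe="")
print(url_safe)  # %2569%20VD92EX0

# The round trip restores the original Base45 text.
assert unquote(url_safe) == encoded
```

Without this step, a URL parser would read the leading "%69" of the Base45 string as an escaped "i" and the Base45 data would be corrupted.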
7. Acknowledgements
The authors thank Mark Adler, Anders Ahl, Alan Barrett, Sam Spens
Clason, Alfred Fiedler, Tomas Harreveld, Erik Hellman, Joakim
Jardenberg, Michael Joost, Erik Kline, Christian Landgren, Anders
Lowinger, Mans Nilsson, Jakob Schlyter, Peter Teufl and Gaby
Whitehead for their feedback. Thanks also to everyone who has been
working with Base64 over a long period of years and has proven that
the implementations are stable.
8. Normative References
[ISO18004] ISO/IEC JTC 1/SC 31, "ISO/IEC 18004:2015 Information
technology - Automatic identification and data capture
techniques - QR Code bar code symbology specification",
ISO/IEC 18004:2015, February 2015,
<https://www.iso.org/standard/62021.html>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
<https://www.rfc-editor.org/info/rfc4648>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
Authors' Addresses
Patrik Faltstrom
Netnod
Email: paf@netnod.se
Fredrik Ljunggren
Kirei
Email: fredrik@kirei.se
Dirk-Willem van Gulik
Webweaving
Email: dirkx@webweaving.org