<!DOCTYPE html>
<html>
<head>
<title>The Z-Machine Standards Document</title>
<link rel="stylesheet" type="text/css" href="zspec.css">
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
</head>
<body>
<img class="icon" src="images/icon03.gif" alt="">
<h1>3. How text and characters are encoded</h1>
<blockquote>
<p>This technique is similar to the five-bit Baudot code, which
was used by early Teletypes before ASCII was invented.
</p>
<p>Marc S. Blank and S. W. Galley, <strong>How to Fit a Large Program Into
a Small Machine</strong>
</p>
</blockquote>
<hr>
<p>3.1 <a href="#3.1">Text</a> /
3.2 <a href="#3.2">Alphabets</a> /
3.3 <a href="#3.3">Abbreviations</a> /
3.4 <a href="#3.4">ZSCII escape</a> /
3.5 <a href="#3.5">Alphabet table</a> /
3.6 <a href="#3.6">Padding and incompleteness</a> /
3.7 <a href="#3.7">Dictionary truncation</a> /
3.8 <a href="#3.8">Definition of ZSCII and Unicode</a>
</p>
<hr>
<h2 id="3.1">3.1</h2>
<p>Z-machine text is a sequence of ZSCII character codes (ZSCII is
a system similar to ASCII: see <a href="sect03.html#3.8"><strong>§</strong> 3.8</a> below). These ZSCII values are
encoded into memory using a string of Z-characters. The process of
converting between Z-characters and ZSCII values is given in <a href="sect03.html#3.2"><strong>§</strong> 3.2 to
3.7</a> below.
</p>
<h2 id="3.2">3.2</h2>
<p>Text in memory consists of a sequence of 2-byte words. Each word
is divided into three 5-bit 'Z-characters', plus 1 bit left over, arranged
as
</p>
<pre>
   --first byte-------   --second byte---
   7    6 5 4 3 2  1 0   7 6 5  4 3 2 1 0
   bit  --first--  --second---  --third--
</pre>
<p>The bit is set only on the last 2-byte word of the text, and so marks the
end.
</p>
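<p>The layout above can be sketched in Python as follows. This is a minimal
illustration, not a reference implementation: the function name
<strong>unpack_zchars</strong> is hypothetical, and it assumes the text is supplied as a
byte string with the most significant byte of each word first.
</p>

```python
def unpack_zchars(data: bytes) -> list[int]:
    """Split packed text into 5-bit Z-characters, stopping at the end bit."""
    zchars = []
    for offset in range(0, len(data) - 1, 2):
        # Assumption: each 2-byte word is stored high byte first.
        word = (data[offset] << 8) | data[offset + 1]
        zchars.append((word >> 10) & 0x1F)  # first Z-character
        zchars.append((word >> 5) & 0x1F)   # second Z-character
        zchars.append(word & 0x1F)          # third Z-character
        if word & 0x8000:                   # top bit set: last word of the text
            break
    return zchars
```

<p>Any bytes after the word carrying the end bit are ignored by this sketch.
</p>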
<h2 id="3.2.1">3.2.1</h2>
<p>There are three 'alphabets', A0 (lower case), A1 (upper case) and
A2 (punctuation) and during printing one of these is current at any given
time. Initially A0 is current. The meaning of a Z-character may depend on
which alphabet is current.
</p>
<h2 id="3.2.2">3.2.2</h2>
<p>In Versions 1 and 2, the current alphabet can be any of the
three. The Z-characters 2 and 3 are called 'shift' characters and change
the alphabet for the next character only. The new alphabet depends on
what the current one is:
</p>
<pre>
          from A0  from A1  from A2
Z-char 2    A1       A2       A0
Z-char 3    A2       A0       A1
</pre>
<p>Z-characters 4 and 5 permanently change alphabet, according to the
same table, and are called 'shift lock' characters.
</p>
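<p>One way to read the table for Versions 1 and 2 is as arithmetic modulo 3, with the
alphabets numbered 0, 1 and 2. The sketch below is illustrative only; the function
names are hypothetical.
</p>

```python
def next_alphabet(current: int, zchar: int) -> int:
    """Alphabet (0, 1 or 2) selected by shift Z-character 2 to 5 in Versions 1 and 2."""
    if zchar in (2, 4):
        return (current + 1) % 3   # A0 to A1, A1 to A2, A2 to A0
    if zchar in (3, 5):
        return (current + 2) % 3   # A0 to A2, A1 to A0, A2 to A1
    raise ValueError("not a shift or shift lock character")


def is_shift_lock(zchar: int) -> bool:
    """Z-characters 4 and 5 change the alphabet permanently in Versions 1 and 2."""
    return zchar in (4, 5)
```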
<h2 id="3.2.3">3.2.3</h2>
<p>In Versions 3 and later, the current alphabet is always A0
unless changed for 1 character only: Z-characters 4 and 5 are shift
characters. Thus 4 means "the next character is in A1" and 5 means
"the next is in A2". There are no shift lock characters.
</p>
<h2 id="3.2.4">3.2.4</h2>
<p>An indefinite sequence of shift or shift lock characters is
legal (but prints nothing).
</p>
<h2 id="3.3">3.3</h2>
<p>In Versions 3 and later, Z-characters 1, 2 and 3 represent
abbreviations, sometimes also called 'synonyms' (for traditional reasons):
the next Z-character indicates which abbreviation string to print. If z
is the first Z-character (1, 2 or 3) and x the subsequent one, then the
interpreter must look up entry 32(z-1)+x in the abbreviations table and
print the string at that word address. In Version 2, Z-character 1 has this
effect (but 2 and 3 do not, so there are only 32 abbreviations).
</p>
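<p>The lookup can be sketched as below. The names are hypothetical, and the sketch
assumes that the caller supplies the byte address of the abbreviations table, that each
entry is a 2-byte word stored high byte first, and that a word address is half the
corresponding byte address.
</p>

```python
def abbreviation_entry(z: int, x: int) -> int:
    """Entry number 32(z-1)+x for Z-character z (1 to 3) followed by x (0 to 31)."""
    if z not in (1, 2, 3) or not 0 <= x <= 31:
        raise ValueError("z must be 1 to 3 and x must be 0 to 31")
    return 32 * (z - 1) + x


def abbreviation_address(memory: bytes, table_base: int, z: int, x: int) -> int:
    """Byte address of the abbreviation string (assumes 2-byte table entries)."""
    offset = table_base + 2 * abbreviation_entry(z, x)
    word_address = (memory[offset] << 8) | memory[offset + 1]
    return 2 * word_address   # assumption: word address times 2 is the byte address
```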
<h2 id="3.3.1">3.3.1</h2>
<p>Abbreviation string-printing follows all the rules of this section
except that an abbreviation string must not itself use abbreviations and
must not end with an incomplete multi-Z-character construction (see
<a href="sect03.html#3.6.1"><strong>§</strong> 3.6.1</a> below).
</p>
<h2 id="3.4">3.4</h2>
<p>Z-character 6 from A2 is a 'ZSCII escape' character. It means that the two subsequent
Z-characters specify a ten-bit ZSCII character code: the next Z-character gives the top
5 bits and the one after the bottom 5.
</p>
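<p>As a small illustration (function names hypothetical), the ten-bit code is assembled
from, or split into, its two halves like this. The second function assumes Version 3 or
later, where Z-character 5 shifts to A2 for one character.
</p>

```python
def zscii_from_escape(top: int, bottom: int) -> int:
    """Combine the two Z-characters that follow the escape into a ZSCII code."""
    return ((top & 0x1F) << 5) | (bottom & 0x1F)


def escape_zchars(code: int) -> list[int]:
    """Z-characters spelling a ZSCII code via the escape (Version 3 and later)."""
    if not 0 <= code <= 1023:
        raise ValueError("ZSCII codes are ten-bit values")
    return [5, 6, code >> 5, code & 0x1F]   # shift to A2, escape, top, bottom
```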
<h2 id="3.5">3.5</h2>
<p>The remaining Z-characters are translated into ZSCII character
codes using the "alphabet table".
</p>
<h2 id="3.5.1">3.5.1</h2>
<p>The Z-character 0 is printed as a space (ZSCII 32).</p>
<h2 id="3.5.2">3.5.2</h2>
<p>In Version 1, Z-character 1 is printed as a new-line
(ZSCII 13).
</p>
<h2 id="3.5.3">3.5.3</h2>
<p>In Versions 2 to 4, the alphabet table for converting
Z-characters into ZSCII character codes is as follows:
</p>
<pre>
   Z-char    6789abcdef0123456789abcdef
current      --------------------------
  A0         abcdefghijklmnopqrstuvwxyz
  A1         ABCDEFGHIJKLMNOPQRSTUVWXYZ
  A2          ^0123456789.,!?_#'"/\-:()
             --------------------------
</pre>
<p>For example, in alphabet A1 the Z-character 12 is translated as a capital G
(ZSCII character code 71).
</p>
<p>Character 6 in A2 is printed as a space here, but it is a 'ZSCII escape' character, and not
translated using the alphabet table: see <a href="sect03.html#3.4"><strong>§</strong> 3.4</a>
above. Character 7 in A2, written here as a circumflex ^, is a new-line (ZSCII 13).
</p>
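<p>Putting the rules so far together, a decoder for Version 3 and later using the
default alphabet table might look like the sketch below. It is illustrative only: the
names are hypothetical, abbreviations are deliberately left out, and the ZSCII escape
is only meaningful here for new-line and the codes 32 to 126, which are mapped
directly to Python characters.
</p>

```python
A0 = "abcdefghijklmnopqrstuvwxyz"
A1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# The first two entries stand in for Z-characters 6 and 7 of A2 (ZSCII escape
# and new-line); entry 0 is never looked up because the escape is handled first.
A2 = " \n0123456789.,!?_#'\"/\\-:()"
ALPHABETS = (A0, A1, A2)


def decode_zchars(zchars: list[int]) -> str:
    """Decode Z-characters to text (Version 3 and later, default alphabet table)."""
    out = []
    alphabet = 0
    i = 0
    while i < len(zchars):
        z = zchars[i]
        i += 1
        if z in (4, 5):
            alphabet = z - 3          # 4 selects A1, 5 selects A2, for one character
            continue
        if z == 0:
            out.append(" ")
        elif z in (1, 2, 3):
            raise NotImplementedError("abbreviations are outside this sketch")
        elif alphabet == 2 and z == 6:
            if i + 1 >= len(zchars):  # incomplete escape at end of string: ignore it
                break
            code = (zchars[i] << 5) | zchars[i + 1]
            i += 2
            out.append("\n" if code == 13 else chr(code))
        else:
            out.append(ALPHABETS[alphabet][z - 6])
        alphabet = 0                  # a shift lasts for one character only
    return "".join(out)
```

<p>Trailing shift characters used as padding print nothing, since the loop ends before
any character is looked up in the shifted alphabet.
</p>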
<h2 id="3.5.4">3.5.4</h2>
<p>Version 1 has a slightly different A2 row in its alphabet
table (new-line is not needed, making room for the <strong>&lt;</strong> character):
</p>
<pre>
             6789abcdef0123456789abcdef
             --------------------------
  A2          0123456789.,!?_#'"/\&lt;-:()
             --------------------------
</pre>
<h2 id="3.5.5">3.5.5</h2>
<p>In Versions 5 and later, the interpreter should look at
the word at <strong>$34</strong> in the <a href="sect11.html">header</a>. If this is zero, then the
alphabet table drawn out in <a href="sect03.html#3.5.3"><strong>§</strong> 3.5.3</a> continues in use. Otherwise
it is interpreted as the byte address of an alphabet table
specific to this story file.
</p>
<h2 id="3.5.5.1">3.5.5.1</h2>
<p>Such an alphabet table consists of 78 bytes arranged as 3 blocks of 26 ZSCII values,
translating Z-characters 6 to 31 for alphabets A0, A1 and A2. Z-characters 6 and 7 of
A2, however, are still translated as ZSCII escape and new-line (ZSCII 13) codes (as above).
</p>
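<p>A loader for the alphabet table could be sketched as follows. The function name is
hypothetical, and the sketch assumes that the version number is held in byte 0 of the
header, that header words are stored high byte first, and that the story file is
Version 2 or later (Version 1 has a different default A2 row).
</p>

```python
DEFAULT_ROWS = (
    "abcdefghijklmnopqrstuvwxyz",
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    " \r0123456789.,!?_#'\"/\\-:()",   # entries 0 and 1 are placeholders
)


def load_alphabet_table(memory: bytes) -> list[list[int]]:
    """Return three rows of 26 ZSCII codes for Z-characters 6 to 31."""
    version = memory[0]               # assumption: version number is header byte 0
    address = 0
    if version >= 5:
        address = (memory[0x34] << 8) | memory[0x35]
    if address == 0:
        rows = [[ord(c) for c in row] for row in DEFAULT_ROWS]
    else:
        rows = [list(memory[address + 26 * n: address + 26 * (n + 1)])
                for n in range(3)]
    # Z-character 7 of A2 is always new-line; Z-character 6 of A2 is always the
    # ZSCII escape, which a decoder must test for before consulting rows[2][0].
    rows[2][1] = 13
    return rows
```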
<h2 id="3.6">3.6</h2>
<p>Since the end-bit only comes up once every three Z-characters,
a string may have to be 'padded out' with null values. This is
conventionally achieved with a sequence of 5's, though a sequence of
(for example) 4's would work equally well.
</p>
<h2 id="3.6.1">3.6.1</h2>
<p>It is legal for the string to end while a multi-Z-character
construction is incomplete: for instance, after only the top half of
a ZSCII value has been given. The partial construction is simply
ignored. (This can happen in printing dictionary words which have
been guillotined to the dictionary resolution of 6 or 9 Z-characters.)
</p>
<h2 id="3.7">3.7</h2>
<p>When an interpreter is encoding typed-in text to match against
dictionary words, the following restrictions apply. Text should be
converted to lower case (as a result A1 will not be needed unless the
game provides its own alphabet table). Abbreviations may not be used.
The pad character, if needed, must be 5. The total string length must
be 6 Z-characters (in Versions 1 to 3) or 9 (Versions 4 and later): any
multi-Z-character constructions should be left incomplete (rather than
omitted) if there's no room to finish them. For example, "i" is
encoded as:
</p>
<p><strong>14, 5, 5, 5, 5, 5, 5, 5, 5</strong></p>
<p><strong>$38a5</strong> <strong>$14a5</strong> <strong>$94a5</strong></p>
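<p>The restrictions above can be sketched as an encoder. This is a hedged illustration
for Version 3 and later with the default alphabet table; the function name is
hypothetical, and characters outside the default table are assumed to be plain ASCII
(codes 32 to 126) and are written with the ZSCII escape.
</p>

```python
A0 = "abcdefghijklmnopqrstuvwxyz"
# Entries 0 and 1 are placeholders for the ZSCII escape and new-line.
A2 = " \n0123456789.,!?_#'\"/\\-:()"


def encode_dictionary_word(text: str, version: int = 3) -> list[int]:
    """Encode typed-in text as packed dictionary words (Version 3 and later)."""
    if version < 3:
        raise ValueError("Versions 1 and 2 use different shift characters")
    length = 6 if version == 3 else 9
    zchars = []
    for ch in text.lower():
        if ch in A0:
            zchars.append(A0.index(ch) + 6)
        elif ch == " ":
            zchars.append(0)
        elif ch in A2[2:]:
            zchars += [5, A2.index(ch, 2) + 6]          # single shift to A2
        else:
            code = ord(ch)                              # assumption: ASCII 32 to 126
            zchars += [5, 6, code >> 5, code & 0x1F]    # ZSCII escape
    # Pad with 5s, then cut to length; a construction cut short stays incomplete.
    zchars = (zchars + [5] * length)[:length]
    words = []
    for i in range(0, length, 3):
        a, b, c = zchars[i:i + 3]
        words.append((a << 10) | (b << 5) | c)
    words[-1] |= 0x8000                                 # end bit on the last word
    return words
```

<p>Calling <strong>encode_dictionary_word("i", version=5)</strong> returns three words, the last
of which has its top bit set.
</p>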
<h2 id="3.7.1">3.7.1</h2>
<p>In Versions 1 and 2 only, when encoding text for dictionary words,
shift-lock Z-characters 4 and 5 are used instead of the single-shift
Z-characters 2 and 3 when the next two characters come from the same
alphabet.
</p>
<h2 id="3.8">3.8</h2>
<p>The character set of the Z-machine is called ZSCII (Zork
Standard Code for Information Interchange; pronounced to rhyme with
"xyzzy"). ZSCII codes are 10-bit unsigned values between 0 and
1023. Story files may only legally use the values which are defined
below. Note that some values are defined only for input and some
only for output.
</p>
<table>
<caption><strong>Table 2: summary of the ZSCII rules</strong></caption>
<tr> <td>0</td> <td>null</td> <td>Output</td> </tr>
<tr> <td>1-7</td> <td>----</td> <td></td> </tr>
<tr> <td>8</td> <td>delete</td> <td>Input</td> </tr>
<tr> <td>9</td> <td>tab (V6)</td> <td>Output</td> </tr>
<tr> <td>10</td> <td>----</td> <td></td> </tr>
<tr> <td>11</td> <td>sentence space (V6)</td> <td>Output</td> </tr>
<tr> <td>12</td> <td>----</td> <td></td> </tr>
<tr> <td>13</td> <td>newline</td> <td>Input/Output</td> </tr>
<tr> <td>14-26</td> <td>----</td> <td></td> </tr>
<tr> <td>27</td> <td>escape</td> <td>Input</td> </tr>
<tr> <td>28-31</td> <td>----</td> <td></td> </tr>
<tr> <td>32-126</td> <td>standard ASCII</td> <td>Input/Output</td> </tr>
<tr> <td>127-128</td> <td>----</td> <td></td> </tr>
<tr> <td>129-132</td> <td>cursor u/d/l/r</td> <td>Input</td> </tr>
<tr> <td>133-144</td> <td>function keys f1 to f12</td> <td>Input</td> </tr>
<tr> <td>145-154</td> <td>keypad 0 to 9</td> <td>Input</td> </tr>
<tr> <td>155-251</td> <td>extra characters</td> <td>Input/Output</td> </tr>
<tr> <td>252</td> <td>menu click (V6)</td> <td>Input</td> </tr>
<tr> <td>253</td> <td>double-click</td> <td>Input</td> </tr>
<tr> <td>254</td> <td>single-click</td> <td>Input</td> </tr>
<tr> <td>255-1023</td> <td>----</td> <td></td> </tr>
</table>
<h2 id="3.8.1">3.8.1</h2>
<p>The codes 256 to 1023 are undefined, so that for all
practical purposes ZSCII is an 8-bit unsigned code.
</p>
<h2 id="3.8.2">3.8.2</h2>
<p>The codes 0 to 31 are undefined except as follows:</p>
<h2 id="3.8.2.1">3.8.2.1</h2>
<p>ZSCII code 0 ("null") is defined for output
but has no effect in any output stream. (It is also used as
a value meaning "no character" when reporting terminating
character codes, but is not formally defined for input.)
</p>
<h2 id="3.8.2.2">3.8.2.2</h2>
<p>ZSCII code 8 ("delete") is defined for input only.</p>
<h2 id="3.8.2.3">3.8.2.3</h2>
<p>ZSCII code 9 ("tab") is defined for output in Version 6 only.
At the start of a screen line this should print a paragraph indentation
suitable for the font being used: if it is printed in the middle of a
screen line, it should be converted to a space (Infocom's own
interpreters do not do this, however).
</p>
<h2 id="3.8.2.4">3.8.2.4</h2>
<p>ZSCII code 11 ("sentence space") is defined for output in
Version 6 only. This should be printed as a suitable gap between two
sentences (in the same way that typographers normally place larger spaces
after the full stops ending sentences than after words or commas).
</p>
<h2 id="3.8.2.5">3.8.2.5</h2>
<p>ZSCII code 13 ("carriage return") is defined for input and
output.
</p>
<h2 id="3.8.2.6">3.8.2.6</h2>
<p>ZSCII code 27 ("escape" or "break") is defined for input
only.
</p>
<h2 id="3.8.3">3.8.3</h2>
<p>ZSCII codes between 32 ("space") and 126 ("tilde") are
defined for input and output, and agree with standard ASCII (as well as
all of the ISO 8859 character sets and Unicode). Specifically:
</p>
<pre>
       0123456789abcdef0123456789abcdef
       --------------------------------
  $20   !"#$%&amp;'()*+,-./0123456789:;&lt;=&gt;?
  $40  @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
  $60  'abcdefghijklmnopqrstuvwxyz{!}~
       --------------------------------
</pre>
<p>Note that code <strong>$23</strong> (35 decimal) is a hash mark, not a pound sign.
(Code <strong>$7c</strong> (124 decimal) is a vertical stroke which is shown as <strong>!</strong>
here for typesetting reasons.)
</p>
<h2 id="3.8.3.1">3.8.3.1</h2>
<p>ZSCII codes 127 ("delete" in some forms of ASCII)
and 128 are undefined.
</p>
<h2 id="3.8.4">3.8.4</h2>
<p> ZSCII codes 129 to 154 are defined for input only:</p>
<pre>
129: cursor up 130: cursor down 131: cursor left 132: cursor right
133: f1 134: f2 .... 144: f12
145: keypad 0 146: keypad 1 .... 154: keypad 9
</pre>
<h2 id="3.8.5">3.8.5</h2>
<p>The block of codes between 155 and 251 are the "extra
characters" and are used differently by different story files.
Some will need accented Latin characters (such as French E-acute),
others unusual punctuation (Spanish question mark), others new
alphabets (Cyrillic or Hebrew); still others may want dingbat
characters, mathematical or musical symbols, and so on.
</p>
<h2 id="3.8.5.1">3.8.5.1</h2>
<p><strong>***</strong>
To define which characters are required, the Unicode
(or ISO 10646-1) Basic Multilingual Plane character set is used: characters are specified
by unsigned 16-bit codes. These values agree with ISO 8859 Latin-1
in the range 0 to 255, and with ASCII and ZSCII in the range
32 to 126. The Unicode standard leaves a range of values, the
Private Use Area, free: however, an Internet group called the
ConScript Unicode Registry is organising a standard mapping of
invented scripts (such as Klingon, or Tolkien's Elvish) into the
Private Use Area, and this should be considered part of the Unicode
standard for Z-machine purposes.
</p>
<p>The Z-machine does not provide access to non-BMP characters (i.e. characters
outside the range U+0000 to U+FFFF).
</p>
<h2 id="3.8.5.2">3.8.5.2</h2>
<p><strong>***</strong>
The story file chooses its stock of extra characters
with a "Unicode translation table" as follows. Under Versions
1 to 4, the "default table" is always used (<a href="sect03.html#table1">see below</a>).
In Version 5 or later, if Word 3 of the header extension table is
present and non-zero then it is interpreted as the byte address
of the Unicode translation table. If Word 3 is absent or zero,
the default table is used.
</p>
<h2 id="3.8.5.2.1">3.8.5.2.1</h2>
<p>The table consists of one byte giving a number N,
followed by N two-byte words.
</p>
<h2 id="3.8.5.2.2">3.8.5.2.2</h2>
<p>This indicates that ZSCII characters 155 to 155+N-1
are defined for both input and output. (It's possible for N to
be zero, leaving the whole range 155 to 251 undefined.)
</p>
<h2 id="3.8.5.2.3">3.8.5.2.3</h2>
<p>The words in the table give Unicode character codes
for each of the ZSCII characters 155 to 155+N-1 in turn.
</p>
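<p>Reading such a table can be sketched as below. The function name is hypothetical,
and the sketch assumes the caller has already found the table's byte address and that
each word is stored high byte first.
</p>

```python
def read_unicode_table(memory: bytes, address: int) -> dict[int, int]:
    """Map ZSCII codes 155 to 155+N-1 to the Unicode values listed in the table."""
    count = memory[address]                 # the byte N
    table = {}
    for n in range(count):
        offset = address + 1 + 2 * n        # N two-byte words follow the count
        table[155 + n] = (memory[offset] << 8) | memory[offset + 1]
    return table
```

<p>A table with N equal to zero yields an empty mapping, leaving every extra character
undefined.
</p>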
<h2 id="3.8.5.3">3.8.5.3</h2>
<p>The default table is as shown in <a href="sect03.html#table1">Table 1</a>.</p>
<h2 id="3.8.5.4">3.8.5.4</h2>
<p>The defined extra characters are entirely normal
ZSCII characters. They can appear in a story file's alphabet
table, in an array created by print stream 3 and so on.
</p>
<h2 id="3.8.5.4.1">3.8.5.4.1</h2>
<p><strong>***</strong>
The interpreter is required to be able to print
representations of every defined Unicode character under <strong>$0100</strong>
(i.e. of every defined ISO 8859-1 Latin1 character). If no
suitable letter forms are available, textual equivalents may
be used (such as "ss" in place of German sharp "s").
</p>
<h2 id="3.8.5.4.2">3.8.5.4.2</h2>
<p>Normally, and where sensibly possible, all
punctuation and letter characters in ISO 8859-1 Latin1
should be readable from the interpreter's keyboard. (However,
some interpreters may want to provide alternative keyboard
mappings, or to run in a different ISO 8859 set: Cyrillic,
for example.)
</p>
<h2 id="3.8.5.4.3">3.8.5.4.3</h2>
<p><strong>***</strong>
An interpreter is not required to have suitable
letter-forms for printing Unicode characters <strong>$0100</strong> to <strong>$FFFF</strong>.
(It may, if it chooses, allow the user to configure certain
fonts for certain Unicode ranges; but this is not required.)
If a Unicode character must be printed which an interpreter
has no letter-form for, a question mark should be printed
instead.
</p>
<h2 id="3.8.5.4.4">3.8.5.4.4</h2>
<p>The Z-machine is not required to handle complex Unicode formatting like
combining characters, bidirectional formatting and unusual line-wrapping
rules.
</p>
<p>In Versions other than 6, interpreters may either handle these features, or not,
in window 0. In window 1, and all version 6 windows, they should be ignored.
</p>
<h2 id="3.8.5.4.5">3.8.5.4.5</h2>
<p>Unicode characters U+0000 to U+001F and U+007F to U+009F are control codes,
and must not be used.
</p>
<h2 id="3.8.6">3.8.6</h2>
<p>ZSCII codes 252 to 254 are defined for input only:</p>
<pre>
252: menu click 253: mouse double-click 254: mouse single-click
</pre>
<p>Menu clicks are available only in Version 6. A single click, or the first click of a double-click, is passed
in as 254. The second click of a double-click is passed in as 253. In Versions 5 and later
it is recommended that an interpreter should only send code 254,
whether the mouse is clicked once or twice.
</p>
<h2 id="3.8.7">3.8.7</h2>
<p>ZSCII code 255 is undefined. (This value is needed in
the "terminating characters table" as a wildcard, indicating
"any Input-only character with code 128 or above." However,
it cannot itself be printed or read from the keyboard.)
</p>
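<p>The rules of this section can be summarised as a small classifier, which an
interpreter might use to filter out undefined codes before printing. This is a sketch
with hypothetical names; <strong>extra_count</strong> stands for the number N of extra characters
in the Unicode translation table in use, and its default of 69 assumes the default
table (ZSCII 155 to 223).
</p>

```python
def zscii_usage(code: int, version: int = 5, extra_count: int = 69) -> str:
    """Return 'input', 'output', 'input/output' or 'undefined' for a ZSCII code."""
    v6 = version == 6
    if code == 0:
        return "output"
    if code in (8, 27):
        return "input"
    if code in (9, 11):
        return "output" if v6 else "undefined"
    if code == 13 or 32 <= code <= 126:
        return "input/output"
    if 129 <= code <= 154:
        return "input"
    if 155 <= code <= 251 and code < 155 + extra_count:
        return "input/output"
    if code == 252:
        return "input" if v6 else "undefined"
    if code in (253, 254):
        return "input"
    return "undefined"
```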
<hr>
<table id="table1">
<caption><strong>Table 1: default Unicode translations (see <a href="sect03.html#3.8.5.3">§ 3.8.5.3</a>)</strong></caption>
<tr> <th>ZSCII code (dec)</th> <th>Unicode code (hex)</th> <th>Name</th> <th>Character</th> <th>Textual Equivalent</th> </tr>
<tr><td> 155 </td><td> 0e4 </td><td> a-diaeresis </td><td>ä</td><td> ae </td></tr>
<tr><td> 156 </td><td> 0f6 </td><td> o-diaeresis </td><td>ö</td><td> oe </td></tr>
<tr><td> 157 </td><td> 0fc </td><td> u-diaeresis </td><td>ü</td><td> ue </td></tr>
<tr><td> 158 </td><td> 0c4 </td><td> A-diaeresis </td><td>Ä</td><td> Ae </td></tr>
<tr><td> 159 </td><td> 0d6 </td><td> O-diaeresis </td><td>Ö</td><td> Oe </td></tr>
<tr><td> 160 </td><td> 0dc </td><td> U-diaeresis </td><td>Ü</td><td> Ue </td></tr>
<tr><td> 161 </td><td> 0df </td><td> sz-ligature </td><td>ß</td><td> ss </td></tr>
<tr><td> 162 </td><td> 0bb </td><td> quotation </td><td>»</td><td> &gt;&gt; or " </td></tr>
<tr><td> 163 </td><td> 0ab </td><td> marks </td><td>«</td><td> &lt;&lt; or " </td></tr>
<tr><td> 164 </td><td> 0eb </td><td> e-diaeresis </td><td>ë</td><td> e </td></tr>
<tr><td> 165 </td><td> 0ef </td><td> i-diaeresis </td><td>ï</td><td> i </td></tr>
<tr><td> 166 </td><td> 0ff </td><td> y-diaeresis </td><td>ÿ</td><td> y </td></tr>
<tr><td> 167 </td><td> 0cb </td><td> E-diaeresis </td><td>Ë</td><td> E </td></tr>
<tr><td> 168 </td><td> 0cf </td><td> I-diaeresis </td><td>Ï</td><td> I </td></tr>
<tr><td> 169 </td><td> 0e1 </td><td> a-acute </td><td>á</td><td> a </td></tr>
<tr><td> 170 </td><td> 0e9 </td><td> e-acute </td><td>é</td><td> e </td></tr>
<tr><td> 171 </td><td> 0ed </td><td> i-acute </td><td>í</td><td> i </td></tr>
<tr><td> 172 </td><td> 0f3 </td><td> o-acute </td><td>ó</td><td> o </td></tr>
<tr><td> 173 </td><td> 0fa </td><td> u-acute </td><td>ú</td><td> u </td></tr>
<tr><td> 174 </td><td> 0fd </td><td> y-acute </td><td>ý</td><td> y </td></tr>
<tr><td> 175 </td><td> 0c1 </td><td> A-acute </td><td>Á</td><td> A </td></tr>
<tr><td> 176 </td><td> 0c9 </td><td> E-acute </td><td>É</td><td> E </td></tr>
<tr><td> 177 </td><td> 0cd </td><td> I-acute </td><td>Í</td><td> I </td></tr>
<tr><td> 178 </td><td> 0d3 </td><td> O-acute </td><td>Ó</td><td> O </td></tr>
<tr><td> 179 </td><td> 0da </td><td> U-acute </td><td>Ú</td><td> U </td></tr>
<tr><td> 180 </td><td> 0dd </td><td> Y-acute </td><td>Ý</td><td> Y </td></tr>
<tr><td> 181 </td><td> 0e0 </td><td> a-grave </td><td>à</td><td> a </td></tr>
<tr><td> 182 </td><td> 0e8 </td><td> e-grave </td><td>è</td><td> e </td></tr>
<tr><td> 183 </td><td> 0ec </td><td> i-grave </td><td>ì</td><td> i </td></tr>
<tr><td> 184 </td><td> 0f2 </td><td> o-grave </td><td>ò</td><td> o </td></tr>
<tr><td> 185 </td><td> 0f9 </td><td> u-grave </td><td>ù</td><td> u </td></tr>
<tr><td> 186 </td><td> 0c0 </td><td> A-grave </td><td>À</td><td> A </td></tr>
<tr><td> 187 </td><td> 0c8 </td><td> E-grave </td><td>È</td><td> E </td></tr>
<tr><td> 188 </td><td> 0cc </td><td> I-grave </td><td>Ì</td><td> I </td></tr>
<tr><td> 189 </td><td> 0d2 </td><td> O-grave </td><td>Ò</td><td> O </td></tr>
<tr><td> 190 </td><td> 0d9 </td><td> U-grave </td><td>Ù</td><td> U </td></tr>
<tr><td> 191 </td><td> 0e2 </td><td> a-circumflex </td><td>â</td><td> a </td></tr>
<tr><td> 192 </td><td> 0ea </td><td> e-circumflex </td><td>ê</td><td> e </td></tr>
<tr><td> 193 </td><td> 0ee </td><td> i-circumflex </td><td>î</td><td> i </td></tr>
<tr><td> 194 </td><td> 0f4 </td><td> o-circumflex </td><td>ô</td><td> o </td></tr>
<tr><td> 195 </td><td> 0fb </td><td> u-circumflex </td><td>û</td><td> u </td></tr>
<tr><td> 196 </td><td> 0c2 </td><td> A-circumflex </td><td>Â</td><td> A </td></tr>
<tr><td> 197 </td><td> 0ca </td><td> E-circumflex </td><td>Ê</td><td> E </td></tr>
<tr><td> 198 </td><td> 0ce </td><td> I-circumflex </td><td>Î</td><td> I </td></tr>
<tr><td> 199 </td><td> 0d4 </td><td> O-circumflex </td><td>Ô</td><td> O </td></tr>
<tr><td> 200 </td><td> 0db </td><td> U-circumflex </td><td>Û</td><td> U </td></tr>
<tr><td> 201 </td><td> 0e5 </td><td> a-ring </td><td>å</td><td> a </td></tr>
<tr><td> 202 </td><td> 0c5 </td><td> A-ring </td><td>Å</td><td> A </td></tr>
<tr><td> 203 </td><td> 0f8 </td><td> o-slash </td><td>ø</td><td> o </td></tr>
<tr><td> 204 </td><td> 0d8 </td><td> O-slash </td><td>Ø</td><td> O </td></tr>
<tr><td> 205 </td><td> 0e3 </td><td> a-tilde </td><td>ã</td><td> a </td></tr>
<tr><td> 206 </td><td> 0f1 </td><td> n-tilde </td><td>ñ</td><td> n </td></tr>
<tr><td> 207 </td><td> 0f5 </td><td> o-tilde </td><td>õ</td><td> o </td></tr>
<tr><td> 208 </td><td> 0c3 </td><td> A-tilde </td><td>Ã</td><td> A </td></tr>
<tr><td> 209 </td><td> 0d1 </td><td> N-tilde </td><td>Ñ</td><td> N </td></tr>
<tr><td> 210 </td><td> 0d5 </td><td> O-tilde </td><td>Õ</td><td> O </td></tr>
<tr><td> 211 </td><td> 0e6 </td><td> ae-ligature </td><td>æ</td><td> ae </td></tr>
<tr><td> 212 </td><td> 0c6 </td><td> AE-ligature </td><td>Æ</td><td> AE </td></tr>
<tr><td> 213 </td><td> 0e7 </td><td> c-cedilla </td><td>ç</td><td> c </td></tr>
<tr><td> 214 </td><td> 0c7 </td><td> C-cedilla </td><td>Ç</td><td> C </td></tr>
<tr><td> 215 </td><td> 0fe </td><td> Icelandic thorn </td><td>þ</td><td> th </td></tr>
<tr><td> 216 </td><td> 0f0 </td><td> Icelandic eth </td><td>ð</td><td> th </td></tr>
<tr><td> 217 </td><td> 0de </td><td> Icelandic Thorn </td><td>Þ</td><td> Th </td></tr>
<tr><td> 218 </td><td> 0d0 </td><td> Icelandic Eth </td><td>Ð</td><td> Th </td></tr>
<tr><td> 219 </td><td> 0a3 </td><td> pound symbol </td><td>£</td><td> L </td></tr>
<tr><td> 220 </td><td> 153 </td><td> oe-ligature </td><td>œ</td><td> oe </td></tr>
<tr><td> 221 </td><td> 152 </td><td> OE-ligature </td><td>Œ</td><td> OE </td></tr>
<tr><td> 222 </td><td> 0a1 </td><td> inverted ! </td><td>¡</td><td> ! </td></tr>
<tr><td> 223 </td><td> 0bf </td><td> inverted ? </td><td>¿</td><td> ? </td></tr>
</table>
<hr>
<h2 id="remarks">Remarks</h2>
<p>In practice the text compression factor is not really very good: for
instance, 155000 characters of text squashes into 99000 bytes. (Text
usually accounts for about 75% of a story file.) Encoding does
at least encrypt the text so that casual browsers can't read it.
Well-chosen abbreviations will reduce total story file size by 10% or so.
</p>
<p>The German translation of 'Zork I' uses an alphabet table to make
accented letters (from the standard extra characters set) efficient
in dictionary words. In Version 6, 'Shogun' also uses an alphabet
table.
</p>
<p>Unicode translation tables are new in Standard 1.0: in Standard
0.2, the extra characters were always mapped using the default
Unicode translation table.
</p>
<p>Note that if a random stretch of memory is accidentally printed as
a string (due to an error in the story file), illegal ZSCII codes
may well be printed using the 4-Z-character escape sequence. It's
helpful for interpreters to filter out any such illegal codes
so that the resulting on-screen mess will not cause trouble for the
terminal (e.g. by causing the interpreter to print ASCII 12,
clear screen, or 7, bell sound).
</p>
<p>The continental European quotation marks <strong>&lt;&lt;</strong> and <strong>&gt;&gt;</strong> should have
spacing which looks sensible either in French style <strong>&lt;&lt;</strong>Merci!<strong>&gt;&gt;</strong>
or in German style <strong>&gt;&gt;</strong>Danke!<strong>&lt;&lt;</strong>.
</p>
<p>Ideally, an interpreter should be able to read time delays (for timed input)
from stream 1 (i.e., from a script file). See the <a href="sect07.html#remarks">
remarks</a> in <strong>§</strong> 7.
</p>
<p>The 'Beyond Zork' story file is capable of receiving both mouse-click
codes (253 and 254), listing both in its terminating characters table
and treating them equally.
</p>
<p>The extant Infocom games in Versions 4 and 5 use the control characters
1 to 31 only as follows: they all accept 10 or 13 as equivalent, except
that 'Bureaucracy' will only accept 13. 'Bureaucracy' needs either 127
or 8 to be a delete code. No other codes are used.
</p>
<p>Curiously, 'Nord 'n' Bert Couldn't Make Head Nor Tail Of It' and
'A Mind Forever Voyaging' allow some letter characters to be typed
in with the top bit set. That is, if reading an A, they would recognise
65 or 97 (upper or lower case) and also 193 or 225. Matthew Russotto
suggests this was an accommodation for the Apple II, whose keyboard
primitives returned the last key pressed in the bottom 7 bits of a
byte, plus a top bit flag indicating whether or not the keyboard had
been hit since last time.
</p>
<p>In Version 3 and later, many of Infocom's interpreters (and some subsequent
interpreters, such as ITF's) treat two consecutive Z-characters 4 or 5 as
shift locks, contrary to the Standard. As a result, story files should not
use multiple consecutive 4 or 5 codes except for padding at the end of
strings and dictionary words. In any case, these shift locks are not used in
dictionary words, or any of Infocom's story files.
</p>
<p>In the available Version 1 and 2 Infocom gamefiles, dictionary words that leave a
multi-Z-character construction incomplete due to the truncation to 6 Z-characters do not
have the end-bit of the last word set. However, the Infocom interpreters for these files
do not recognize this, indicating that it is most likely a bug.
</p>
<hr>
<p>In the past, not just in the Z-machine world, there has been general
confusion over the rendering of ASCII/ZSCII/Latin-1/Unicode characters $27
and $60. For the Z-machine, the traditional interpretations of
right-single-quote/apostrophe and left-single-quote are preferred over the
modern neutral-single-quote and grave accent - see Table 2A of the Inform
Designer's Manual. $22 is a neutral double-quote.
</p>
<p>An alternative rendering is to interpret both $27 and $60 as neutral quotes,
but interpreting $60 as a grave accent is to be avoided.
</p>
<p>No doubt aware of this confusion, Infocom never used character $60, and used
$27 almost exclusively as an apostrophe - hardly any single quotes appear in
Infocom games. Modern authors would do well to follow their lead.
</p>
<p>The few Infocom games that do use single quotes use $27 for both opening and
closing - but even on many of their interpreters this looked a little odd, so
suggesting that $27 be a right quote introduces no extra compatibility
problems.
</p>
<hr>
<p>
<a href="index.html">Contents</a> /
<a href="preface.html">Preface</a> /
<a href="overview.html">Overview</a>
</p>
<p>Section
<a href="sect01.html">1</a> / <a href="sect02.html">2</a> /
<a href="sect03.html">3</a> / <a href="sect04.html">4</a> /
<a href="sect05.html">5</a> / <a href="sect06.html">6</a> /
<a href="sect07.html">7</a> / <a href="sect08.html">8</a> /
<a href="sect09.html">9</a> / <a href="sect10.html">10</a> /
<a href="sect11.html">11</a> / <a href="sect12.html">12</a> /
<a href="sect13.html">13</a> / <a href="sect14.html">14</a> /
<a href="sect15.html">15</a> / <a href="sect16.html">16</a>
</p>
<p>Appendix
<a href="appa.html">A</a> / <a href="appb.html">B</a> /
<a href="appc.html">C</a> / <a href="appd.html">D</a> /
<a href="appe.html">E</a> / <a href="appf.html">F</a>
</p>
<hr>
</body>
</html>