sect03.html

<!DOCTYPE html>

<html>
<head>
<title>The Z-Machine Standards Document</title>
<link rel="stylesheet" type="text/css" href="zspec.css">
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
</head>

<body>

<img class="icon" src="images/icon03.gif" alt="">

<h1>3. How text and characters are encoded</h1>

<blockquote>
  <p>This technique is similar to the five-bit Baudot code, which
     was used by early Teletypes before ASCII was invented.
  </p>
  <p>Marc S. Blank and S. W. Galley, <strong>How to Fit a Large Program Into
     a Small Machine</strong>
  </p>
</blockquote>

<hr>

<p>3.1 <a href="#3.1">Text</a> / 
   3.2 <a href="#3.2">Alphabets</a> / 
   3.3 <a href="#3.3">Abbreviations</a> /
   3.4 <a href="#3.4">ZSCII escape</a> / 
   3.5 <a href="#3.5">Alphabet table</a> / 
   3.6 <a href="#3.6">Padding and incompleteness</a> / 
   3.7 <a href="#3.7">Dictionary truncation</a> / 
   3.8 <a href="#3.8">Definition of ZSCII and Unicode</a>
</p>

<hr>


<h2 id="3.1">3.1</h2>
<p>Z-machine text is a sequence of ZSCII character codes (ZSCII is
   a system similar to ASCII: see <a href="sect03.html#3.8"><strong>§</strong> 3.8</a> below).  These ZSCII values are
   encoded into memory using a string of Z-characters.  The process of
   converting between Z-characters and ZSCII values is given in <a href="sect03.html#3.2"><strong>§</strong> 3.2 to
   3.7</a> below.
</p>

<h2 id="3.2">3.2</h2>
<p>Text in memory consists of a sequence of 2-byte words.  Each word
   is divided into three 5-bit 'Z-characters', plus 1 bit left over, arranged
   as
</p>

<pre>
   --first byte-------   --second byte---
   7    6 5 4 3 2  1 0   7 6 5  4 3 2 1 0
   bit  --first--  --second---  --third--
</pre>

<p>The bit is set only on the last 2-byte word of the text, and so marks the
   end.
</p>

<h2 id="3.2.1">3.2.1</h2>
<p>There are three 'alphabets', A0 (lower case), A1 (upper case) and
   A2 (punctuation) and during printing one of these is current at any given
   time.  Initially A0 is current.  The meaning of a Z-character may depend on
   which alphabet is current.
</p>

<h2 id="3.2.2">3.2.2</h2>
<p>In Versions 1 and 2, the current alphabet can be any of the
   three.  The Z-characters 2 and 3 are called 'shift' characters and change
   the alphabet for the next character only.  The new alphabet depends on
   what the current one is:
</p>

<pre>
             from A0  from A1  from A2
  Z-char 2      A1       A2       A0
  Z-char 3      A2       A0       A1
</pre>

<p>Z-characters 4 and 5 permanently change alphabet, according to the
   same table, and are called 'shift lock' characters.
</p>

<h2 id="3.2.3">3.2.3</h2>
<p>In Versions 3 and later, the current alphabet is always A0
   unless changed for 1 character only: Z-characters 4 and 5 are shift
   characters.  Thus 4 means "the next character is in A1" and 5 means
   "the next is in A2".  There are no shift lock characters.
</p>

<h2 id="3.2.4">3.2.4</h2>
<p>An indefinite sequence of shift or shift lock characters is
   legal (but prints nothing).
</p>

<h2 id="3.3">3.3</h2>
<p>In Versions 3 and later, Z-characters 1, 2 and 3 represent
   abbreviations, sometimes also called 'synonyms' (for traditional reasons):
   the next Z-character indicates which abbreviation string to print.  If z
   is the first Z-character (1, 2 or 3) and x the subsequent one, then the
   interpreter must look up entry 32(z-1)+x in the abbreviations table and
   print the string at that word address.  In Version 2, Z-character 1 has this
   effect (but 2 and 3 do not, so there are only 32 abbreviations).
</p>
<h2 id="3.3.1">3.3.1</h2>
<p>Abbreviation string-printing follows all the rules of this section
   except that an abbreviation string must not itself use abbreviations and
   must not end with an incomplete multi-Z-character construction (see
   <a href="sect03.html#3.6.1"><strong>§</strong> 3.6.1</a> below).
</p>
<h2 id="3.4">3.4</h2>
<p>Z-character 6 from A2 is a 'ZSCII escape' character. It means that the two subsequent 
   Z-characters specify a ten-bit ZSCII character code: the next Z-character gives the top 
   5 bits and the one after the bottom 5.
</p>

<h2 id="3.5">3.5</h2>
<p>The remaining Z-characters are translated into ZSCII character
   codes using the "alphabet table".
</p>

<h2 id="3.5.1">3.5.1</h2>
<p>The Z-character 0 is printed as a space (ZSCII 32).</p>

<h2 id="3.5.2">3.5.2</h2>
<p>In Version 1, Z-character 1 is printed as a new-line
   (ZSCII 13).
</p>

<h2 id="3.5.3">3.5.3</h2>
<p>In Versions 2 to 4, the alphabet table for converting
   Z-characters into ZSCII character codes is as follows:
</p>

<pre>
   Z-char 6789abcdef0123456789abcdef
current   --------------------------
  A0      abcdefghijklmnopqrstuvwxyz
  A1      ABCDEFGHIJKLMNOPQRSTUVWXYZ
  A2       ^0123456789.,!?_#'"/\-:()
          --------------------------
</pre>
<p>For example, in alphabet A1 the Z-character 12 is translated as a capital G
   (ZSCII character code 71).
</p>

<p>Character 6 in A2 is printed as a space here, but it is a 'ZSCII escape' character, and not
   translated using the alphabet table: see <a href="sect03.html#3.4"><strong>§</strong> 3.4</a> 
   above. Character 7 in A2, written here as a circumflex ^, is a new-line (ZSCII 13).
</p>

<h2 id="3.5.4">3.5.4</h2>
<p>Version 1 has a slightly different A2 row in its alphabet
   table (new-line is not needed, making room for the <strong> &lt; </strong> character):
</p>

<pre>
          6789abcdef0123456789abcdef
          --------------------------
  A2       0123456789.,!?_#'"/\&lt;-:()
          --------------------------
</pre>

<h2 id="3.5.5">3.5.5</h2>
<p>In Versions 5 and later, the interpreter should look at
   the word at <strong>$34</strong> in the <a href="sect11.html">header</a>.  If this is zero, then the
   alphabet table drawn out in <a href="sect03.html#3.5.3"><strong>§</strong> 3.5.3</a> continues in use.  Otherwise
   it is interpreted as the byte address of an alphabet table
   specific to this story file.
</p>

<h2 id="3.5.5.1">3.5.5.1</h2>
<p>Such an alphabet table consists of 78 bytes arranged as 3 blocks of 26 ZSCII values, 
   translating Z-characters 6 to 31 for alphabets A0, A1 and A2. Z-characters 6 and 7 of 
   A2, however, are still translated as ZSCII escape and new-line (ZSCII 13) codes (as above).
</p>

<h2 id="3.6">3.6</h2>
<p>Since the end-bit only comes up once every three Z-characters,
   a string may have to be 'padded out' with null values.  This is
   conventionally achieved with a sequence of 5's, though a sequence of
   (for example) 4's would work equally well.
</p>

<h2 id="3.6.1">3.6.1</h2>
<p>It is legal for the string to end while a multi-Z-character
   construction is incomplete: for instance, after only the top half of
   an ASCII value has been given.  The partial construction is simply
   ignored.  (This can happen in printing dictionary words which have
   been guillotined to the dictionary resolution of 6 or 9 Z-characters.)
</p>

<h2 id="3.7">3.7</h2>
<p>When an interpreter is encrypting typed-in text to match against
   dictionary words, the following restrictions apply.  Text should be
   converted to lower case (as a result A1 will not be needed unless the
   game provides its own alphabet table).  Abbreviations may not be used.
   The pad character, if needed, must be 5.  The total string length must
   be 6 Z-characters (in Versions 1 to 3) or 9 (Versions 4 and later): any
   multi-Z-character constructions should be left incomplete (rather than
   omitted) if there's no room to finish them.  For example, "i" is
   encrypted as:
</p>

<p><strong>14, 5, 5, 5, 5, 5, 5, 5, 5</strong></p>

<p><strong>$48a5</strong> <strong>$14a5</strong> <strong>$94a5</strong></p>

<h2 id="3.7.1">3.7.1</h2>
<p>In Versions 1 and 2 only, when encoding text for dictionary words,
   shift-lock Z-characters 4 and 5 are used instead of the single-shift
   Z-characters 2 and 3 when the next two characters come from the same
   alphabet.
</p>


<h2 id="3.8">3.8</h2>
<p>The character set of the Z-machine is called ZSCII (Zork
   Standard Code for Information Interchange; pronounced to rhyme with
   "xyzzy").  ZSCII codes are 10-bit unsigned values between 0 and
   1023.  Story files may only legally use the values which are defined
   below.  Note that some values are defined only for input and some
   only for output.
</p>

<table>
  <caption><strong>Table 2: summary of the ZSCII rules</strong></caption>
  <tr> <td>0</td>        <td>null</td>                    <td>Output</td>       </tr>
  <tr> <td>1-7</td>      <td>----</td>                    <td></td>             </tr>
  <tr> <td>8</td>        <td>delete</td>                  <td>Input</td>        </tr>
  <tr> <td>9</td>        <td>tab (V6)</td>                <td>Output</td>       </tr>
  <tr> <td>10</td>       <td>----</td>                    <td></td>             </tr>
  <tr> <td>11</td>       <td>sentence space (V6)</td>     <td>Output</td>       </tr>
  <tr> <td>12</td>       <td>----</td>                    <td></td>             </tr>
  <tr> <td>13</td>       <td>newline</td>                 <td>Input/Output</td> </tr>
  <tr> <td>14-26</td>    <td>----</td>                    <td></td>             </tr>
  <tr> <td>27</td>       <td>escape</td>                  <td>Input</td>        </tr>
  <tr> <td>28-31</td>    <td>----</td>                    <td></td>             </tr>
  <tr> <td>32-126</td>   <td>standard ASCII</td>          <td>Input/Output</td> </tr>
  <tr> <td>127-128</td>  <td>----</td>                    <td></td>             </tr>
  <tr> <td>129-132</td>  <td>cursor u/d/l/r</td>          <td>Input</td>        </tr>
  <tr> <td>133-144</td>  <td>function keys f1 to f12</td> <td>Input</td>        </tr>
  <tr> <td>145-154</td>  <td>keypad 0 to 9</td>           <td>Input</td>        </tr>
  <tr> <td>155-251</td>  <td>extra characters</td>        <td>Input/Output</td> </tr>
  <tr> <td>252</td>      <td>menu click (V6)</td>         <td>Input</td>        </tr>
  <tr> <td>253</td>      <td>double-click</td>       <td>Input</td>        </tr>
  <tr> <td>254</td>      <td>single-click</td>            <td>Input</td>        </tr>
  <tr> <td>255-1023</td> <td>----</td>                    <td></td>             </tr>
</table>

<h2 id="3.8.1">3.8.1</h2>
<p>The codes 256 to 1023 are undefined, so that for all
   practical purposes ZSCII is an 8-bit unsigned code.
</p>

<h2 id="3.8.2">3.8.2</h2>
<p>The codes 0 to 31 are undefined except as follows:</p>

<h2 id="3.8.2.1">3.8.2.1</h2>
<p>ZSCII code 0 ("null") is defined for output
   but has no effect in any output stream.  (It is also used as
   a value meaning "no character" when reporting terminating
   character codes, but is not formally defined for input.)
</p>

<h2 id="3.8.2.2">3.8.2.2</h2>
<p>ZSCII code 8 ("delete") is defined for input only.</p>

<h2 id="3.8.2.3">3.8.2.3</h2>
<p>ZSCII code 9 ("tab") is defined for output in Version 6 only.
   At the start of a screen line this should print a paragraph indentation
   suitable for the font being used: if it is printed in the middle of a
   screen line, it should be converted to a space (Infocom's own
   interpreters do not do this, however).
</p>

<h2 id="3.8.2.4">3.8.2.4</h2>
<p>ZSCII code 11 ("sentence space") is defined for output in
   Version 6 only.  This should be printed as a suitable gap between two
   sentences (in the same way that typographers normally place larger spaces
   after the full stops ending sentences than after words or commas).
</p>

<h2 id="3.8.2.5">3.8.2.5</h2>
<p>ZSCII code 13 ("carriage return") is defined for input and
   output.
</p>

<h2 id="3.8.2.6">3.8.2.6</h2>
<p>ZSCII code 27 ("escape" or "break") is defined for input
   only.
</p>

<h2 id="3.8.3">3.8.3</h2>
<p>ZSCII codes between 32 ("space") and 126 ("tilde") are
   defined for input and output, and agree with standard ASCII (as well as
   all of the ISO 8859 character sets and Unicode).  Specifically:
</p>

<pre>
      0123456789abcdef0123456789abcdef
      --------------------------------
 $20   !"#$%&amp;'()*+,-./0123456789:;&lt;=&gt;?
 $40  @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
 $60  'abcdefghijklmnopqrstuvwxyz{!}~ 
      --------------------------------
</pre>

<p>Note that code <strong>$23</strong> (35 decimal) is a hash mark, not a pound sign.
   (Code <strong>$7c</strong> (124 decimal) is a vertical stroke which is shown as <strong>!</strong>
   here for typesetting reasons.)
</p>

<h2 id="3.8.3.1">3.8.3.1</h2>
<p>ZSCII codes 127 ("delete" in some forms of ASCII)
   and 128 are undefined.
</p>

<h2 id="3.8.4">3.8.4</h2>
<p> ZSCII codes 129 to 154 are defined for input only:</p>

<pre>
129: cursor up  130: cursor down  131: cursor left  132: cursor right
133: f1         134: f2           ....              144: f12
145: keypad 0   146: keypad 1     ....              154: keypad 9
</pre>

<h2 id="3.8.5">3.8.5</h2>
<p>The block of codes between 155 and 251 are the "extra
   characters" and are used differently by different story files.
   Some will need accented Latin characters (such as French E-acute),
   others unusual punctuation (Spanish question mark), others new
   alphabets (Cyrillic or Hebrew); still others may want dingbat
   characters, mathematical or musical symbols, and so on.
</p>

<h2 id="3.8.5.1">3.8.5.1</h2>
<p><strong>***</strong>
   To define which characters are required, the Unicode
   (or ISO 10646-1) Basic Multilingual Plane character set is used: characters are specified
   by unsigned 16-bit codes.  These values agree with ISO 8859 Latin-1
   in the range 0 to 255, and with ASCII and ZSCII in the range
   32 to 126.  The Unicode standard leaves a range of values, the
   Private Use Area, free: however, an Internet group called the
   ConScript Unicode Registry is organising a standard mapping of
   invented scripts (such as Klingon, or Tolkien's Elvish) into the
   Private Use Area, and this should be considered part of the Unicode
   standard for Z-machine purposes.
</p>

<p>The Z-machine does not provide access to non-BMP characters (ie characters
   outside the range U+0000 to U+FFFF).
</p>

<h2 id="3.8.5.2">3.8.5.2</h2>
<p><strong>***</strong>
   The story file chooses its stock of extra characters
   with a "Unicode translation table" as follows.  Under Versions
   1 to 4, the "default table" is always used (<a href="sect03.html#table1">see below</a>).
   In Version 5 or later, if Word 3 of the header extension table is
   present and non-zero then it is interpreted as the byte address
   of the Unicode translation table.  If Word 3 is absent or zero,
   the default table is used.
</p>

<h2 id="3.8.5.2.1">3.8.5.2.1</h2>
<p>The table consists of one byte giving a number N,
   followed by N two-byte words.
</p>

<h2 id="3.8.5.2.2">3.8.5.2.2</h2>
<p>This indicates that ZSCII characters 155 to 155+N-1
   are defined for both input and output.  (It's possible for N to
   be zero, leaving the whole range 155 to 251 undefined.)
</p>

<h2 id="3.8.5.2.3">3.8.5.2.3</h2>
<p>The words in the table give Unicode character codes
   for each of the ZSCII characters 155 to 155+N-1 in turn.
</p>

<h2 id="3.8.5.3">3.8.5.3</h2>
<p>The default table is as shown in <a href="sect03.html#table1">Table 1</a>.</p>

<h2 id="3.8.5.4">3.8.5.4</h2>
<p>The defined extra characters are entirely normal
   ZSCII characters.  They can appear in a story file's alphabet
   table, in an array created by print stream 3 and so on.
</p>

<h2 id="3.8.5.4.1">3.8.5.4.1</h2>
<p><strong>***</strong>
   The interpreter is required to be able to print
   representations of every defined Unicode character under <strong>$0100</strong>
   (i.e. of every defined ISO 8859-1 Latin1 character).  If no
   suitable letter forms are available, textual equivalents may
   be used (such as "ss" in place of German sharp "s").
</p>

<h2 id="3.8.5.4.2">3.8.5.4.2</h2>
<p>Normally, and where sensibly possible, all
   punctuation and letter characters in ISO 8859-1 Latin1
   should be readable from the interpreter's keyboard.  (However,
   some interpreters may want to provide alternative keyboard
   mappings, or to run in a different ISO 8859 set: Cyrillic,
   for example.)
</p>

<h2 id="3.8.5.4.3">3.8.5.4.3</h2>
<p><strong>***</strong>
   An interpreter is not required to have suitable
   letter-forms for printing Unicode characters <strong>$0100</strong> to <strong>$FFFF</strong>.
   (It may, if it chooses, allow the user to configure certain
   fonts for certain Unicode ranges; but this is not required.)
   If a Unicode character must be printed which an interpreter
   has no letter-form for, a question mark should be printed
   instead.
</p>

<h2 id="3.8.5.4.4">3.8.5.4.4</h2>
<p>The Z-machine is not required to handle complex Unicode formatting like
   combining characters, bidirectional formatting and unusual line-wrapping
   rules.
</p>

<p>In Versions other than 6, interpreters may either handle these features, or not,
   in window 0. In window 1, and all version 6 windows, they should be ignored.
</p>

<h2 id="3.8.5.4.5">3.8.5.4.5</h2>
<p>Unicode characters U+0000 to U+001F and U+007F to U+009F are control codes,
   and must not be used.
</p>

<h2 id="3.8.6">3.8.6</h2>
<p>ZSCII codes 252 to 254 are defined for input only:</p>

<pre>
252: menu click   253: mouse double-click   254: mouse single-click
</pre>

<p>Menu clicks are available only in Version 6. A single click, or the first click of a double-click, is passed
   in as 254. The second click of a double-click is passed in as 253. In Versions 5 and later
   it is recommended that an interpreter should only send code 254,
   whether the mouse is clicked once or twice.
</p>

<h2 id="3.8.7">3.8.7</h2>
<p>ZSCII code 255 is undefined.  (This value is needed in
   the "terminating characters table" as a wildcard, indicating
   "any Input-only character with code 128 or above."  However,
   it cannot itself be printed or read from the keyboard.)
</p>

<hr>

<table id="table1">
<caption><strong>Table 1: default Unicode translations (see <a href="sect03.html#3.8.5.3">S 3.8.5.3</a>)</strong></caption>

<tr> <th>ZSCII code (dec)</th> <th>Unicode code (hex)</th> <th>Name</th> <th>Character</th> <th>Textual Equivalent</th> </tr>

<tr><td>	155	</td><td>	0e4 	</td><td>	a-diaeresis 		</td><td>&#xe4;</td><td>	ae 				</td></tr>
<tr><td>	156	</td><td>	0f6 	</td><td>	o-diaeresis 		</td><td>&#xf6;</td><td>	oe 				</td></tr>
<tr><td>	157	</td><td>	0fc 	</td><td>	u-diaeresis 		</td><td>&#xfc;</td><td>	ue 				</td></tr>
<tr><td>	158	</td><td>	0c4 	</td><td>	A-diaeresis 		</td><td>&#xc4;</td><td>	Ae 				</td></tr>
<tr><td>	159	</td><td>	0d6 	</td><td>	O-diaeresis 		</td><td>&#xd6;</td><td>	Oe 				</td></tr>
<tr><td>	160	</td><td>	0dc 	</td><td>	U-diaeresis 		</td><td>&#xdc;</td><td>	Ue 				</td></tr>
<tr><td>	161	</td><td>	0df 	</td><td>	sz-ligature 		</td><td>&#xdf;</td><td>	ss 				</td></tr>
<tr><td>	162	</td><td>	0bb 	</td><td>	quotation 			</td><td>&#xbb;</td><td>	&gt;&gt; or " 	</td></tr>
<tr><td>	163	</td><td>	0ab 	</td><td>	marks 				</td><td>&#xab;</td><td>	&lt;&lt; or " 	</td></tr>
<tr><td>	164	</td><td>	0eb 	</td><td>	e-diaeresis 		</td><td>&#xeb;</td><td>	e 				</td></tr>
<tr><td>	165	</td><td>	0ef 	</td><td>	i-diaeresis 		</td><td>&#xef;</td><td>	i 				</td></tr>
<tr><td>	166	</td><td>	0ff 	</td><td>	y-diaeresis 		</td><td>&#xff;</td><td>	y 				</td></tr>
<tr><td>	167	</td><td>	0cb 	</td><td>	E-diaeresis 		</td><td>&#xcb;</td><td>	E 				</td></tr>
<tr><td>	168	</td><td>	0cf 	</td><td>	I-diaeresis 		</td><td>&#xcf;</td><td>	I 				</td></tr>
<tr><td>	169	</td><td>	0e1 	</td><td>	a-acute 			</td><td>&#xe1;</td><td>	a 				</td></tr>
<tr><td>	170	</td><td>	0e9 	</td><td>	e-acute 			</td><td>&#xe9;</td><td>	e 				</td></tr>
<tr><td>	171	</td><td>	0ed 	</td><td>	i-acute 			</td><td>&#xed;</td><td>	i 				</td></tr>
<tr><td>	172	</td><td>	0f3 	</td><td>	o-acute 			</td><td>&#xf3;</td><td>	o 				</td></tr>
<tr><td>	173	</td><td>	0fa 	</td><td>	u-acute 			</td><td>&#xfa;</td><td>	u 				</td></tr>
<tr><td>	174	</td><td>	0fd 	</td><td>	y-acute 			</td><td>&#xfd;</td><td>	y 				</td></tr>
<tr><td>	175	</td><td>	0c1 	</td><td>	A-acute 			</td><td>&#xc1;</td><td>	A 				</td></tr>
<tr><td>	176	</td><td>	0c9 	</td><td>	E-acute 			</td><td>&#xc9;</td><td>	E 				</td></tr>
<tr><td>	177	</td><td>	0cd 	</td><td>	I-acute 			</td><td>&#xcd;</td><td>	I 				</td></tr>
<tr><td>	178	</td><td>	0d3 	</td><td>	O-acute 			</td><td>&#xd3;</td><td>	O 				</td></tr>
<tr><td>	179	</td><td>	0da 	</td><td>	U-acute 			</td><td>&#xda;</td><td>	U 				</td></tr>
<tr><td>	180	</td><td>	0dd 	</td><td>	Y-acute 			</td><td>&#xdd;</td><td>	Y 				</td></tr>
<tr><td>	181	</td><td>	0e0 	</td><td>	a-grave 			</td><td>&#xe0;</td><td>	a 				</td></tr>
<tr><td>	182	</td><td>	0e8 	</td><td>	e-grave 			</td><td>&#xe8;</td><td>	e 				</td></tr>
<tr><td>	183	</td><td>	0ec 	</td><td>	i-grave 			</td><td>&#xec;</td><td>	i 				</td></tr>
<tr><td>	184	</td><td>	0f2 	</td><td>	o-grave 			</td><td>&#xf2;</td><td>	o 				</td></tr>
<tr><td>	185	</td><td>	0f9 	</td><td>	u-grave 			</td><td>&#xf9;</td><td>	u 				</td></tr>
<tr><td>	186	</td><td>	0c0 	</td><td>	A-grave 			</td><td>&#xc0;</td><td>	A 				</td></tr>
<tr><td>	187	</td><td>	0c8 	</td><td>	E-grave 			</td><td>&#xc8;</td><td>	E 				</td></tr>
<tr><td>	188	</td><td>	0cc 	</td><td>	I-grave 			</td><td>&#xcc;</td><td>	I 				</td></tr>
<tr><td>	189	</td><td>	0d2 	</td><td>	O-grave 			</td><td>&#xd2;</td><td>	O 				</td></tr>
<tr><td>	190	</td><td>	0d9 	</td><td>	U-grave 			</td><td>&#xd9;</td><td>	U 				</td></tr>
<tr><td>	191	</td><td>	0e2 	</td><td>	a-circumflex 		</td><td>&#xe2;</td><td>	a				</td></tr>
<tr><td>	192	</td><td>	0ea 	</td><td>	e-circumflex 		</td><td>&#xea;</td><td>	e				</td></tr>
<tr><td>	193	</td><td>	0ee 	</td><td>	i-circumflex 		</td><td>&#xee;</td><td>	i				</td></tr>
<tr><td>	194	</td><td>	0f4 	</td><td>	o-circumflex 		</td><td>&#xf4;</td><td>	o				</td></tr>
<tr><td>	195	</td><td>	0fb 	</td><td>	u-circumflex 		</td><td>&#xfb;</td><td>	u				</td></tr>
<tr><td>	196	</td><td>	0c2 	</td><td>	A-circumflex 		</td><td>&#xc2;</td><td>	A				</td></tr>
<tr><td>	197	</td><td>	0ca 	</td><td>	E-circumflex 		</td><td>&#xca;</td><td>	E				</td></tr>
<tr><td>	198	</td><td>	0ce 	</td><td>	I-circumflex 		</td><td>&#xce;</td><td>	I				</td></tr>
<tr><td>	199	</td><td>	0d4 	</td><td>	O-circumflex 		</td><td>&#xd4;</td><td>	O				</td></tr>
<tr><td>	200	</td><td>	0db 	</td><td>	U-circumflex 		</td><td>&#xdb;</td><td>	U				</td></tr>
<tr><td>	201	</td><td>	0e5 	</td><td>	a-ring 				</td><td>&#xe5;</td><td>	a				</td></tr>
<tr><td>	202	</td><td>	0c5 	</td><td>	A-ring 				</td><td>&#xc5;</td><td>	A				</td></tr>
<tr><td>	203	</td><td>	0f8 	</td><td>	o-slash 			</td><td>&#xf8;</td><td>	o				</td></tr>
<tr><td>	204	</td><td>	0d8 	</td><td>	O-slash 			</td><td>&#xd8;</td><td>	O				</td></tr>
<tr><td>	205	</td><td>	0e3 	</td><td>	a-tilde 			</td><td>&#xe3;</td><td>	a				</td></tr>
<tr><td>	206	</td><td>	0f1 	</td><td>	n-tilde 			</td><td>&#xf1;</td><td>	n				</td></tr>
<tr><td>	207	</td><td>	0f5 	</td><td>	o-tilde 			</td><td>&#xf5;</td><td>	o				</td></tr>
<tr><td>	208	</td><td>	0c3 	</td><td>	A-tilde 			</td><td>&#xc3;</td><td>	A				</td></tr>
<tr><td>	209	</td><td>	0d1 	</td><td>	N-tilde 			</td><td>&#xd1;</td><td>	N				</td></tr>
<tr><td>	210	</td><td>	0d5 	</td><td>	O-tilde 			</td><td>&#xd5;</td><td>	O				</td></tr>
<tr><td>	211	</td><td>	0e6 	</td><td>	ae-ligature 		</td><td>&#xe6;</td><td>	ae				</td></tr>
<tr><td>	212	</td><td>	0c6 	</td><td>	AE-ligature 		</td><td>&#xc6;</td><td>	AE				</td></tr>
<tr><td>	213	</td><td>	0e7 	</td><td>	c-cedilla 			</td><td>&#xe7;</td><td>	c				</td></tr>
<tr><td>	214	</td><td>	0c7 	</td><td>	C-cedilla 			</td><td>&#xc7;</td><td>	C				</td></tr>
<tr><td>	215	</td><td>	0fe 	</td><td>	Icelandic thorn 	</td><td>&#xfe;</td><td>	th				</td></tr>
<tr><td>	216	</td><td>	0f0 	</td><td>	Icelandic eth 		</td><td>&#xf0;</td><td>	th				</td></tr>
<tr><td>	217	</td><td>	0de 	</td><td>	Icelandic Thorn 	</td><td>&#xde;</td><td>	Th				</td></tr>
<tr><td>	218	</td><td>	0d0 	</td><td>	Icelandic Eth 		</td><td>&#xd0;</td><td>	Th				</td></tr>
<tr><td>	219	</td><td>	0a3 	</td><td>	pound symbol 		</td><td>&#xa3;</td><td>	L				</td></tr>
<tr><td>	220	</td><td>	153		</td><td>	oe-ligature 		</td><td>&#x153;</td><td>	oe				</td></tr>
<tr><td>	221	</td><td>	152		</td><td>	OE-ligature 		</td><td>&#x152;</td><td>	OE				</td></tr>
<tr><td>	222	</td><td>	0a1 	</td><td>	inverted ! 			</td><td>&#xa1;</td><td>	!				</td></tr>
<tr><td>	223	</td><td>	0bf 	</td><td>	inverted ? 			</td><td>&#xbf;</td><td>	?				</td></tr>
</table>


<hr>

<h2 id="remarks">Remarks</h2>

<p>In practice the text compression factor is not really very good: for
   instance, 155000 characters of text squashes into 99000 bytes.  (Text
   usually accounts for about 75% of a story file.)  Encoding does
   at least encrypt the text so that casual browsers can't read it.
   Well-chosen abbreviations will reduce total story file size by 10% or so.
</p>

<p>The German translation of 'Zork I' uses an alphabet table to make
   accented letters (from the standard extra characters set) efficient
   in dictionary words.  In Version 6, 'Shogun' also uses an alphabet
   table.
</p>

<p>Unicode translation tables are new in Standard 1.0: in Standard
   0.2, the extra characters were always mapped using the default
   Unicode translation table.
</p>

<p>Note that if a random stretch of memory is accidentally printed as
   a string (due to an error in the story file), illegal ZSCII codes
   may well be printed using the 4-Z-character escape sequence.  It's
   helpful for interpreters to filter out any such illegal codes
   so that the resulting on-screen mess will not cause trouble for the
   terminal (e.g. by causing the interpreter to print ASCII 12,
   clear screen, or 7, bell sound).
</p>

<p>The continental European quotation marks <strong>&lt;&lt;</strong> and <strong>&gt;&gt;</strong> should have
   spacing which looks sensible either in French style <strong>&lt;&lt;</strong>Merci!<strong>&gt;&gt;</strong>
   or in German style <strong>&gt;&gt;</strong>Danke!<strong>&lt;&lt;</strong>.
</p>

<p>Ideally, an interpreter should be able to read time delays (for timed input)
   from stream 1 (i.e., from a script file).  See the <a href="sect07.html#remarks">
   remarks</a> in <strong>§</strong> 7.
</p>

<p>The 'Beyond Zork' story file is capable of receiving both mouse-click
   codes (253 and 254), listing both in its terminating characters table
   and treating them equally.
</p>

<p>The extant Infocom games in Versions 4 and 5 use the control characters
   1 to 31 only as follows: they all accept 10 or 13 as equivalent, except
   that 'Bureaucracy' will only accept 13.  'Bureaucracy' needs either 127
   or 8 to be a delete code.  No other codes are used.
</p>

<p>Curiously, 'Nord 'n' Bert Couldn't Make Head Nor Tail Of It' and
   'A Mind Forever Voyaging' allow some letter characters to be typed
   in with the top bit set.  That is, if reading an A, they would recognise
   65 or 91 (upper or lower case) and also 193 or 219.  Matthew Russotto
   suggests this was an accommodation for the Apple II, whose keyboard
   primitives returned the last key pressed in the bottom 7 bits of a
   byte, plus a top bit flag indicating whether or not the keyboard had
   been hit since last time.
</p>

<p>In Version 3 and later, many of Infocom's interpreters (and some subsequent
   interpreters, such as ITF's) treat two consecutive Z-characters 4 or 5 as
   shift locks, contrary to the Standard. As a result, story files should not
   use multiple consecutive 4 or 5 codes except for padding at the end of
   strings and dictionary words. In any case, these shift locks are not used in
   dictionary words, or any of Infocom's story files.
</p>

<p>In the available Version 1 and 2 Infocom gamefiles, dictionary words that leave a
   multi-Z-character construction incomplete due to the truncation to 6 Z-characters do not 
   have the end-bit of the last word set. However, the Infocom interpreters for these files 
   do not recognize this, indicating that it is most likely a bug.
</p>

<hr>

<p>In the past, not just in the Z-machine world, there has been general
   confusion over the rendering of ASCII/ZSCII/Latin-1/Unicode characters $27
   and $60. For the Z-machine, the traditional interpretations of
   right-single-quote/apostrophe and left-single-quote are preferred over the
   modern neutral-single-quote and grave accent - see Table 2A of the Inform
   Designer's Manual. $22 is a neutral double-quote.
</p>

<p>An alternative rendering is to interpret both $27 and $60 as neutral quotes,
   but interpreting $60 as a grave accent is to be avoided.
</p>

<p>No doubt aware of this confusion, Infocom never used character $60, and used
   $27 almost exclusively as an apostrophe - hardly any single quotes appear in
   Infocom games. Modern authors would do well to follow their lead.
</p>

<p>The few Infocom games that do use single quotes use $27 for both opening and
   closing - but even on many of their interpreters this looked a little odd, so
   suggesting that $27 be a right quote introduces no extra compatibility
   problems.
</p>

<hr>

<p>
  <a href="index.html">Contents</a> / 
  <a href="preface.html">Preface</a> /
  <a href="overview.html">Overview</a>
</p>

<p>Section
  <a href="sect01.html">1</a> / <a href="sect02.html">2</a> /
  <a href="sect03.html">3</a> / <a href="sect04.html">4</a> /
  <a href="sect05.html">5</a> / <a href="sect06.html">6</a> /
  <a href="sect07.html">7</a> / <a href="sect08.html">8</a> /
  <a href="sect09.html">9</a> / <a href="sect10.html">10</a> /
  <a href="sect11.html">11</a> / <a href="sect12.html">12</a> /
  <a href="sect13.html">13</a> / <a href="sect14.html">14</a> /
  <a href="sect15.html">15</a> / <a href="sect16.html">16</a>
</p>

<p>Appendix
  <a href="appa.html">A</a> / <a href="appb.html">B</a> /
  <a href="appc.html">C</a> / <a href="appd.html">D</a> /
  <a href="appe.html">E</a> / <a href="appf.html">F</a>
</p>

<hr>

</body>
</html>