FileTypes
The following are the high-level file types defined by `a2kit`. Choosing one type or another can affect how the data is packed into the allocation blocks and/or how the data is encoded. Applicability to specific file systems is noted at the end of each section.
The `raw` type treats a file's allocation blocks as a byte stream, with no interpretation at all. If the file is sparse, `a2kit` concatenates the allocated blocks (sparse files can be handled fully using the `any` type). You can ask for the byte stream to be truncated at the EOF using the `--trunc` flag. The exact effect of `--trunc` depends on how metadata is stored in the file system's directory.
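As a rough mental model of the concatenation just described, the Python sketch below treats a file as a list of blocks, with `None` marking an unallocated (sparse) block. The block-list representation and the `raw_stream` helper are hypothetical and are not part of `a2kit`. Applying the EOF to the concatenated bytes is only one possible interpretation; as noted, the real effect of `--trunc` depends on the file system.

```python
from typing import Optional, Sequence


def raw_stream(blocks: Sequence[Optional[bytes]], eof: Optional[int] = None) -> bytes:
    """Hypothetical model of `raw` extraction.

    blocks: file blocks in logical order; None marks an unallocated block.
    eof:    optional byte count at which to cut the stream (models --trunc).
    """
    # Holes are dropped, so any data after a hole shifts to a lower offset.
    data = b"".join(block for block in blocks if block is not None)
    if eof is not None:
        data = data[:eof]
    return data


stream = raw_stream([b"AB", None, b"CD"])
print(stream)  # b'ABCD'
```

Because dropping the hole moves `CD` to offset 2, the original layout cannot be recovered from the stream. This is why full handling of sparse files requires the `any` type.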
When a `raw` file is saved to a file system that maintains its own type codes (e.g., ProDOS), it is saved as a text file regardless of its contents. The type can be changed afterward using the `retype` subcommand.
The `raw` type works with any file system.
The `a2kit` type code for a binary file is `bin`.
Some file systems define a load address. When working with binary files, `a2kit` passes only the data between pipeline nodes, i.e., the load address is discarded. As a result, for some file systems the load address has to be specified anew for each node in the pipeline. You can pass the full information about a file through the pipeline using the `any` type; see the low level page.
The `bin` type works with any file system.
While `bin` and `raw` both work with binary data, their packing strategies differ. The following table summarizes the distinctions.
Property | `raw` | `bin` |
---|---|---|
`get` truncates at EOF | with `--trunc`, maybe | always |
`get` strips header | never | always |
`put` sets EOF | always | always |
`put` adds header | never | always |
`put` maps to FS type | text | binary |

Further details depend on the file system.
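The header rows can be made concrete with DOS 3.3. There, a binary file begins with a four-byte header: the load address followed by the data length, each a 16-bit little-endian value. The Python sketch below shows what stripping and adding such a header amounts to. The function names are hypothetical, and this is not `a2kit`'s implementation. Other file systems (ProDOS, for instance) keep the load address elsewhere, so the details differ.

```python
import struct

# DOS 3.3 binary header: load address, then length (16-bit little-endian each).
HEADER = struct.Struct("<HH")


def add_header(data: bytes, load_addr: int) -> bytes:
    """Model of `put` for a DOS 3.3-style binary file."""
    return HEADER.pack(load_addr, len(data)) + data


def strip_header(file_bytes: bytes) -> bytes:
    """Model of `get`: keep only the data, honoring the stored length.

    The load address is read but discarded, mirroring how only the
    data travels between pipeline nodes.
    """
    _load_addr, length = HEADER.unpack_from(file_bytes, 0)
    return file_bytes[HEADER.size:HEADER.size + length]


stored = add_header(b"\x60", 0x0300)        # one-byte program (RTS) loaded at $0300
print(stored.hex())                         # 0003010060
print(strip_header(stored + b"\x00\x00"))  # b'`'  (padding past the length is dropped)
```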
BASIC is an interpreted language, but programs on an Apple II disk are usually stored tokenized, i.e., keywords and some other symbols are each represented by a single byte. Token byte values are chosen so that they do not overlap with the character codes used for ordinary text. When processing BASIC files you have to know whether you are dealing with a "source" file (almost always found on the local file system rather than on a disk image) or a tokenized file. The `a2kit` type codes are as follows:
BASIC | format | type code |
---|---|---|
Applesoft | source | atxt |
Applesoft | tokens | atok |
Integer | source | itxt |
Integer | tokens | itok |
It is important to realize that `a2kit` will not automatically tokenize the source or detokenize the tokens. You have to insert a pipeline node if you want to do this. See Languages for examples.
Applesoft and Integer BASIC are specific to DOS and ProDOS.
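The Python sketch below shows what a detokenizing step does with the body of a single Applesoft line. Bytes below $80 are literal characters, and bytes at $80 and above are keyword tokens. The sketch includes only a handful of entries from the real Applesoft token table. It ignores the link pointer and line number that precede each stored line, and it does not reproduce the spacing a real listing adds around keywords. Integer BASIC uses a different token scheme. `detokenize_body` is a hypothetical helper, not `a2kit`'s own detokenizer.

```python
# A few genuine Applesoft token values; the full table has over 100 entries.
TOKENS = {0x80: "END", 0x97: "HOME", 0xAB: "GOTO", 0xBA: "PRINT"}


def detokenize_body(body: bytes) -> str:
    """Expand one tokenized line body; stops at the 0x00 end-of-line byte."""
    out = []
    for b in body:
        if b == 0x00:
            break
        if b >= 0x80:
            # Tokens missing from this partial table are shown as hex, not guessed.
            out.append(TOKENS.get(b, f"<${b:02X}>"))
        else:
            out.append(chr(b))
    return "".join(out)


print(detokenize_body(bytes([0xBA, 0x22, 0x48, 0x49, 0x22, 0x00])))  # PRINT"HI"
```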
Some assembly language source files, notably Merlin's, are "tokenized" in the sense that they are not plain ASCII. The `a2kit` type codes are as follows:
Assembler | format | type code |
---|---|---|
Merlin | local source | mtxt |
Merlin | disk image source | mtok |
Merlin | assembled executable | bin |
Just as with BASIC program files, `a2kit` will not automatically "tokenize" local source files or "detokenize" source files taken from a disk image. You have to insert a pipeline node if you want to do this. See Languages for examples.
Merlin assembly language is specific to DOS and ProDOS.
Text files on disk images may be encoded differently from those on the local file system. For example, DOS 3.3 text files use negative ASCII (the high bit of each byte is set) and use carriage returns as line separators. When using `get` and `put`, the encoding is converted automatically if the file type is `txt`. If you want to preserve the original encoding, use the `raw` type. As an example,
a2kit get -f mytext -t txt -d img.dsk
will display readable text. On the other hand,
a2kit get -f mytext -t raw -d img.dsk --trunc
will display a hex dump. When forming a pipeline, use `txt` unless you have a specific reason to use `raw`.
Sequential text works with any file system.
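For DOS 3.3, the conversion that `get` and `put` perform for `txt` can be sketched as follows: clear or set the high bit of each byte and swap carriage returns for newlines. The helper names are hypothetical. The sketch assumes that a DOS 3.3 text file ends at its first zero byte, because that file system does not record a byte-exact EOF for text files. It also assumes local text uses LF line endings.

```python
def dos33_to_local(raw: bytes) -> str:
    """Negative ASCII with CR line endings -> local text with LF."""
    raw = raw.split(b"\x00", 1)[0]        # assumed end-of-text marker
    plain = bytes(b & 0x7F for b in raw)  # clear the high bit
    return plain.decode("ascii").replace("\r", "\n")


def local_to_dos33(text: str) -> bytes:
    """Local ASCII text with LF -> negative ASCII with CR line endings."""
    ascii_bytes = text.replace("\n", "\r").encode("ascii")  # rejects non-ASCII
    return bytes(b | 0x80 for b in ascii_bytes)


disk_bytes = local_to_dos33("HELLO\n")
print(disk_bytes.hex())  # c8c5cccccf8d
print(dos33_to_local(disk_bytes + b"\x00\x00"), end="")  # HELLO
```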
Random access text files have to be manipulated through a JSON representation, which is assigned the abstract file type `rec` (records). For example, to get a random access file from a ProDOS image use
a2kit get -f myrecords -t rec -d img.dsk
For DOS 3.3 the record length has to be given:
a2kit get -f myrecords -t rec -d img.dsk -l 127
This will display a JSON string representing the file. It might look something like this:
{
"a2kit_type": "rec",
"record_length": 127,
"records": {
"5": ["field1","field2"],
"2": ["field"]
}
}
Notice that the record numbers need not be in any particular order. This structure can be passed along the pipeline; for example, you can copy the records from a DOS 3.3 disk to a ProDOS disk:
a2kit get -f myrecords -t rec -d img.dsk -l 127 | a2kit put -f myrecords -t rec -d img.po
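Order does not matter because a random access record's position is fixed by its number. Record n starts at byte n × record length, and records that were never written simply leave gaps in a sparse file. The Python sketch below reads the JSON shown above and lists where each record would sit. `record_offsets` is a hypothetical helper that uses only the keys visible in that example.

```python
import json
from typing import List, Tuple


def record_offsets(json_text: str) -> List[Tuple[int, int, List[str]]]:
    """Return (record number, byte offset, fields), sorted by record number."""
    doc = json.loads(json_text)
    length = doc["record_length"]
    rows = []
    for key, fields in doc["records"].items():
        num = int(key)  # JSON object keys are strings
        rows.append((num, num * length, fields))
    return sorted(rows)


example = """{"a2kit_type": "rec", "record_length": 127,
  "records": {"5": ["field1", "field2"], "2": ["field"]}}"""
for num, offset, fields in record_offsets(example):
    print(num, offset, fields)
# 2 254 ['field']
# 5 635 ['field1', 'field2']
```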
If you want to `put` a local file as random access text, it must be in the JSON representation. Note that this differs from the other file types described above, where the source can be a simple binary or text file.
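One way to produce that representation from a local script is sketched below in Python. It emits only the three keys that appear in the example above. Whether `a2kit` accepts or requires any other keys is not covered here, and `records_to_json` is a hypothetical helper.

```python
import json
from typing import Dict, List


def records_to_json(record_length: int, records: Dict[int, List[str]]) -> str:
    """Build a `rec`-style JSON document (keys taken from the example above)."""
    if any(num < 0 for num in records):
        raise ValueError("record numbers must be non-negative")
    return json.dumps({
        "a2kit_type": "rec",
        "record_length": record_length,
        # JSON object keys must be strings; sorting is cosmetic, order is not required.
        "records": {str(num): fields for num, fields in sorted(records.items())},
    }, indent=2)


print(records_to_json(127, {3: ["alpha", "beta"], 0: ["header"]}))
```

Presumably the output can be piped into the same kind of `a2kit put -f myrecords -t rec ...` command used in the copy example.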
Random access text works with DOS and ProDOS. `a2kit` does not yet support CP/M random access text files. You can, however, use the `any` type to manipulate an arbitrary sparse file, including on CP/M.
You can always use the `retype` subcommand to change one of the above types into any other type, e.g., you can change a binary file into a ProDOS system file. Of course, the contents of the file must be consistent with the requirements of the new type. If the block-wise storage pattern needs to be controlled, you can use the `any` representation; see low level.