
FileTypes

dfgordon edited this page Aug 14, 2024 · 13 revisions

High Level File Types

The following are the high level file types defined by a2kit. The type you specify can affect how the data is packed into allocation blocks and/or how it is encoded. Applicability to specific file systems is noted at the end of each section.

Raw Files

The raw type treats a file's allocation blocks as a byte stream, with no interpretation at all. If the file is sparse, a2kit concatenates the allocated blocks (sparse files can be handled fully using the any type). You can ask for the byte stream to be truncated at the EOF using the --trunc flag. The exact effect of --trunc depends on how the file system's directory stores the file's metadata.
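The raw concatenation described above can be modeled roughly as follows. This is a minimal sketch, not a2kit's code: the file is represented as a hypothetical list of blocks in file order, with None standing in for an unallocated block of a sparse file, and the optional eof argument plays the role of --trunc.

```python
from typing import Optional, Sequence


def raw_stream(blocks: Sequence[Optional[bytes]], eof: Optional[int] = None) -> bytes:
    """Hypothetical model of the raw type: concatenate allocated blocks only.

    blocks: block contents in file order; None marks an unallocated (sparse) block.
    eof:    if given, truncate the stream to this many bytes (like --trunc).
    """
    # Unallocated blocks are skipped, so the gaps of a sparse file disappear.
    data = b"".join(block for block in blocks if block is not None)
    if eof is not None:
        data = data[:eof]
    return data


print(raw_stream([b"AB", None, b"CD"]))         # b'ABCD'
print(raw_stream([b"AB", None, b"CD"], eof=3))  # b'ABC'
```

Because the gap is dropped, the byte offsets of a sparse file are not preserved in this model, which is the kind of information the any type exists to carry.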

When a raw file is saved to a file system that maintains its own type codes (e.g. ProDOS), it is saved as a text file, regardless of its contents. The type can be changed afterwards using the retype subcommand.

The raw type works with any file system.

Binary Files

The a2kit type code for a binary file is bin.

Some file systems define a load-address. When working with binary files, a2kit passes only the data between pipeline nodes, i.e., the load-address is discarded. As a result, for some file systems the load-address has to be specified anew for each node in the pipeline. To pass the full information about a file through the pipeline, use the any type (see the low level page).
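To make the load-address discussion concrete: on DOS 3.3 a binary (B) file begins with a 4-byte header holding the load address and the data length, both little-endian 16-bit values, whereas ProDOS keeps the load address in the directory entry (the auxiliary type). The sketch below is illustrative, not a2kit's implementation; it splits a DOS 3.3 binary file image into its load address and payload. Passing only the payload onward is a rough model of what the bin type does between pipeline nodes.

```python
import struct


def split_dos33_binary(image: bytes) -> tuple[int, bytes]:
    """Return (load_address, payload) from a DOS 3.3 binary file image.

    The image may include trailing sector padding; the header's length
    field decides how much of it is payload.
    """
    if len(image) < 4:
        raise ValueError("too short to hold a DOS 3.3 binary header")
    # Two little-endian unsigned 16-bit words: load address, then length.
    load_addr, length = struct.unpack_from("<HH", image, 0)
    payload = image[4:4 + length]
    if len(payload) != length:
        raise ValueError("header length exceeds the available data")
    return load_addr, payload


# A 3-byte routine (LDA #$00 / RTS) loaded at $0300, padded to a full sector.
image = bytes([0x00, 0x03, 0x03, 0x00, 0xA9, 0x00, 0x60]) + bytes(249)
addr, payload = split_dos33_binary(image)
print(hex(addr), payload.hex())  # 0x300 a90060
```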

The bin type works with any file system.

Raw vs. Binary

While bin and raw both work with binary data, they use different packing strategies. The following table summarizes the distinctions.

Property                              raw       bin
get truncates at EOF with --trunc     maybe     always
get strips header                     never     always
put sets EOF                          always    always
put adds header                       never     always
put maps to FS type                   text      binary

Further details depend on the file system.
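For the put direction on DOS 3.3, the "put adds header" row of the table can be sketched as follows. The function name and the kind strings are hypothetical; the header layout (load address, then length, both little-endian) is the DOS 3.3 binary-file header. In this model the bin case cannot be packed without a load address, while raw bytes are written unchanged.

```python
import struct
from typing import Optional


def pack_for_put(data: bytes, kind: str, load_addr: Optional[int] = None) -> bytes:
    """Hypothetical model of how raw and bin data are laid out for a DOS 3.3 put."""
    if kind == "raw":
        # raw: bytes are written exactly as given, with no header.
        return data
    if kind == "bin":
        if load_addr is None:
            raise ValueError("bin needs a load address")
        if not 0 <= load_addr <= 0xFFFF or len(data) > 0xFFFF:
            raise ValueError("address and length must fit in 16 bits")
        # bin: prepend the 4-byte header (load address, length).
        return struct.pack("<HH", load_addr, len(data)) + data
    raise ValueError(f"unknown kind: {kind}")


print(pack_for_put(b"\x60", "raw").hex())                   # 60
print(pack_for_put(b"\x60", "bin", load_addr=0x300).hex())  # 0003010060
```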

BASIC Language Files

BASIC is an interpreted language, but programs are usually stored on an Apple II disk in tokenized form, i.e., keywords and other symbols are each represented by a single byte. The token bytes are chosen so that they do not overlap with the text encoding in use. When processing BASIC files you have to know whether you are dealing with a "source" file (almost always found on the local file system rather than on a disk image) or a tokenized file. The a2kit type codes are as follows:

BASIC format          type code
Applesoft source      atxt
Applesoft tokens      atok
Integer source        itxt
Integer tokens        itok

It is important to realize that a2kit will not automatically tokenize the source or detokenize the tokens. You have to insert a pipeline node if you want to do this. See Languages for examples.
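To illustrate what a tokenized Applesoft program looks like, here is a toy detokenizer. It is not a2kit's detokenizer. It assumes the in-memory program layout (each line is a 2-byte link to the next line, a 2-byte line number, the tokenized body, and a $00 terminator, with a zero link ending the program) and that any length header the file system adds (DOS 3.3 prepends one) has already been removed. Only two tokens are included (END = $80, PRINT = $BA); the real table has many more, and real LIST output spaces keywords differently.

```python
import struct

# Toy subset of the Applesoft token table.
TOKENS = {0x80: "END", 0xBA: "PRINT"}


def detokenize(program: bytes) -> list[str]:
    """List a tokenized Applesoft program (toy version, simplified spacing)."""
    lines = []
    pos = 0
    while pos + 2 <= len(program):
        (link,) = struct.unpack_from("<H", program, pos)
        if link == 0:
            break  # a zero link pointer marks the end of the program
        (number,) = struct.unpack_from("<H", program, pos + 2)
        pos += 4
        text = []
        while pos < len(program) and program[pos] != 0:
            byte = program[pos]
            if byte >= 0x80:
                # Token byte: expand to its keyword (unknown tokens shown as hex).
                text.append(TOKENS.get(byte, f"<${byte:02X}>"))
            else:
                text.append(chr(byte))  # plain ASCII character
            pos += 1
        pos += 1  # step over the $00 line terminator
        lines.append(f"{number} {''.join(text)}")
    return lines


prog = bytes([
    0x09, 0x08, 0x0A, 0x00, 0xBA, 0x22, 0x48, 0x49, 0x22, 0x00,  # 10 PRINT "HI"
    0x0F, 0x08, 0x14, 0x00, 0x80, 0x00,                          # 20 END
    0x00, 0x00,                                                  # end of program
])
print(detokenize(prog))  # ['10 PRINT"HI"', '20 END']
```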

Applesoft and Integer BASIC are specific to DOS and ProDOS.

Assembly Language Files

Some assembly language source files, notably Merlin, are "tokenized" in the sense that they are not simple ASCII. The a2kit type codes are as follows:

Assembler format               type code
Merlin local source            mtxt
Merlin disk image source       mtok
Merlin assembled executable    bin

Just as with BASIC program files, a2kit will not automatically "tokenize" local source files or "detokenize" source files taken from a disk image. You have to insert a pipeline node if you want to do this. See Languages for examples.

Merlin assembly language is specific to DOS and ProDOS.

Sequential Text

Text files on disk images may be encoded differently from those on the local file system. For example, DOS 3.3 text files are negative ASCII (the high bit of each byte is set) and use carriage returns as line separators. When you use get and put with the txt type, the encoding is converted automatically. To preserve the original encoding, use the raw type. As an example,

a2kit get -f mytext -t txt -d img.dsk

will display readable text. On the other hand,

a2kit get -f mytext -t raw -d img.dsk --trunc

will display a hex dump. When forming a pipeline use txt unless you have a specific reason to use raw.
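The conversion that get and put perform for txt on DOS 3.3 can be approximated by the following sketch (an illustration, not a2kit's code). It assumes the file ends at the first $00 byte, as DOS 3.3 sequential text files do, and maps the CR line separator to LF on the local side.

```python
def dos33_text_to_local(raw: bytes) -> str:
    """Decode DOS 3.3 sequential text: negative ASCII, CR line separators."""
    body = raw.split(b"\x00", 1)[0]               # text ends at the first $00 byte
    ascii_bytes = bytes(b & 0x7F for b in body)   # clear the high bit
    return ascii_bytes.decode("ascii").replace("\r", "\n")


def local_text_to_dos33(text: str) -> bytes:
    """Encode local text for DOS 3.3: LF/CRLF become CR, then set the high bit."""
    normalized = text.replace("\r\n", "\n").replace("\n", "\r")
    return bytes(b | 0x80 for b in normalized.encode("ascii"))


raw = bytes([0xC8, 0xC9, 0x8D, 0x00, 0x00])  # "HI" + CR, then sector padding
print(repr(dos33_text_to_local(raw)))        # 'HI\n'
print(local_text_to_dos33("HI\n").hex())     # c8c98d
```

The raw type corresponds to skipping both functions and handling the bytes as they are.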

Sequential text works with any file system.

Random Access Text

Random access text files have to be manipulated through a JSON representation, which is assigned the abstract file type rec (records). For example, to get a random access file from a ProDOS image use

a2kit get -f myrecords -t rec -d img.dsk

For DOS 3.3 the record length has to be given:

a2kit get -f myrecords -t rec -d img.dsk -l 127

This will display a JSON string representing the file. It might look something like this:

{
    "a2kit_type": "rec",
    "record_length": 127,
    "records": {
        "5": ["field1","field2"],
        "2": ["field"]
    }
}

Notice that the record numbers do not have to be contiguous or in order. This structure can be passed along the pipeline; for example, you can copy the records from a DOS 3.3 disk to a ProDOS disk:

a2kit get -f myrecords -t rec -d img.dsk -l 127 | a2kit put -f myrecords -t rec -d img.po
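For context on the -l option: DOS 3.3 does not record a random access file's record length in its catalog (the program supplies it when opening the file), while ProDOS stores it in the directory entry's auxiliary type, which is presumably why only DOS 3.3 needs -l. The following sketch shows how a DOS 3.3 random access file image might map onto the rec JSON shape shown above. It is an illustration under stated assumptions, not a2kit's implementation: record n is assumed to start at byte n * record_length, each field is assumed to be negative ASCII terminated by a carriage return ($8D), and records that were never written are assumed to read as zeros.

```python
import json


def dos33_records_to_json(image: bytes, record_length: int) -> str:
    """Build a rec-style JSON string from a DOS 3.3 random access file image.

    Assumptions: record n starts at byte n * record_length, fields are
    negative ASCII terminated by CR ($8D), unwritten records are all zero.
    """
    records = {}
    count = (len(image) + record_length - 1) // record_length
    for n in range(count):
        chunk = image[n * record_length:(n + 1) * record_length]
        if not any(chunk):
            continue  # skip records that were never written
        body = chunk.split(b"\x00", 1)[0]
        text = bytes(b & 0x7F for b in body).decode("ascii")
        fields = text.split("\r")
        if fields and fields[-1] == "":
            fields.pop()  # the last field's CR leaves an empty tail
        records[str(n)] = fields
    return json.dumps(
        {"a2kit_type": "rec", "record_length": record_length, "records": records},
        indent=4,
    )


image = bytes(8) + b"\xc1\x8d\xc2\x8d" + bytes(4)  # record 1 holds fields "A", "B"
print(dos33_records_to_json(image, 8))
```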

To put a local file as random access text, the file must be in the JSON representation. Note that this differs from the other high level file types, where the source can be a simple binary or text file.
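One way to prepare such a local file is to generate it with a short script. This sketch emits the JSON shape shown earlier on this page (a2kit_type, record_length, records keyed by record number); the function name is hypothetical, and the length check is an assumption of this sketch (one CR terminator per field), not a rule documented for a2kit.

```python
import json


def make_rec_json(record_length: int, records: dict[int, list[str]]) -> str:
    """Serialize records in the rec JSON shape (hypothetical helper)."""
    for n, fields in records.items():
        # Hypothetical sanity check: assume each field is stored with one CR.
        size = sum(len(field) + 1 for field in fields)
        if size > record_length:
            raise ValueError(f"record {n} needs {size} bytes, more than {record_length}")
    return json.dumps(
        {
            "a2kit_type": "rec",
            "record_length": record_length,
            # JSON object keys are strings, so record numbers become "5", "2", ...
            "records": {str(n): fields for n, fields in records.items()},
        },
        indent=4,
    )


print(make_rec_json(127, {5: ["field1", "field2"], 2: ["field"]}))
```

Save the printed JSON to a file such as myrecords.json and feed it to put on standard input, just as the pipeline above does:

a2kit put -f myrecords -t rec -d img.po < myrecords.json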

Random access text works with DOS and ProDOS. a2kit does not yet support CP/M random access text files. You can, however, use the any type to manipulate an arbitrary sparse file, including on CP/M.

Other Types

You can always use the retype subcommand to change one of the above types into any other type, e.g., you can change a binary file into a ProDOS system file. Of course, the contents of the file must be consistent with the requirements of the type. If the block-wise storage pattern needs to be controlled, you can use the any representation (see the low level page).
