Kaitai Stream API

All source files that the Kaitai Struct compiler generates for supported languages are meant to be human-readable, so they rely on an extra layer: the stream API. This API is implemented by the Kaitai Struct runtime libraries.

Languages differ, so the API varies slightly between them, but in a nutshell the general idea is the same. Each runtime library provides a class (or collection of operations) KaitaiStream, which is essentially a wrapper over the language’s native standard IO libraries. It features:

  • opening both local file input streams (if applicable) and in-memory input streams for reading in a single API

  • basic stream positioning operations (usually implemented as pass-through to stdlibs' API)

  • operations to read primitive KS types

  • processing operations to aid conversion of byte arrays into their unpacked / decrypted / deobfuscated forms

Names of operations below are given in the Kaitai Struct native convention, i.e. lower underscore case. Real-life runtime libraries adapt these names to the coding style of the target language, e.g. read_u2be becomes readU2be in Java, or ReadU2be in C#.

Stream positioning

KS always works with seekable streams, using the following 3 operations:

  • eof - checks if we’ve reached end-of-stream and returns true if we did

  • "reaching end-of-stream" is defined as being at a position where a request to read even a single byte would report an end-of-stream error; this differs from C++ istream semantics, where the end-of-stream state is only flagged after a read has already failed

  • seek(n) - seeks to absolute byte position n in a stream

  • pos - returns current position in a stream in bytes
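The three positioning operations can be sketched as thin pass-throughs to a standard library stream. This is only an illustration, not the actual runtime code: the class name SketchStream is hypothetical, and it covers in-memory streams only.

```python
import io


class SketchStream:
    """Hypothetical stand-in for KaitaiStream (in-memory input only)."""

    def __init__(self, data: bytes):
        self._io = io.BytesIO(data)
        self._size = len(data)

    def pos(self) -> int:
        # Current position in bytes, straight from the stdlib stream.
        return self._io.tell()

    def seek(self, n: int) -> None:
        # Absolute seek, counted from the start of the stream.
        self._io.seek(n)

    def eof(self) -> bool:
        # True as soon as no byte is left to read, even if no read
        # has failed yet (unlike C++ istream).
        return self._io.tell() >= self._size


stream = SketchStream(b"\x01\x02\x03")
stream.seek(3)
print(stream.pos(), stream.eof())  # prints "3 True"
```

Note that eof here compares the position against the known size instead of attempting a read, which is one way to get the "would the next read fail" semantics described above.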

Reading

All reading operations are supposed to "report an error" if they are unable to read the requested piece of data. The means of "reporting an error" depend on the target language, but generally throwing a typical stdlib exception (EOFException or something similar) is preferred. The only exception to this rule is when a method includes an eos_error parameter and it is set to false: in this case, the method is expected to return a "best effort" read result.

Integers

One can read integers using one of read_$S$L$E operations, where:

  • $S is either u for an unsigned integer or s for a signed one;

  • $L is the length of the integer type in bytes; 1, 2, 4 and 8 bytes are supported;

  • $E is [endianness](https://en.wikipedia.org/wiki/Endianness) (order of bytes): le for little-endian or be for big-endian; it is omitted for 1-byte integers.

A few examples:

  • read_u8le - reads 8-byte (64-bit) unsigned integer, little-endian (AKA Intel, AKA VAX, etc)

  • read_s2be - reads 2-byte (16-bit) signed integer, big-endian (AKA "network byte order", AKA Power, AKA Motorola, etc)

  • read_u1 - reads 1-byte unsigned integer; no endianness is given, as byte order is meaningless for a single byte

This is the same designation as used in the type clause in the .ksy format.
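To show how the naming scheme maps onto actual reads, here is a minimal sketch built on Python's struct module. The helper read_int and its table are hypothetical: a real runtime library exposes one method per name (read_u2be, read_s4le, ...) and does not parse the name at run time.

```python
import io
import struct

# struct format letter for each (signedness, length in bytes) pair.
_FORMATS = {
    ("u", 1): "B", ("s", 1): "b",
    ("u", 2): "H", ("s", 2): "h",
    ("u", 4): "I", ("s", 4): "i",
    ("u", 8): "Q", ("s", 8): "q",
}


def read_int(stream, name: str) -> int:
    """Read one integer described by a name such as 'u2be', 's4le' or 'u1'."""
    sign, length, endian = name[0], int(name[1]), name[2:]
    if (sign, length) not in _FORMATS:
        raise ValueError(f"unsupported integer type: {name}")
    if length > 1 and endian not in ("le", "be"):
        raise ValueError(f"multi-byte type needs le or be: {name}")
    raw = stream.read(length)
    if len(raw) < length:
        # "Report an error" when the requested data cannot be read.
        raise EOFError(f"requested {length} bytes, got {len(raw)}")
    prefix = "<" if endian == "le" else ">"
    return struct.unpack(prefix + _FORMATS[(sign, length)], raw)[0]


print(read_int(io.BytesIO(b"\xff\xfe"), "s2be"))  # prints -2
print(read_int(io.BytesIO(b"\xff\xfe"), "u2le"))  # prints 65279
```

The same two bytes produce different values depending on signedness and byte order, which is why both are part of the operation name.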

Byte arrays

There are 2 ways to read raw binary data as byte arrays:

  • read_bytes(n) - reads exactly n bytes from a stream; if fewer than n bytes can be read before hitting end-of-stream, it reports an error

  • read_bytes_full - reads all remaining bytes from a stream
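A minimal sketch of both operations, written as free functions over a Python binary stream. The function names follow the list above, but the bodies are illustrative and are not taken from any runtime library.

```python
import io


def read_bytes(stream, n: int) -> bytes:
    """Read exactly n bytes or raise EOFError."""
    data = stream.read(n)
    if len(data) < n:
        raise EOFError(f"requested {n} bytes, but only {len(data)} available")
    return data


def read_bytes_full(stream) -> bytes:
    """Read everything from the current position to end-of-stream."""
    return stream.read()


stream = io.BytesIO(b"headerpayload")
print(read_bytes(stream, 6))    # prints b'header'
print(read_bytes_full(stream))  # prints b'payload'
```

Unlike a bare stream.read(n), which silently returns a short result, read_bytes treats a short read as an error, matching the rule that reading operations report failures.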

Strings

  • read_str_eos(String encoding)

  • read_str_byte_limit(long len, String encoding)

  • read_strz(String encoding, int term, boolean includeTerm, boolean consumeTerm, boolean eosError)
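The signature of read_strz carries the most parameters, so a sketch may help. The semantics below are an assumption inferred from the parameter names (and from the eos_error rule in the Reading section), not a statement of what any particular runtime does: include_term controls whether the terminator byte ends up in the result, consume_term controls whether the stream position moves past it, and eos_error controls whether a missing terminator is an error.

```python
import io


def read_strz(stream, encoding, term, include_term, consume_term, eos_error):
    """Read bytes up to the terminator byte `term` and decode them."""
    out = bytearray()
    while True:
        b = stream.read(1)
        if not b:
            # End of stream reached without seeing the terminator.
            if eos_error:
                raise EOFError(f"terminator {term} not found before end of stream")
            break  # "best effort": return what was read so far
        if b[0] == term:
            if include_term:
                out += b
            if not consume_term:
                # Step back so the terminator is still unread.
                stream.seek(-1, io.SEEK_CUR)
            break
        out += b
    return bytes(out).decode(encoding)


stream = io.BytesIO(b"abc\x00def")
print(read_strz(stream, "ascii", 0, False, True, True))   # prints abc
print(read_strz(stream, "ascii", 0, False, True, False))  # prints def
```

The second call hits end-of-stream without finding a terminator; because eos_error is false, it returns the partial result instead of raising.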

Processing

These methods implement the process: …​ functionality for attributes, which takes a byte array and transforms it into another byte array by applying an operation usually associated with compression / encoding / encryption / obfuscation algorithms. Some of these algorithms take extra parameters.

Note that these methods generally do not work with the stream; instead they receive an in-memory buffer, so they should preferably be implemented as static methods (or class methods, or the closest equivalent).

  • process_xor(data, key)

  • key may be a single byte or a byte array; if the language doesn’t allow 2 methods with the same name but different type signatures, it is preferred to implement 2 methods with distinct names: process_xor_one for a single-byte key and process_xor_many for a byte-array key

  • process_rotate_left(data, amount, group_size)

  • process_zlib(data)
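The processing operations above can be sketched as plain functions over byte arrays, with no stream involved. This is an illustrative sketch under a few assumptions: the XOR key is repeated cyclically when it is shorter than the data, rotation is only implemented for group_size 1 (each byte rotated on its own), and process_zlib is taken to mean decompression.

```python
import zlib


def process_xor_one(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key."""
    return bytes(b ^ key for b in data)


def process_xor_many(data: bytes, key: bytes) -> bytes:
    """XOR with a multi-byte key, repeating the key as needed (assumed)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def process_rotate_left(data: bytes, amount: int, group_size: int = 1) -> bytes:
    """Rotate the bits of each byte left by `amount` positions."""
    if group_size != 1:
        raise NotImplementedError("this sketch handles 1-byte groups only")
    amount %= 8
    return bytes(((b << amount) | (b >> (8 - amount))) & 0xFF for b in data)


def process_zlib(data: bytes) -> bytes:
    """Unpack zlib-compressed data (assumed direction: decompress)."""
    return zlib.decompress(data)


print(process_xor_one(b"\x01\x02", 0xFF))   # prints b'\xfe\xfd'
print(process_rotate_left(b"\x81", 1))      # prints b'\x03'
print(process_zlib(zlib.compress(b"hi")))   # prints b'hi'
```

Python has no overloading by argument type, so the sketch follows the advice above and uses the two distinct names process_xor_one and process_xor_many.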