PixBin is a binary format that leverages the flexibility of JSON serialization and the speed of JavaScript's typed arrays.
In order to serialize an object into the PixBin format, it must contain:
- a `_data` attribute, no matter what it contains
- a `_metadata` attribute, no matter what it contains
Notice: since this format is intended to encode numerical data like pixel arrays or position arrays, the encoding of the `_data` object will be optimized in the following cases:

`_data` is a typed array. There is a single stream to encode (case 1).
Example:

```js
// case 1, optimization OK
var _data = new Float32Array(1000);
```
`_data` is a mixed `Array`, containing typed arrays and possibly other things. There are several streams to encode (case 2).
Example:

```js
// case 2, optimization OK
var _data = [
  new Float32Array(1000),
  new Uint16Array(500),
  new Int8Array(8000)
];
```
Each element that is a typed array will be encoded natively and complex objects will be serialized.
`_data` is an `Object` (or `{}`), then it will be serialized (see Object serialization). There is a single stream to encode (case 3).
Example:

```js
// case 3, optimization not OK :(
var _data = {
  first: "John",
  last: "Doe",
  samples: new Array(1000)
};
```
Cases 1 and 2 are the best for storing numerical information, while case 3 is good for storing object-like data.
Case 3 notices:
- If you decide to store numerical data in an attribute of `_data`, use `Array` rather than typed arrays. Using any of the typed arrays (`Uint8Array`, `Uint16Array`, `Float32Array`, etc.) in case 3 will be followed by an automatic conversion into regular `Array`s. Even though the conversion is fast for small to medium length arrays, it's obviously longer than not having to convert (see the sketch after this list).
- Know that you would be limited to a maximum `_data` size of 65kBytes due to a serialization limitation. If you have both a big numerical dataset and a big object-based dataset, it is better to store them as two separate blocks as in case 1, or to use a mixed array as in case 2.
- If the `_data` object contains circular references, they will be removed (this is also true for `_metadata` in cases 1, 2 and 3).
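Building on the first notice, here is what the case 3 example could look like with its numerical attribute kept as a plain `Array`, so no typed-array conversion happens at encoding time (the `fill(0)` content is purely illustrative):

```js
// case 3, numbers stored in a regular Array: no conversion needed at encoding
var _data = {
  first: "John",
  last: "Doe",
  samples: new Array(1000).fill(0) // plain Array instead of a Float32Array
};
```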
Here is what a PixBin file looks like in the end
As a binary file format, the goal of the encoding is to transform every chunk of information into `ArrayBuffer`s. Then, the concatenation of all these buffers into a bigger one can easily be written to a file.
The concept of serializing data is to transform an object or a complex data structure into a linear buffer that is easy to write or stream. Fortunately, JavaScript provides a universal format for that: JSON. But remember: PixBin is a binary format, not a text-based format, which means a JSON string is not enough and needs an additional step to be encapsulated into a pixb file. In order to store special characters (accented letters, non-latin characters, symbols and emoji) in the binary representation of `_metadata` or `_data`, buffers have to support unicode (2 bytes per character, while ASCII is only 1 byte per character).
In this document, when we mention "object serialization", it means:
- converting an object to a JSON string
- not forgetting we'll have to allocate 2 bytes per character
- writing every unicode charcode of the JSON string into a `Uint16Array`
Info: If you want to use such serialization outside of the PixBin codec, you can have a look at CodecUtils, a small and handy toolbox for coding and decoding things. The methods you may be interested in are `objectToArrayBuffer( obj )` and `ArrayBufferToObject( buff )`.
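As a rough illustration of the three steps above (a minimal sketch, not the actual CodecUtils implementation):

```js
// Sketch: serialize an object into an ArrayBuffer of UTF-16 charcodes
function objectToArrayBuffer(obj) {
  var jsonStr = JSON.stringify(obj);
  // 2 bytes per character, to keep unicode support
  var buffer = new ArrayBuffer(jsonStr.length * 2);
  var view = new Uint16Array(buffer);
  for (var i = 0; i < jsonStr.length; i++) {
    view[i] = jsonStr.charCodeAt(i);
  }
  return buffer;
}

// Sketch: the reverse operation
function arrayBufferToObject(buff) {
  var view = new Uint16Array(buff);
  // note: for very large buffers, String.fromCharCode should be applied in chunks
  var jsonStr = String.fromCharCode.apply(null, view);
  return JSON.parse(jsonStr);
}
```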
A PixBin file contains:
- a PixBin primer (blue)
- a PixBin header (yellow)
- one or more blocks (green)
The structure of each block does not depend on how it is stored within the PixBin, nor at what position.
- 14 bytes, 14xUint8 to encode the magic number, which is the ASCII string `PIXPIPE_PIXBIN`. If read as ASCII charcodes, the sequence is: `80 73 88 80 73 80 69 95 80 73 88 66 73 78`.
- 1 byte, 1xUint8 to specify endianness. 0: big endian, 1: little endian
- 4 bytes, 1xUint32 to specify the byte length of the bin header
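To make this layout concrete, here is a minimal sketch of reading such a 19-byte primer (the function name and return shape are ours, not part of the codec):

```js
// Sketch: read the PixBin primer (14 + 1 + 4 = 19 bytes)
function readPixBinPrimer(arrayBuffer) {
  // magic number: 14 ASCII charcodes, expected to spell "PIXPIPE_PIXBIN"
  var magicBytes = new Uint8Array(arrayBuffer, 0, 14);
  var magic = String.fromCharCode.apply(null, magicBytes);
  var view = new DataView(arrayBuffer);
  // endianness flag: 0 = big endian, 1 = little endian
  var littleEndian = (view.getUint8(14) === 1);
  // byte length of the serialized PixBin header
  var headerByteLength = view.getUint32(15, littleEndian);
  return { magic: magic, littleEndian: littleEndian, headerByteLength: headerByteLength };
}
```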
Before being serialized, the header is a JS object with attributes and values:

```js
{
  // The date of creation
  date: [Date],
  // The app that created it. Default: "pixbincodec_js" but can be changed
  createdWith: [String],
  // A description the user can add (optional, default: null)
  description: [String],
  // A JS object with further information (optional, default: null)
  userObject: [Object],
  // Array of block information. One element per block in the bin
  pixblocksInfo: []
}
```
`pixblocksInfo` is an Array. There are as many elements in this array as there are PixBlocks in the PixBin. Preserving the same order, each element in `pixblocksInfo` is a quick overview of what is inside a PixBlock.
```js
{
  // String directly from constructor.name of the data encoded as a block.
  // This is not used for reconstruction since the same info is present in the block metadata,
  // but it can be useful in an index to know what kind of data is here without having to decode it
  type: [String],
  // If a block's _metadata object has an attribute "description", it is copied here. (default: null)
  description: [String],
  // The length of the block in number of bytes
  byteLength: [Number],
  // The md5 checksum generated at encoding. Handy to check if the file is corrupted
  checksum: [String],
}
```
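Because of this, `pixblocksInfo` can be used as a cheap index. A minimal sketch, assuming the decoded header is held in a variable named `binHeader`:

```js
// Sketch: scan the index without decoding any PixBlock
binHeader.pixblocksInfo.forEach(function (info, i) {
  console.log("block #" + i + ": " + info.type + ", " +
              info.byteLength + " bytes, md5 " + info.checksum);
});
```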
The PixBin header is then serialized to be turned into an ArrayBuffer, and will be appended right after the PixBin primer.
As you can see, `pixblocksInfo` does not say at which byte offset each PixBlock starts in the PixBin buffer. However, it gives the byte length of each of them, and since the blocks are arranged in the same order as the elements in `pixblocksInfo`, we can easily sum the `byteLength` values to find each byte offset, knowing the first block starts just after the PixBin header buffer. A sketch of this computation follows.
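A minimal sketch, assuming `firstBlockOffset` (primer byte length + header byte length) is already known from parsing:

```js
// Sketch: derive each PixBlock's byte offset from pixblocksInfo
function computeBlockOffsets(pixblocksInfo, firstBlockOffset) {
  var offsets = [];
  var offset = firstBlockOffset; // just after the PixBin header buffer
  for (var i = 0; i < pixblocksInfo.length; i++) {
    offsets.push(offset);
    offset += pixblocksInfo[i].byteLength; // next block starts where this one ends
  }
  return offsets;
}
```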
The structure of a block is as follows:
- 1 byte, 1xUint8 to specify endianness. 0: big endian, 1: little endian
- 4 bytes, 1xUint32 to specify the byte length of the block header
The primer fits in a 5-byte ArrayBuffer and will be the very first sequence of our block (you'd better know the endianness before you start fetching data, right?).
The block header is a buffer of n bytes that contains n/2 unicode characters (each coded on 2 bytes). Once decoded into a unicode string, it complies with the JSON format, so that we can parse it and build a native object out of it.
The block header contains several valuable pieces of information about how to read the data and how to interpret it:
- byteStreamInfo: an array that provides a set of information for each stream to be encoded. For each stream, we have an object like this:
```js
{
  // type is a string representing the name of the constructor of the stream
  // (ie. "Uint8Array", "Float32Array", "Object", etc.)
  type: [String],
  // relevant only if this stream is a typed array. True if signed, false if unsigned
  signed: [Boolean],
  // relevant only if this stream is a typed array. Number of bytes per number in the array.
  bytesPerElements: [Number],
  // length of the stream in bytes
  byteLength: [Number],
  // relevant only if this stream is a typed array. Size of the typed array
  length: [Number],
  // Length in number of bytes of the stream when/if compressed. Remains null if uncompressed
  compressedByteLength: [Number],
}
```
- originalBlockType: the name of the object constructor (directly from `constructor.name`)
- metadataByteLength: the size in bytes of the serialized metadata buffer
- useMultipleDataStreams: Boolean. If `true`, the block's data is an Array of buffers/objects. If `false`, the block's data is a single buffer.
Notice: `useMultipleDataStreams` will be `true` even when the wrapping array has only a single component.
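To illustrate how a byteStreamInfo entry can drive decoding, here is a hypothetical sketch (the lookup table and helper are ours; `arrayBufferToObject` refers to the serialization sketch above):

```js
// Constructors a stream's "type" string can resolve to
var typedArrayCtors = {
  Int8Array: Int8Array,   Uint8Array: Uint8Array,
  Int16Array: Int16Array, Uint16Array: Uint16Array,
  Int32Array: Int32Array, Uint32Array: Uint32Array,
  Float32Array: Float32Array, Float64Array: Float64Array
};

// Sketch: rebuild one stream from its byteStreamInfo entry and its raw ArrayBuffer
function rebuildStream(info, buffer) {
  if (info.type in typedArrayCtors) {
    // typed-array stream: view the bytes with the right constructor and length
    return new typedArrayCtors[info.type](buffer, 0, info.length);
  }
  // otherwise, the stream is a serialized object
  return arrayBufferToObject(buffer);
}
```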
Once the block header contains all the information we need, it is serialized so that it's transformed into an `ArrayBuffer`.
All the information contained in the block header is important for decoding the data structure.
The metadata from the original object is serialized into an ArrayBuffer. Not much else to say about that, except that, for the sake of streaming, it's never compressed.
As mentioned earlier, the data structure is encoded as a buffer; in JavaScript, this means an `ArrayBuffer`. This ArrayBuffer can come from a typed array (case 1), a concatenation of typed arrays (case 2), or an object serialization (case 3).
This buffer can be stored as-is or compressed using zlib (via Pako, a JS port). In the multi-array case, each one is compressed independently.
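A minimal sketch of that compression step with Pako (the variable names are ours):

```js
var pako = require("pako");

// one data stream, e.g. from case 1
var stream = new Float32Array(1000);

// zlib-compress the stream's bytes; pako.deflate returns a Uint8Array
var compressed = pako.deflate(new Uint8Array(stream.buffer));

// ...and on the way back:
var restoredBytes = pako.inflate(compressed);
// assuming the returned view spans its whole underlying buffer
var restored = new Float32Array(restoredBytes.buffer);
```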
Now for each block we have:
- an ArrayBuffer for the primer (uncompressed)
- an ArrayBuffer for the header (uncompressed)
- an ArrayBuffer for the metadata (uncompressed)
- an ArrayBuffer for the data (optionally compressed)

Great! We can put these 4 ArrayBuffers into a single big one and have a nicely packed independent block!
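A minimal sketch of that packing step (the helper is ours, not the codec's API):

```js
// Sketch: concatenate several ArrayBuffers into a single one
function concatArrayBuffers(buffers) {
  var totalByteLength = buffers.reduce(function (sum, b) {
    return sum + b.byteLength;
  }, 0);
  var joined = new Uint8Array(totalByteLength);
  var offset = 0;
  buffers.forEach(function (b) {
    joined.set(new Uint8Array(b), offset); // copy each buffer at its offset
    offset += b.byteLength;
  });
  return joined.buffer;
}

// e.g. packing one block:
// var blockBuffer = concatArrayBuffers([primerBuf, headerBuf, metadataBuf, dataBuf]);
```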
We have already covered the PixBlocks and seen that, in the end, they are independent ArrayBuffers. So let's recap what happens:
If you are still in the design phase and wondering what form your `_data` object should take, then case 2 (a mixed Array) definitely provides the most flexibility:
- You can have a mixed Array of size one: only a single typed array or a single object.
- You can decide later if you want to add more.
- You can use both large numerical arrays and complex objects.
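For example, a `_data` designed as a mixed Array from day one (the field values are purely illustrative):

```js
// case 2 from the start: a mixed Array of size one, easy to extend later
var myObject = {
  _metadata: { description: "an example" },
  _data: [
    new Float32Array(1000) // a single typed array today; more streams or objects can be added later
  ]
};
```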