Bois Schema Specs

TO BE COMPLETED.

Bois Binary Format

The Bois binary format is fairly straightforward. BOIS stands for Binary Object Indexed Serializer. Even though the overall structure doesn't follow any specific rule, it can still be categorized as an indexed sequential data format, hence the word "indexed" in the name. Being indexed means that there is an index byte before every object. This index byte contains information about the data that comes after it. It can even contain data itself. Continue reading to see how.

The Specs

There are several types of index byte. Which one is used depends on the type of data being stored.

Index Bytes

IB1 - Nullable: Generally used if the object/number is nullable.

index byte: [0_{null-flag}_0_0_0_0_0_0]
embeddable integer: none

IB2 - Embeddable Nullable: Generally used if the object/number is nullable and is small enough to be embedded.

index byte: [{embedded-flag}_{null-flag}_0_0_0_0_0_0]
followed by optional data: [0_0_0_0_0_0_0_0]
embeddable integer: 0..63

IB3 - Embeddable Nullable Signed Number: Used for signed numbers which are nullable and small enough to be embedded.

index byte: [{embedded-flag}_{null-flag}_{negative-flag}_0_0_0_0_0]
followed by optional data: [0_0_0_0_0_0_0_0]
embeddable integer: 0..31

IB4 - Embeddable Not-Null Signed Number: Used for signed numbers which cannot be null and are small enough to be embedded.

index byte: [{embedded-flag}_{negative-flag}_0_0_0_0_0_0]
followed by optional data: [0_0_0_0_0_0_0_0]
embeddable integer: 0..63

IB5 - Embeddable Nullable Unsigned Number: Used for unsigned numbers which can be null and are small enough to be embedded.

index byte: [{embedded-flag}_{null-flag}_0_0_0_0_0_0]
followed by optional data: [0_0_0_0_0_0_0_0]
embeddable integer: 0..63

IB6 - Embeddable Not-Null Unsigned Number: Used for unsigned numbers which cannot be null and are small enough to be embedded.

index byte: [{embedded-flag}_0_0_0_0_0_0_0]
followed by optional data: [0_0_0_0_0_0_0_0]
embeddable integer: 0..127
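
The embeddable ranges listed above follow from the number of flag bits each category reserves: whatever is left of the 8 bits holds the embedded integer. The following is a minimal sketch of that relationship. The class and method names are hypothetical and are not part of the Bois library.

```csharp
using System;

// Hypothetical helper: the spec defines only the bit layouts, not these names.
public static class IndexByteLayout
{
    // Number of leading flag bits reserved by each category, as listed in the spec.
    public static int FlagBitCount(string category)
    {
        switch (category)
        {
            case "IB2": return 2; // embedded, null
            case "IB3": return 3; // embedded, null, negative
            case "IB4": return 2; // embedded, negative
            case "IB5": return 2; // embedded, null
            case "IB6": return 1; // embedded
            default:
                // IB1 has no embeddable integer at all.
                throw new ArgumentException("Category cannot embed data: " + category);
        }
    }

    // The bits that remain after the flags hold the embedded integer.
    public static int MaxEmbeddable(string category)
    {
        int dataBits = 8 - FlagBitCount(category);
        return (1 << dataBits) - 1;
    }
}
```

For example, `IndexByteLayout.MaxEmbeddable("IB3")` gives 31 and `IndexByteLayout.MaxEmbeddable("IB6")` gives 127, matching the ranges in the list.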

More on Index Bytes

As you may have noticed, some of these index bytes have the same structure. I've done this to simplify writing the program. We still need one more piece of information about these bytes: the amount of data that can be embedded. Before that, let's see how to embed data in the index byte.

Embedding In Index Byte

If the number to be stored is small enough, it can be stored in the index byte by merging the number with the flags. The flags must be preserved at all times; any misuse of the embedded flag may lead to invalid data. First we have to know how much data can be stored. For example, Int32 belongs to IB4, which can embed any number in the 0..63 range.

As an example of an unsigned integer, imagine we want to store the number 50. Since the data type is uint and is not nullable, it falls into the IB6 category. Because 50 is within the IB6 embeddable range, it can be stored in the index byte. Finally, because the number is embedded, we have to set the embedded flag.
50 decimal = [00110010] byte
IB6 Embedded flag = [10000000]
Final byte = [10110010]
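
The IB6 example above can be sketched in code as follows. This is only an illustration of the bit merging, assuming the embedded flag is the most significant bit as drawn in the index byte layout. `Ib6Sketch` and `TryEmbed` are hypothetical names, not the library's API.

```csharp
using System;

// Hypothetical sketch of IB6 embedding (not-null unsigned number).
public static class Ib6Sketch
{
    const byte EmbeddedFlag = 0x80;  // [1_0_0_0_0_0_0_0]
    const uint MaxEmbeddable = 127;  // seven data bits remain after the flag

    // Returns true and the merged index byte when the value fits in the index byte.
    public static bool TryEmbed(uint value, out byte indexByte)
    {
        if (value > MaxEmbeddable)
        {
            indexByte = 0;
            return false; // caller must write the value after the index byte instead
        }
        indexByte = (byte)(EmbeddedFlag | value);
        return true;
    }
}
```

Calling `Ib6Sketch.TryEmbed(50, out var b)` succeeds and leaves `b` equal to `0xB2`, which is the `[10110010]` byte from the example.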

Now imagine that we want to save the same number 50, but this time the data type is a nullable signed integer, int?. This type falls into the IB3 category, whose largest embeddable number is 31, so we cannot embed 50 in the index byte. This is how it is stored:
50 decimal = [00110010] byte
IB3 not-null, not-embedded signed number flags = [00000000]
Final bytes = [00000000][00110010]
Here the first byte is the index byte, with none of its flags set, and the second byte is the number itself.
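
A rough sketch of how the IB3 index byte could be built for an int? value is shown below. The flag positions follow the IB3 layout given earlier. Two things are assumptions of this sketch rather than statements of the spec: a negative number is embedded as its magnitude plus the negative flag, and for a non-embedded value the index byte is left at zero (as in the example for 50) with the sign left to the data bytes. The names are hypothetical.

```csharp
using System;

// Hypothetical sketch of building an IB3 index byte (nullable signed number).
public static class Ib3Sketch
{
    const byte EmbeddedFlag = 0x80; // [1_0_0_0_0_0_0_0]
    const byte NullFlag     = 0x40; // [0_1_0_0_0_0_0_0]
    const byte NegativeFlag = 0x20; // [0_0_1_0_0_0_0_0]
    const int MaxEmbeddable = 31;   // five data bits remain after the flags

    // Returns the index byte; needsDataBytes is true when the number
    // is too large to embed and must be written after the index byte.
    public static byte BuildIndexByte(int? value, out bool needsDataBytes)
    {
        needsDataBytes = false;
        if (value == null)
            return NullFlag;

        // Widen to long so that int.MinValue does not overflow Math.Abs.
        long magnitude = Math.Abs((long)value.Value);
        if (magnitude <= MaxEmbeddable)
        {
            // Assumption: embedded negatives are stored as magnitude + negative flag.
            byte sign = value.Value < 0 ? NegativeFlag : (byte)0;
            return (byte)(EmbeddedFlag | sign | (byte)magnitude);
        }

        // Not embedded: no flags set, as in the [00000000][00110010] example.
        needsDataBytes = true;
        return 0;
    }
}
```

With this sketch, 50 yields the index byte `0x00` with `needsDataBytes` set, while 5 yields `0x85` and null yields `0x40`.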

The same process is followed when reading data. First determine the data type from the schema, then decide which index byte category it belongs to, and finally check the flags, read the data and separate it from any flags.
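
The reading steps can be sketched for the IB3 category like this. It is a minimal illustration that only decodes the index byte. It assumes, as above, that an embedded negative number is stored as its magnitude with the negative flag set. `Ib3ReaderSketch` is a hypothetical name.

```csharp
using System;

// Hypothetical sketch of reading an IB3 index byte (nullable signed number).
public static class Ib3ReaderSketch
{
    const byte EmbeddedFlag = 0x80;
    const byte NullFlag     = 0x40;
    const byte NegativeFlag = 0x20;
    const byte DataMask     = 0x1F; // the five bits not used by flags

    // Returns the embedded value, or null when the value is null or not embedded.
    // dataFollows is true when the number must be read from the following bytes.
    public static int? ReadEmbedded(byte indexByte, out bool dataFollows)
    {
        dataFollows = false;

        if ((indexByte & NullFlag) != 0)
            return null; // the stored value is null

        if ((indexByte & EmbeddedFlag) == 0)
        {
            dataFollows = true; // the number is stored after the index byte
            return null;
        }

        // Separate the data bits from the flags.
        int magnitude = indexByte & DataMask;
        return (indexByte & NegativeFlag) != 0 ? -magnitude : magnitude;
    }
}
```

The caller distinguishes the two null returns through `dataFollows`: when it is false the stored value really is null, and when it is true the number has to be read from the bytes after the index byte.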

Simple Data Types

This section describes the category and the structure of the simple data types supported by the serializer.

byte or unsigned byte

Category: None
Structure: None.

byte? or nullable unsigned byte

Category: IB5
Structure: None.

sbyte or signed byte

Category: None
Structure: None.

sbyte? or nullable signed byte

Category: IB3
Structure: None.

int or signed integer

Category: IB4
Structure: None.

int? or nullable signed integer

uint or unsigned integer

uint? or nullable unsigned integer

long or signed big integer

long? or nullable signed big integer

ulong or unsigned big integer

ulong? or nullable unsigned big integer

int16 or short or signed small integer

int16? or nullable short or nullable signed small integer

uint16 or ushort or unsigned small integer

uint16? or nullable ushort or nullable unsigned small integer

bool or boolean

Category: None
Structure: byte.

bool? or nullable boolean

Category: IB2
Structure: byte?.

char or character

Category: IB6
Structure: ushort.

char? or nullable character

Category: IB2
Structure: ushort?.

Primitive Data Types

This section describes the types that require a simple structure in addition to the category.

string

Structure: [data-length : uint?][string-data-encoded]

double or 64-bit floating-point

Structure: [data-length : uint][double-variable-data]
Data Format: The double value is converted to 8 bytes and only the bytes that hold actual data are stored.
TODO: explain.

double? or nullable 64-bit floating-point

Structure: [data-length : uint?][]

Every piece of data has an index byte that describes its type or length.

There are 3 major types of index byte.

[NullIndicator-0-0-0-0-0-0-0]

[EmbedIndicator-0-0-0-0-0-0-0]

[NullIndicator-EmbedIndicator-0-0-0-0-0-0]

Their usage is described in the following sections.

Objects: [NullIndicator-0-0-0-0-0-0-0] Indicator byte of any object. An object can either be null or not.

Non-Nullable Primitive Types:

Unsigned numbers: [EmbedIndicator-0-0-0-0-0-0-0] [optional data] 0..127 can be embedded

Signed numbers: [EmbedIndicator-SignIndicator-0-0-0-0-0-0] [optional data] 0..63 can be embedded

Nullable Primitive Types:

Unsigned numbers: [NullIndicator-EmbedIndicator-0-0-0-0-0-0] [optional data] 0..127 can be embedded

Signed numbers: [NullIndicator-EmbedIndicator-SignIndicator-0-0-0-0-0] [optional data] 0..31 can be embedded

The ZigZag algorithm is not used, because it was making the numbers bigger.

Non-Nullable Primitive Types: [EmbedIndicator-0-0-0-0-0-0-0] [optional data]

Most primitive types can embed their data in the index-byte, but there is a limit on what can be stored there. Data in the index-byte can only be an integer between 0…127, which is [EmbedIndicator-1-1-1-1-1-1-1]. If the data is larger, it is stored after the index-byte using its own format, and the data in the index-byte then indicates the size of the data that follows.

Byte array: byte[]

Byte arrays are used in many places as a standard way of storing data in the storing/reading process. Byte arrays are stored in the following formats:

Standard Byte Array: In this format the array is stored without any alteration. LossyArrayIndicator is always zero.
[NullIndicator-EmbedIndicator-{LossyArrayIndicator=0}-0-0-0-0-0] [StoredBytesCount]

Lossless Compact Byte Array: Any array provided by models is stored in compact format. Before storing, the array is scanned for 0 bytes from the end of the array until a non-zero byte is found. The compact method is used only if the number of trailing zeros exceeds the 4 bytes needed to store the original size of the array. If this condition fails, the standard method of storing arrays is used.
[NullIndicator-EmbedIndicator-LossyArrayIndicator-0-0-0-0-0] [ArrayOriginalSize] [StoredBytesCount]

Lossy Compact Byte Array: This type of array is used internally only.

Before storing, the array is scanned for 0 bytes from the end of the array until a non-zero byte is found. These zeros are not stored, and neither is the original size of the array. This method is mostly used for storing integer and floating-point numbers.
[NullIndicator-EmbedIndicator-0-0-0-0-0-0] [StoredBytesCount]
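
The trailing-zero scan shared by the two compact formats could look roughly like the sketch below. All names are hypothetical. `PadToSize` assumes the reader already knows the expected size from the data type (for example 4 bytes for an int), which is what makes the lossy format usable for numbers.

```csharp
using System;

// Hypothetical sketch of the trailing-zero handling used by compact byte arrays.
public static class CompactArraySketch
{
    // Scans from the end of the array until a non-zero byte is found.
    public static int CountTrailingZeros(byte[] data)
    {
        int count = 0;
        for (int i = data.Length - 1; i >= 0 && data[i] == 0; i--)
            count++;
        return count;
    }

    // Lossy compact form: trailing zeros are dropped and the original size is not kept.
    public static byte[] TrimTrailingZeros(byte[] data)
    {
        int keep = data.Length - CountTrailingZeros(data);
        byte[] stored = new byte[keep];
        Array.Copy(data, stored, keep);
        return stored;
    }

    // Reader side for the lossy form: restore the zeros up to the size implied by the type.
    public static byte[] PadToSize(byte[] stored, int size)
    {
        byte[] full = new byte[size]; // new arrays are zero-filled
        Array.Copy(stored, full, stored.Length);
        return full;
    }

    // Lossless compact form only pays off when more zeros are saved than
    // the 4 bytes spent on recording the original size.
    public static bool ShouldUseLosslessCompact(byte[] data)
    {
        return CountTrailingZeros(data) > 4;
    }
}
```

For instance, the 4-byte array `{ 1, 2, 0, 0 }` is stored as `{ 1, 2 }` in the lossy form and restored with `PadToSize(stored, 4)`.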

short, ushort, int, uint, long, ulong, byte, sbyte and …

If the number is between 0 and 127:
[EmbedIndicator=1 - the number]

If the number is larger than 127 or smaller than 0:
[EmbedIndicator=0 - the number of bytes holding data] [byte 0 of the zig-zag-number] [byte 1 of the zig-zag-number] … [byte n of the zig-zag-number]

Examples:
0 [10000000]
100 [11100100]
127 [11111111]
128 [00000001][10000000]
-512 [00000000]
1000000 [00000000]
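
As an illustration of the rule above, here is a small sketch for unsigned values only, so that no sign handling is involved. It assumes the data bytes are written least significant byte first ("byte 0" first) with trailing zero bytes dropped, which reproduces the 0, 100, 127 and 128 examples. The class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the integer layout for unsigned values.
public static class UnsignedIntegerSketch
{
    const byte EmbedIndicator = 0x80;

    public static byte[] Encode(uint value)
    {
        // 0..127 fits next to the embed indicator in the index byte.
        if (value <= 127)
            return new[] { (byte)(EmbedIndicator | value) };

        // Otherwise the index byte holds the number of data bytes that follow.
        var output = new List<byte> { 0 }; // slot 0 is filled in below
        while (value != 0)
        {
            // Assumption: least significant byte first.
            output.Add((byte)(value & 0xFF));
            value >>= 8;
        }
        output[0] = (byte)(output.Count - 1);
        return output.ToArray();
    }
}
```

`UnsignedIntegerSketch.Encode(128)` returns the two bytes `0x01, 0x80`, matching `[00000001][10000000]` in the examples.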

float, double and decimal: The general concept is much the same as for integer numbers. If the fractional part is zero and the number is between 0 and 127, the number can be stored in the index-byte; otherwise it is stored in the following format.

0.0 [10000000]
100.0 [11100100]
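
The embedding check for floating-point values can be sketched as below, assuming "the floating point is zero" means the number has no fractional part. The names are hypothetical, and the sketch covers only the decision and the embedded byte, not the multi-byte layout.

```csharp
using System;

// Hypothetical sketch of embedding whole floating-point values in the index byte.
public static class FloatEmbedSketch
{
    const byte EmbedIndicator = 0x80;

    public static bool TryEmbed(double value, out byte indexByte)
    {
        // NaN fails the equality test; infinities fail the range test.
        // Note that -0.0 passes both checks and would be embedded as 0.
        bool isWhole = value == Math.Floor(value);
        if (isWhole && value >= 0 && value <= 127)
        {
            indexByte = (byte)(EmbedIndicator | (byte)value);
            return true;
        }

        indexByte = 0;
        return false; // the value must be written in its byte array form
    }
}
```

`FloatEmbedSketch.TryEmbed(100.0, out var b)` succeeds with `b` equal to `0xE4`, while 50.123 is rejected and has to be written as bytes.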

The number of bytes differs for each type:

Single 4 bytes

50.123 [10000100] [] [] [] []

Double 8 bytes array

5000.123 [10001000] [] [] [] []

Decimal 16 bytes array

C# Type Observation (Remove This)

Properties required while determining the object type:
TypeBasicInfoCache ->
- ShoulBeComputed -> Only used when deciding to serialize the root instance, not when properties/fields of the object are being serialized
- UnderlyingType -> Holds the member type, array item type or nullable item type
- KnownType -> EnKnownType

So two hashtables are required:
TypeBasicInfoCache -> Contains only a few pieces of info about the type, mentioned above
ComputedTypeInfoCache -> Contains the emitted type and the delegates

C# Implementation: There are internal emitters that generate dynamic emit for complex types: CollectionGenerator, ArrayGenerator, EnumGenerator, ColorGenerator and so on. Every complex type will have its emit generated. The generated emit is basically code that uses Bois internal classes and methods to read or write the binary format.

This code is to be generated by Emit.

public static void Serialize(SampleClass obj, BinaryWriter writer)
{
    // Checking if the object itself is null or not
    if (obj == null)
    {
        writer.Write((byte)BoisInternalWriter.NullIndicator);
        return;
    }
    else
    {
        writer.Write((byte)BoisInternalWriter.NotNullIndicator);
    }

    // writing List1 as List<string>
    if (obj.List1 == null)
    {
        writer.Write((byte)BoisInternalWriter.NullIndicator);
    }
    else
    {
        // List<T> exposes Count rather than Length; the writer is passed
        // to the helpers in the same way as in the string case below
        BoisInternalWriter.WriteVarInt(writer, obj.List1.Count);
        for (int i = 0; i < obj.List1.Count; i++)
        {
            BoisInternalWriter.WriteString(writer, obj.List1[i]);
        }
    }

    // writing Name as string
    if (obj.Name == null)
    {
        BoisInternalWriter.WriteVarInt(writer, (int?)null);
    }
    else if (obj.Name.Length == 0)
    {
        BoisInternalWriter.WriteVarInt(writer, (int?)0);
    }
    else
    {
        // Encoding is the text encoding in use, assumed to be in scope here
        var strBytes = Encoding.GetBytes(obj.Name);
        // Int32
        BoisInternalWriter.WriteVarInt(writer, (int?)strBytes.Length);
        writer.Write(strBytes);
    }
}
