char type
I think it'd be good to have an explicitly separated char type (totally didn't also overlook it aha)
meadowsys committed Aug 26, 2024
1 parent 9c0adc5 commit f66fa76
Showing 1 changed file with 11 additions and 2 deletions.
13 changes: 11 additions & 2 deletions src/serialiser_binary_2/README.md
@@ -33,7 +33,8 @@ All values that are sensitive to endianness, will use the little endian byte ord
- `0xa0` to `0xaf` - array, length 1 to 16
- `0xb0` to `0xb7` - record, length 1 to 8
- `0xb8` to `0xbf` - map, length 1 to 8
- `0xc0` to `0xfe` - reference to interned value, ref 0 to 62
- `0xc0` - char
- `0xc1` to `0xfe` - reference to interned value, ref 0 to 61
- `0xff` - reference to interned value
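A minimal sketch of how a decoder might classify the marker bytes listed above, following the updated layout in which `0xc0` is reserved for char and inline references start at `0xc1`. The `Marker` enum and `classify_marker` function are hypothetical names used only for illustration, and markers below `0xa0` are not shown in this hunk, so they are simply reported as `None` here.

```rust
/// What a marker byte in the `0xa0..=0xff` range announces.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq, Eq)]
enum Marker {
    Array { len: u8 },       // length 1 to 16
    Record { len: u8 },      // length 1 to 8
    Map { len: u8 },         // length 1 to 8
    Char,                    // followed by a 3-byte codepoint
    InlineRef { index: u8 }, // ref 0 to 61, stored in the marker itself
    ExtendedRef,             // ref value follows the marker
}

/// Classify a marker byte. Bytes below `0xa0` are outside the part of the
/// table shown here, so this sketch returns `None` for them.
fn classify_marker(byte: u8) -> Option<Marker> {
    match byte {
        0xa0..=0xaf => Some(Marker::Array { len: byte - 0xa0 + 1 }),
        0xb0..=0xb7 => Some(Marker::Record { len: byte - 0xb0 + 1 }),
        0xb8..=0xbf => Some(Marker::Map { len: byte - 0xb8 + 1 }),
        0xc0 => Some(Marker::Char),
        0xc1..=0xfe => Some(Marker::InlineRef { index: byte - 0xc1 }),
        0xff => Some(Marker::ExtendedRef),
        _ => None,
    }
}
```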

## `none`/`null`/`nil`/`None`/etc
@@ -84,6 +85,14 @@ If the string is either 0 bytes long (ie. empty string), or is length 33 bytes o

After writing the marker byte(s), write the string verbatim (ie. during deserialisation you should be able to zero-copy deserialise it just by doing something like `std::str::from_utf8(&input[pos..pos + len])`, where `len` is the length of the string in bytes, and `pos` is the current position in deserialisation).
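As a rough illustration of the zero-copy read described above, the sketch below borrows the string body straight out of the input buffer. It assumes the marker byte(s) have already been parsed, so that `pos` and `len` are known; the function name `read_str_body` is made up for this example.

```rust
/// Borrow a string body of `len` bytes starting at `pos`, without copying.
/// Returns `None` if the range is out of bounds or the bytes are not valid UTF-8.
fn read_str_body(input: &[u8], pos: usize, len: usize) -> Option<&str> {
    // `checked_add` guards against overflow when `len` comes from untrusted input.
    let end = pos.checked_add(len)?;
    // `get` returns `None` instead of panicking on an out-of-range slice.
    let bytes = input.get(pos..end)?;
    // Validates UTF-8 and reinterprets the same bytes as `&str` (no allocation).
    std::str::from_utf8(bytes).ok()
}
```

The returned `&str` shares the lifetime of `input`, which is what makes the read zero-copy.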

## chars

A char is an integer value representing a Unicode codepoint.

A valid char is any codepoint between `0` and `0x10ffff`, excluding the surrogate range (`0xd800` to `0xdfff`), ie. a Unicode scalar value.

Since the range of values is small enough to fit in a u24, we do just that. First encode the char marker (`0xc0`), then the codepoint value in the following 3 bytes, for a total of 4 bytes used.
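A minimal sketch of the char encoding described above, assuming the 3 codepoint bytes are written in little endian order like the other endianness-sensitive values in this format. The names `CHAR_MARKER`, `encode_char`, and `decode_char` are hypothetical and not taken from the actual implementation.

```rust
const CHAR_MARKER: u8 = 0xc0;

/// Write the char marker followed by the codepoint as a little endian u24.
fn encode_char(c: char, out: &mut Vec<u8>) {
    // A `char` is at most 0x10ffff, so the top byte of the u32 is always 0.
    let [b0, b1, b2, _] = (c as u32).to_le_bytes();
    out.push(CHAR_MARKER);
    out.extend_from_slice(&[b0, b1, b2]);
}

/// Read a char from the start of `input`. Returns `None` if the marker is
/// wrong, the input is too short, or the value is not a valid char.
fn decode_char(input: &[u8]) -> Option<char> {
    match input {
        [CHAR_MARKER, b0, b1, b2, ..] => {
            // Widen the u24 back to a u32; `char::from_u32` rejects surrogates
            // and anything above 0x10ffff.
            char::from_u32(u32::from_le_bytes([*b0, *b1, *b2, 0]))
        }
        _ => None,
    }
}
```

Under these assumptions, `'A'` (`0x41`) would be written as the 4 bytes `c0 41 00 00`.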

## arrays

Arrays are like a list of serialised values. There are specialised array types available, but this one is the most "generic" array type, able to encode lists of anything that can be encoded at all.
@@ -148,7 +157,7 @@ Each interned entry written should have the same index as its reference value. T

References should be encoded in place of another value.

If the reference value is between 0 and 63, encode it using a marker in `0xc0` to `0xfe` (ie. ref 0 is `0xc0`, and ref 62 is `0xfe`).
If the reference value is between 0 and 61, encode it using a marker in `0xc1` to `0xfe` (ie. ref 0 is `0xc1`, and ref 61 is `0xfe`).

If the reference value is 62 or greater, first write the ref marker byte `0xff`, followed by the reference value as an unsigned variable length integer.
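The reference encoding could be sketched roughly as follows, taking the marker range `0xc1` to `0xfe` at face value (62 markers, so refs 0 through 61 fit inline and anything larger uses `0xff`). The variable length integer format is defined elsewhere in the README and is not reproduced here, so the sketch takes it as a caller-supplied `write_varint` function; both that parameter and the name `encode_ref` are assumptions for illustration.

```rust
/// Encode a reference to an interned value.
/// `write_varint` stands in for the format's unsigned variable length
/// integer encoder, which is not shown in this sketch.
fn encode_ref(index: u64, out: &mut Vec<u8>, write_varint: impl Fn(u64, &mut Vec<u8>)) {
    if index <= 61 {
        // Inline form: ref 0 is 0xc1, ref 61 is 0xfe.
        out.push(0xc1 + index as u8);
    } else {
        // Extended form: marker byte, then the ref value as a varint.
        out.push(0xff);
        write_varint(index, out);
    }
}
```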
