impl Format for i128/u128 #199

Dirbaio · 2020-10-20T14:44:05Z

Would also be nice to have i128/u128 while at it.

Originally posted by @MathiasKoch in #160 (comment)

jonas-schievink · 2020-10-20T14:45:08Z

I wonder if we should use LEB128 encoding for some of the larger types...

Dirbaio · 2020-10-20T14:47:55Z

Yes... Definitely for u64 and IMO it's worth it for u32 as well.

japaric · 2020-10-20T22:24:54Z

I wonder if we should use LEB128 encoding for some of the larger types...

warning this section contains math

We can compute how many bytes on average , say, a u16 will take when it's LEB128 if we assume, that in practice, the value will be uniformely distributed across the entire u16 range (and that's a pretty big if):

the range 0..(1<<7) (128 values) is LEB128 encoded as 1 byte
the range (1<<7)..(1<<14) (16,256 values) is LEB128 encoded as 2 bytes
the range (1<<14)..(1<<16) (49,152 values) is LEB128 encoded as 3 bytes

If we assume a uniform distribution then each u16 value has a probability of 1 / (1<<16) of being chosen. We can then compute the average encoding size (in bytes) as follows:

const PROB: f64 = 1. / (1<<16) as f64;

let avg = 128. * 1. * PROB + // `0..(1<<7)` encoded as 1 byte
    16_256. * 2. * PROB +    // `(1<<7)..(1<<14)` encoded as 2 bytes
    49_152. * 3. * PROB;     // `(1<<14)..(1<<16)` encoded as 3 bytes
dbg!(avg);

Prints 2.748046875. On average u16 is encoded as 2.748 bytes (again assuming a uniform distribution of values).
If you don't use LEB128 encoding then u16 is always encoded as 2 bytes.
Under this assumption, it's better to not use LEB128 encoding.

If you repeat this exercise for u32 and u64 you get (please someone check the numbers):

u32 is LEB128 encoded as 4.937 bytes on average
u64 is LEB128 encoded as 9.496 bytes on average

Under this assumption LEB128 encoding doesn't pay off for u32 or u64.

(this section is free from math)

"Will all u16, u32 and u64 variables, in practice, follow a uniform distribution?"

Very likely no. Each variable in a single application is going to follow a different probability distribution, e.g. a u32 counter variable that's usually reset to 0 vs a u32 hash value vs a u64 monotonic counter variable.

"Why are we bothering with LEB128 encoding at all if the above math says that, an average, it's worse than using no encoding?"

The length of a (string) slice (usize) on a 32-bit microcontroller is very likely going to be LEB128 encoded as 1, 2 or 3 bytes (i.e. be less than 1<<21) because the system likely doesn't have enough memory for a [u8; 1<<21] (that's 2MB) or larger arrays / slices. Because we know that, we have high confidence that LEB128 encoding the length will pay off (compared to the 4-byte bandwidth usage of not encoding the 32-bit usize).

"But why are we using LEB128 for usize?"

This is more hand wavy but the hope is that the usize value will more often than not come from slice.len(), str.len() or be some other kind of collection length. If that's the case then LEB128 encoding should pay off (on memory constrained systems).

"And ... isize?"

whistles innocently

"So, should we use LEB128 for other integer types other than usize?"

I don't know. Doing it in a blanket fashion, either never LEB128 encoding u32s or always LEB128 encoding u32s, is going to be good in some cases and bad in others. LEB128 encoding a u32 only saves bandwidth if the value is less than 1 << 21 (that's 2 million values); if the value is equal or greater than 1 << 28 (that's 4 billion (9 zeros) values) then LEB128 encoding actually wastes bandwidth. Ultimately, only the programmer knows on which range a particular variable in their application is going to fall and the answer is going to vary from variable to variable ...

"But clearly u32 counters are more commonly used / printed / logged than u32 hashes!"

Maybe? I would tend to agree but I don't have any data other than my own biased personal experience.

Dirbaio · 2020-10-21T11:35:11Z

What about applying some simple "zero-removing" compression to the encoded byte stream, instead of trying to compress each integer on its own?

Something like this: https://capnproto.org/encoding.html#packing

The max overhead of that is 12.5%. A u32 can take max "4.5 bytes" instead of 5 bytes. An upside is it'd also compress sparse [u8]s and such. A downside is it'd bloat things that typically don't have zero bytes, such as strings. Printing strings shouldn't be common in defmt though.

signed integers could be encoded by flipping all non-sign bits if the sign bit is set, to turn 0xFF's into 0x00s, so small negative numbers compress better. (for example, -1 would be 0x8000000, -2 would be 0x80000001, i32::MIN would be 0xFFFFFFFF)

jonas-schievink added the type: enhancement Enhancement or feature request label Oct 20, 2020

jonas-schievink added difficulty: easy Pretty easy to solve priority: low Low priority for the Knurling team status: needs PR Issue just needs a Pull Request implementing the changes labels Nov 10, 2020

Dirbaio mentioned this issue Nov 13, 2020

Performance - raw cycle count of an info! call #249

Closed

BriocheBerlin assigned BriocheBerlin and justahero Nov 17, 2020

jonas-schievink added this to the v0.1.1 milestone Nov 17, 2020

BriocheBerlin mentioned this issue Nov 27, 2020

Implement i128 and u128. #284

Merged

japaric modified the milestones: v0.1.2, v0.2.0 Nov 27, 2020

japaric added the breaking change fix / feature / improvement involves a breaking change and needs to wait until next minor version label Nov 27, 2020

BriocheBerlin closed this as completed in #284 Dec 3, 2020

Dirbaio mentioned this issue Apr 26, 2021

Code size #456

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

impl Format for i128/u128 #199

impl Format for i128/u128 #199

Dirbaio commented Oct 20, 2020

jonas-schievink commented Oct 20, 2020

Dirbaio commented Oct 20, 2020

japaric commented Oct 20, 2020

Dirbaio commented Oct 21, 2020

impl Format for i128/u128 #199

impl Format for i128/u128 #199

Comments

Dirbaio commented Oct 20, 2020

jonas-schievink commented Oct 20, 2020

Dirbaio commented Oct 20, 2020

japaric commented Oct 20, 2020

Dirbaio commented Oct 21, 2020