impl Format for i128/u128 #199
Comments
I wonder if we should use LEB128 encoding for some of the larger types...
Yes... Definitely for u64 and IMO it's worth it for u32 as well.
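For readers unfamiliar with the encoding under discussion, here is a minimal sketch of unsigned LEB128 for `u64`. The function names `leb128_encode` and `leb128_decode` are hypothetical and chosen for illustration only; this is not necessarily how the crate implements it.

```rust
/// Encodes `value` as unsigned LEB128 into `buf`.
/// Returns the number of bytes written (1 to 10 for a `u64`).
/// Hypothetical helper, for illustration only.
fn leb128_encode(mut value: u64, buf: &mut [u8; 10]) -> usize {
    let mut i = 0;
    loop {
        let low = (value & 0x7f) as u8; // next 7 bits of payload
        value >>= 7;
        if value == 0 {
            buf[i] = low; // last byte: continuation bit is clear
            return i + 1;
        }
        buf[i] = low | 0x80; // continuation bit set: more bytes follow
        i += 1;
    }
}

/// Decodes an unsigned LEB128 value from the start of `bytes`.
/// Returns the value and the number of bytes consumed, or `None`
/// if no terminating byte is found within the first 10 bytes.
fn leb128_decode(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &b) in bytes.iter().enumerate().take(10) {
        value |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

Values below 128 take a single byte, while `u64::MAX` takes 10 bytes, two more than the fixed-width encoding. That trade-off is what the rest of this thread weighs.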
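Warning: this section contains math.

We can compute how many bytes, on average, a 16-bit value takes when LEB128-encoded. If we assume a uniform distribution, then each value has probability `1 / (1 << 16)`:

```rust
fn main() {
    // Probability of each 16-bit value under a uniform distribution.
    const PROB: f64 = 1. / (1 << 16) as f64;
    let avg = 128. * 1. * PROB + // `0..(1<<7)` encoded as 1 byte
        16_256. * 2. * PROB + // `(1<<7)..(1<<14)` encoded as 2 bytes
        49_152. * 3. * PROB; // `(1<<14)..(1<<16)` encoded as 3 bytes
    dbg!(avg);
}
```

This prints an average of about 2.75 bytes. If you repeat this exercise for […]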
Under this assumption LEB128 encoding doesn't pay off for […]

(This section is free from math.)

"Will all […]"

Very likely no. Each variable in a single application is going to follow a different probability distribution, e.g. a […]

"Why are we bothering with LEB128 encoding at all if the above math says that, on average, it's worse than using no encoding?"

The length of a (string) slice ([…]

"But why are we using LEB128 for […]"

This is more hand-wavy, but the hope is that the […]

"And […]"

whistles innocently

"So, should we use LEB128 for integer types other than […]"

I don't know. Doing it in a blanket fashion, either never LEB128 encoding […]

"But clearly […]"

Maybe? I would tend to agree, but I don't have any data other than my own biased personal experience.
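The uniform-distribution calculation earlier in this comment can be repeated for other integer widths with a small helper. This is a sketch only: `average_leb128_len` is a hypothetical name, and the uniform distribution is the same simplifying assumption made above, not a claim about real log data.

```rust
/// Average number of bytes LEB128 needs for an unsigned integer of
/// `bits` bits, assuming every value is equally likely.
/// Hypothetical helper, for illustration only.
fn average_leb128_len(bits: u32) -> f64 {
    assert!((1..=64).contains(&bits));
    let total: u128 = 1 << bits; // number of distinct values
    let mut weighted: u128 = 0; // sum of encoded lengths over all values
    let mut lo: u128 = 0; // first value that needs `len` bytes
    let mut len: u128 = 1;
    while lo < total {
        // Values below 2^(7 * len) fit in `len` bytes; clip to the type's range.
        let hi = (1u128 << (7 * len)).min(total);
        weighted += (hi - lo) * len;
        lo = hi;
        len += 1;
    }
    weighted as f64 / total as f64
}
```

Calling `average_leb128_len(16)` reproduces the 16-bit figure computed by hand above, and `average_leb128_len(32)` gives the corresponding average for 32-bit values under the same assumption.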
What about applying some simple "zero-removing" compression to the encoded byte stream, instead of trying to compress each integer on its own? Something like this: https://capnproto.org/encoding.html#packing

The max overhead of that is 12.5%: a u32 can take at most "4.5 bytes" instead of 5 bytes. An upside is that it would also compress sparse […]

Signed integers could be encoded by flipping all non-sign bits if the sign bit is set, to turn 0xFFs into 0x00s, so small negative numbers compress better (for example, -1 would be […]
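To make the proposal concrete, here is a rough sketch of the two ideas in this comment. It is a simplified take on the packing scheme linked above: each 8-byte word becomes one tag byte plus its non-zero bytes, and the run-length special cases that Cap'n Proto defines for all-zero and all-non-zero words are left out. The names `pack_word`, `unpack_word`, and `flip_negative` are hypothetical, and `Vec` is used only to keep the example short.

```rust
/// Packs one 8-byte word: a tag byte whose bit `i` is set when byte `i`
/// is non-zero, followed by only the non-zero bytes.
/// Simplified sketch; not the full Cap'n Proto packing algorithm.
fn pack_word(word: [u8; 8], out: &mut Vec<u8>) {
    let tag_index = out.len();
    out.push(0); // placeholder for the tag byte
    for (i, &b) in word.iter().enumerate() {
        if b != 0 {
            out[tag_index] |= 1 << i;
            out.push(b);
        }
    }
}

/// Reverses `pack_word`. Returns the word and the number of bytes consumed,
/// or `None` if `packed` is too short.
fn unpack_word(packed: &[u8]) -> Option<([u8; 8], usize)> {
    let (&tag, rest) = packed.split_first()?;
    let mut word = [0u8; 8];
    let mut used = 0;
    for i in 0..8 {
        if tag & (1 << i) != 0 {
            word[i] = *rest.get(used)?;
            used += 1;
        }
    }
    Some((word, used + 1))
}

/// Flips all non-sign bits when the sign bit is set, so that small negative
/// numbers consist mostly of zero bytes. Applying the same flip again to the
/// result restores the original bit pattern.
fn flip_negative(x: i32) -> u32 {
    let u = x as u32;
    if x < 0 { u ^ 0x7fff_ffff } else { u }
}
```

In this sketch the worst case is a word with no zero bytes, which grows from 8 to 9 bytes. That is the 12.5% overhead mentioned above. With the sign flip, `-1i32` becomes `0x8000_0000`, which has a single non-zero byte.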
Would also be nice to have i128/u128 while at it.
Originally posted by @MathiasKoch in #160 (comment)