You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm not sure what the general attitude in the CG is about compression, but I remember that early on there was a strong focus on minimizing file size. With that in mind, standard compression algorithms like deflate don't like arbitrary numbers - they are unable, for instance, to predict that an increasing number sequence will continue to increase, so a workaround for this is to store numbers as differences, e.g. instead of [5, 8, 11, 21], store [5, 3, 3, 10]. This makes the numbers smaller, which tends to increase the compression ratio because it concentrates probability mass (e.g. in arithmetic/huffman encoding) toward smaller numbers.
The current format is quite natural:
the u32 byte offset of the hinted instruction from the first instruction of the function.
But since the list must be sorted, it would be straightforward to use differential encoding instead.
I haven't been very active in the Wasm community but I remember in the beginning there was an idea of having higher-level Wasm binary encodings, outside the core spec and implemented by 3rd parties, that could be used to optimize file size. Did that ever actually happen? If not, optimizing items in the core spec for compressibility starts to look more worthwhile.
The text was updated successfully, but these errors were encountered:
I think that differential encoding was mentioned at some point in the design, but to keep it more similar to other sections (like dwarf data) we kept it simpler for the moment.
I agree that it would make sense, even more so because the offsets are 32 bit integers in LEB format, which makes it even more advantageous to have small numbers, even without further compression on top.
On the other hand, I expect branch hints to be very sparse, so I am not sure if there will be significant gains in practice.
I will try to estimate it once I will have data on a real codebase using this.
I'm not sure what the general attitude in the CG is about compression, but I remember that early on there was a strong focus on minimizing file size. With that in mind, standard compression algorithms like deflate don't like arbitrary numbers - they are unable, for instance, to predict that an increasing number sequence will continue to increase, so a workaround for this is to store numbers as differences, e.g. instead of [5, 8, 11, 21], store [5, 3, 3, 10]. This makes the numbers smaller, which tends to increase the compression ratio because it concentrates probability mass (e.g. in arithmetic/huffman encoding) toward smaller numbers.
The current format is quite natural:
But since the list must be sorted, it would be straightforward to use differential encoding instead.
I haven't been very active in the Wasm community but I remember in the beginning there was an idea of having higher-level Wasm binary encodings, outside the core spec and implemented by 3rd parties, that could be used to optimize file size. Did that ever actually happen? If not, optimizing items in the core spec for compressibility starts to look more worthwhile.
The text was updated successfully, but these errors were encountered: