Should differential encoding be used? #15

qwertie · 2021-11-09T18:28:58Z

I'm not sure what the general attitude in the CG is about compression, but I remember that early on there was a strong focus on minimizing file size. With that in mind, standard compression algorithms like deflate don't like arbitrary numbers - they are unable, for instance, to predict that an increasing number sequence will continue to increase, so a workaround for this is to store numbers as differences, e.g. instead of [5, 8, 11, 21], store [5, 3, 3, 10]. This makes the numbers smaller, which tends to increase the compression ratio because it concentrates probability mass (e.g. in arithmetic/huffman encoding) toward smaller numbers.

The current format is quite natural:

the u32 byte offset of the hinted instruction from the first instruction of the function.

But since the list must be sorted, it would be straightforward to use differential encoding instead.

I haven't been very active in the Wasm community but I remember in the beginning there was an idea of having higher-level Wasm binary encodings, outside the core spec and implemented by 3rd parties, that could be used to optimize file size. Did that ever actually happen? If not, optimizing items in the core spec for compressibility starts to look more worthwhile.

yuri91 · 2021-11-18T09:21:01Z

I think that differential encoding was mentioned at some point in the design, but to keep it more similar to other sections (like dwarf data) we kept it simpler for the moment.

I agree that it would make sense, even more so because the offsets are 32 bit integers in LEB format, which makes it even more advantageous to have small numbers, even without further compression on top.

On the other hand, I expect branch hints to be very sparse, so I am not sure if there will be significant gains in practice.
I will try to estimate it once I will have data on a real codebase using this.

NOTE: The Branch hinting section follows the Code Annotation format: https://github.com/WebAssembly/tool-conventions/blob/main/CodeAnnotations.md , so this change would need to happen there as well

yuri91 mentioned this issue Nov 10, 2021

Add a first draft of the code annotations specification. WebAssembly/tool-conventions#173

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should differential encoding be used? #15

Should differential encoding be used? #15

qwertie commented Nov 9, 2021

yuri91 commented Nov 18, 2021

Should differential encoding be used? #15

Should differential encoding be used? #15

Comments

qwertie commented Nov 9, 2021

yuri91 commented Nov 18, 2021