Use LEB for br_table #738

titzer · 2016-08-01T19:35:38Z

(Rebased onto the binary_0xC branch; #735 was accidentally based on master.)

A very significant portion of the AngryBots and BananaBread demo binaries is taken up by br_table opcodes. In fact, in both of these binaries, br_table consumes more total space than br and br_if combined. This PR proposes using LEBs in the encoding of br_table. LEBs are more space-efficient than uint32 in the very common case where a br_table target entry fits in fewer than 4 bytes, and they are more consistent with the rest of the format.
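To make the size argument concrete, here is a minimal sketch of unsigned LEB128 encoding next to a fixed 4-byte entry. The helper names and the sample target list are illustrative assumptions, not taken from the PR or from any particular decoder.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def table_size_fixed(targets) -> int:
    """Bytes used when every entry is a fixed-width uint32."""
    return 4 * len(targets)


def table_size_leb(targets) -> int:
    """Bytes used when every entry is an unsigned LEB128."""
    return sum(len(encode_uleb128(t)) for t in targets)


# Hypothetical branch targets; small values like these take one byte each as LEBs.
sample_targets = [0, 1, 2, 1, 0, 3]
fixed_bytes = table_size_fixed(sample_targets)  # 24
leb_bytes = table_size_leb(sample_targets)      # 6
```

Any entry below 128 takes a single byte, and an LEB only reaches 4 bytes once the value no longer fits in 21 bits.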

A possible follow-up would be to require the length of the entire br_table as an immediate as well, so that bytecode iterators can skip ahead without decoding the entire table.
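The follow-up idea can be sketched as below. The layout is an assumption made for illustration only: a target count, then that many LEB targets, then a default target, with a hypothetical byte-length prefix in the second variant. The PR itself does not define this prefix.

```python
def decode_uleb128(buf: bytes, pos: int):
    """Decode one unsigned LEB128 starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def skip_by_decoding(buf: bytes, pos: int) -> int:
    """Skip a table with no length prefix: every entry has to be decoded."""
    count, pos = decode_uleb128(buf, pos)
    for _ in range(count + 1):  # `count` targets plus the default target
        _, pos = decode_uleb128(buf, pos)
    return pos


def skip_with_length(buf: bytes, pos: int) -> int:
    """Skip a table that starts with its byte length (hypothetical immediate)."""
    length, pos = decode_uleb128(buf, pos)
    return pos + length


# count=3, targets 0, 1, 128 (two bytes), default target 2
body = bytes([3, 0, 1, 0x80, 0x01, 2])
prefixed = bytes([len(body)]) + body
```

With variable-width entries the unprefixed skip is linear in the table size, while the prefixed form needs a single LEB decode regardless of how many targets the table holds.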

mbebenita · 2016-08-01T20:28:10Z

Angry Bots

new.ab.wasm 11526505 
old.ab.wasm 11728278
a decrease of 1.72%

new.ab.wasm.gz 3761189 
old.ab.wasm.gz 3774923
a decrease of 0.36%

BananaBread

new.bb.wasm 2192211
old.bb.wasm 2298984
a decrease of 4.64%

new.bb.wasm.gz 759592
old.bb.wasm.gz 764356
a decrease of 0.62%

I tried this out in Binaryen. @titzer, is this roughly what you're seeing?

titzer · 2016-08-01T20:29:56Z

Yes, those numbers are consistent with the measurements we made.

mbebenita · 2016-08-01T21:01:39Z

Out of curiosity, I tried a delta encoding with signed LEBs, and the results are not great at all.

delta.ab.wasm 11526127
delta.ab.wasm.gz 3761463

delta.bb.wasm 2192245
delta.bb.wasm.gz 760323
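For readers unfamiliar with the experiment, the following is a minimal sketch of what a delta encoding with signed LEBs could look like. The exact scheme tried in Binaryen is not described in this thread, so storing each entry as the difference from the previous one, starting from 0, is an assumption.

```python
def encode_uleb128(value: int) -> bytes:
    """Unsigned LEB128: 7 payload bits per byte, high bit marks continuation."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_sleb128(value: int) -> bytes:
    """Signed LEB128: stop once the remaining bits are pure sign extension."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # Python's >> is an arithmetic shift for negative ints
        sign_bit_set = bool(byte & 0x40)
        if (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)


def delta_encode(targets):
    """Assumed scheme: each entry is the difference from the previous entry."""
    deltas, prev = [], 0
    for t in targets:
        deltas.append(t - prev)
        prev = t
    return deltas


def plain_size(targets) -> int:
    return sum(len(encode_uleb128(t)) for t in targets)


def delta_size(targets) -> int:
    return sum(len(encode_sleb128(d)) for d in delta_encode(targets))
```

One plausible reason for the lack of improvement, offered as a guess rather than a finding about these binaries: small targets already fit in one unsigned byte, while a signed LEB covers only -64..63 in one byte. For a hypothetical table `[100, 0, 100, 0]`, `plain_size` is 4 bytes and `delta_size` is 8.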

ghost · 2016-08-02T00:14:25Z

I assume this concedes the use case of being efficient for a literal interpreter of the wasm binary code? I saw another PR with a similar suggestion closed recently. Is V8 going to abandon their wasm interpreter, or will there be a decoder from the wasm into an internal code before interpreting?

Might a sparse representation option also help here? I guess with compression the difference might be small but it might still save more space for the uncompressed wasm binary.

titzer · 2016-08-02T00:19:58Z

The V8 WASM interpreter uses an internal lookup table for all branches, including br_table, so it is able to compute the target of any branch in O(log n). It's possible to do this in O(1) with a more sophisticated table, but so far the lookup time has not been a bottleneck.
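As an illustration of the general idea only, and not of V8's actual data structure, a side table keyed by the bytecode offset of each branch can be searched with a binary search. The class name, the offsets, and the convention that the last entry of a br_table row is the default target are all assumptions of this sketch.

```python
import bisect


class BranchSideTable:
    """Hypothetical side table mapping a branch's offset to its precomputed targets."""

    def __init__(self, entries):
        # entries: {branch_offset: [target_offset, ...]}
        # br/br_if rows hold one target; br_table rows hold the targets
        # followed by the default target as the last element.
        items = sorted(entries.items())
        self._offsets = [offset for offset, _ in items]
        self._targets = [targets for _, targets in items]

    def lookup(self, branch_offset: int, key: int = 0) -> int:
        # Binary search over the sorted branch offsets: O(log n).
        i = bisect.bisect_left(self._offsets, branch_offset)
        if i == len(self._offsets) or self._offsets[i] != branch_offset:
            raise KeyError(branch_offset)
        targets = self._targets[i]
        # Out-of-range keys fall through to the last (default) entry.
        return targets[min(key, len(targets) - 1)]


side_table = BranchSideTable({10: [40], 25: [30, 50, 90]})
```

Because targets are resolved ahead of time, the interpreter never re-reads the variable-width table at run time. Replacing the sorted lists with a hash map, or a dense array indexed by offset, would be one way to get the O(1) behavior mentioned above.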

rossberg · 2016-08-02T11:11:49Z

lgtm


Use LEB for br_table

89dd94e

titzer mentioned this pull request Aug 1, 2016

Use LEB for br_table #735

Closed

titzer added the binary format label Aug 1, 2016

titzer merged commit ba254fe into binary_0xc Aug 2, 2016

titzer deleted the binary_0xc_br_table_leb2 branch August 2, 2016 20:44
