Skip to content

Latest commit

 

History

History
818 lines (630 loc) · 33.4 KB

BinaryEncoding.md

File metadata and controls

818 lines (630 loc) · 33.4 KB

Binary Encoding

This document describes the portable binary encoding of the WebAssembly modules.

The binary encoding is a dense representation of module information that enables small files, fast decoding, and reduced memory usage. See the rationale document for more detail.

🦄 = Planned future feature

The encoding is split into three layers:

  • Layer 0 is a simple binary encoding of the bytecode instructions and related data structures. The encoding is dense and trivial to interact with, making it suitable for scenarios like JIT, instrumentation tools, and debugging.
  • Layer 1 🦄 provides structural compression on top of layer 0, exploiting specific knowledge about the nature of the syntax tree and its nodes. The structural compression introduces more efficient encoding of values, rearranges values within the module, and prunes structurally identical tree nodes.
  • Layer 2 🦄 Layer 2 applies generic compression algorithms, like gzip and Brotli, that are already available in browsers and other tooling.

Most importantly, the layering approach allows development and standardization to occur incrementally. For example, Layer 1 and Layer 2 encoding techniques can be experimented with by application-level decompressing to the layer below. As compression techniques stabilize, they can be standardized and moved into native implementations.

See proposed layer 1 compression 🦄 for a proposal for layer 1 structural compression.

Data types

Numbers

uintN

An unsigned integer of N bits, represented in N/8 bytes in little endian order. N is either 8, 16, or 32.

varuintN

A LEB128 variable-length integer, limited to N bits (i.e., the values [0, 2^N-1]), represented by at most ceil(N/7) bytes that may contain padding 0x80 bytes.

Note: Currently, the only sizes used are varuint1, varuint7, and varuint32, where the former two are used for compatibility with potential future extensions.

varintN

A Signed LEB128 variable-length integer, limited to N bits (i.e., the values [-2^(N-1), +2^(N-1)-1]), represented by at most ceil(N/7) bytes that may contain padding 0x80 or 0xFF bytes.

Note: Currently, the only sizes used are varint7, varint32 and varint64.

Instruction Opcodes

In the MVP, the opcodes of instructions are all encoded in a single byte since there are fewer than 256 opcodes. Future features like SIMD and atomics will bring the total count above 256 and so an extension scheme will be necessary, designating one or more single-byte values as prefixes for multi-byte opcodes.

Language Types

All types are distinguished by a negative varint7 values that is the first byte of their encoding (representing a type constructor):

Opcode Type constructor
-0x01 (i.e., the byte 0x7f) i32
-0x02 (i.e., the byte 0x7e) i64
-0x03 (i.e., the byte 0x7d) f32
-0x04 (i.e., the byte 0x7c) f64
-0x10 (i.e., the byte 0x70) anyfunc
-0x20 (i.e., the byte 0x60) func
-0x40 (i.e., the byte 0x40) pseudo type for representing an empty block_type

Some of these will be followed by additional fields, see below.

Note: Gaps are reserved for future 🦄 extensions. The use of a signed scheme is so that types can coexist in a single space with (positive) indices into the type section, which may be relevant for future extensions of the type system.

value_type

A varint7 indicating a value type. One of:

  • i32
  • i64
  • f32
  • f64

as encoded above.

block_type

A varint7 indicating a block signature. These types are encoded as:

  • either a value_type indicating a signature with a single result
  • or -0x40 (i.e., the byte 0x40) indicating a signature with 0 results.

elem_type

A varint7 indicating the types of elements in a table. In the MVP, only one type is available:

Note: In the future 🦄, other element types may be allowed.

func_type

The description of a function signature. Its type constructor is followed by an additional description:

Field Type Description
form varint7 the value for the func type constructor as defined above
param_count varuint32 the number of parameters to the function
param_types value_type* the parameter types of the function
return_count varuint1 the number of results from the function
return_type value_type? the result type of the function (if return_count is 1)

Note: In the future 🦄, return_count and return_type might be generalised to allow multiple values.

Other Types

global_type

The description of a global variable.

Field Type Description
content_type value_type type of the value
mutability varuint1 0 if immutable, 1 if mutable

table_type

The description of a table.

Field Type Description
element_type elem_type the type of elements
limits resizable_limits see below

memory_type

The description of a memory.

Field Type Description
limits resizable_limits see below

external_kind

A single-byte unsigned integer indicating the kind of definition being imported or defined:

resizable_limits

A packed tuple that describes the limits of a table or memory:

Field Type Description
flags varuint1 1 if the maximum field is present, 0 otherwise
initial varuint32 initial length (in units of table elements or wasm pages)
maximum varuint32? only present if specified by flags

Note: In the future 🦄, the "flags" field may be changed to varuint32, e.g., to include a flag for sharing between threads.

init_expr

The encoding of an initializer expression is the normal encoding of the expression followed by the end opcode as a delimiter.

Note that get_global in an initializer expression can only refer to immutable imported globals and all uses of init_expr can only appear after the Imports section.

Module structure

The following documents the current prototype format. This format is based on and supersedes the v8-native prototype format, originally in a public design doc.

High-level structure

The module starts with a preamble of two fields:

Field Type Description
magic number uint32 Magic number 0x6d736100 (i.e., '\0asm')
version uint32 Version number, 0x1

The module preamble is followed by a sequence of sections. Each section is identified by a 1-byte section code that encodes either a known section or a custom section. The section length and payload data then follow. Known sections have non-zero ids, while custom sections have a 0 id followed by an identifying string as part of the payload.

Field Type Description
id varuint7 section code
payload_len varuint32 size of this section in bytes
name_len varuint32 ? length of name in bytes, present if id == 0
name bytes ? section name: valid UTF-8 byte sequence, present if id == 0
payload_data bytes content of this section, of length payload_len - sizeof(name) - sizeof(name_len)

Each known section is optional and may appear at most once. Custom sections all have the same id (0), and can be named non-uniquely (all bytes composing their names may be identical).

Custom sections are intended to be used for debugging information, future evolution, or third party extensions. For MVP, we use a specific custom section (the Name Section) for debugging information.

If a WebAssembly implementation interprets the payload of any custom section during module validation or compilation, errors in that payload must not invalidate the module.

Known sections from the list below may not appear out of order, while custom sections may be interspersed before, between, as well as after any of the elements of the list, in any order. Certain custom sections may have their own ordering and cardinality requirements. For example, the Name section is expected to appear at most once, immediately after the Data section. Violation of such requirements may at most cause an implementation to ignore the section, while not invalidating the module.

The content of each section is encoded in its payload_data.

Section Name Code Description
Type 1 Function signature declarations
Import 2 Import declarations
Function 3 Function declarations
Table 4 Indirect function table and other tables
Memory 5 Memory attributes
Global 6 Global declarations
Export 7 Exports
Start 8 Start function declaration
Element 9 Elements section
Code 10 Function bodies (code)
Data 11 Data segments

The end of the last present section must coincide with the last byte of the module. The shortest valid module is 8 bytes (magic number, version, followed by zero sections).

Type section

The type section declares all function signatures that will be used in the module.

Field Type Description
count varuint32 count of type entries to follow
entries func_type* repeated type entries as described above

Note: In the future 🦄, this section may contain other forms of type entries as well, which can be distinguished by the form field of the type encoding.

Import section

The import section declares all imports that will be used in the module.

Field Type Description
count varuint32 count of import entries to follow
entries import_entry* repeated import entries as described below

Import entry

Field Type Description
module_len varuint32 length of module_str in bytes
module_str bytes module name: valid UTF-8 byte sequence
field_len varuint32 length of field_str in bytes
field_str bytes field name: valid UTF-8 byte sequence
kind external_kind the kind of definition being imported

Followed by, if the kind is Function:

Field Type Description
type varuint32 type index of the function signature

or, if the kind is Table:

Field Type Description
type table_type type of the imported table

or, if the kind is Memory:

Field Type Description
type memory_type type of the imported memory

or, if the kind is Global:

Field Type Description
type global_type type of the imported global

Note that, in the MVP, only immutable global variables can be imported.

Function section

The function section declares the signatures of all functions in the module (their definitions appear in the code section).

Field Type Description
count varuint32 count of signature indices to follow
types varuint32* sequence of indices into the type section

Table section

The encoding of a Table section:

Field Type Description
count varuint32 indicating the number of tables defined by the module
entries table_type* repeated table_type entries as described above

In the MVP, the number of tables must be no more than 1.

Memory section

ID: memory

The encoding of a Memory section:

Field Type Description
count varuint32 indicating the number of memories defined by the module
entries memory_type* repeated memory_type entries as described above

Note that the initial/maximum fields are specified in units of WebAssembly pages.

In the MVP, the number of memories must be no more than 1.

Global section

The encoding of the Global section:

Field Type Description
count varuint32 count of global variable entries
globals global_variable* global variables, as described below

Global Entry

Each global_variable declares a single global variable of a given type, mutability and with the given initializer.

Field Type Description
type global_type type of the variables
init init_expr the initial value of the global

Note that, in the MVP, only immutable global variables can be exported.

Export section

The encoding of the Export section:

Field Type Description
count varuint32 count of export entries to follow
entries export_entry* repeated export entries as described below

Export entry

Field Type Description
field_len varuint32 length of field_str in bytes
field_str bytes field name: valid UTF-8 byte sequence
kind external_kind the kind of definition being exported
index varuint32 the index into the corresponding index space

For example, if the "kind" is Function, then "index" is a function index. Note that, in the MVP, the only valid index value for a memory or table export is 0.

Start section

The start section declares the start function.

Field Type Description
index varuint32 start function index

Element section

The encoding of the Elements section:

Field Type Description
count varuint32 count of element segments to follow
entries elem_segment* repeated element segments as described below

a elem_segment is:

Field Type Description
index varuint32 the table index (0 in the MVP)
offset init_expr an i32 initializer expression that computes the offset at which to place the elements
num_elem varuint32 number of elements to follow
elems varuint32* sequence of function indices

Code section

ID: code

The code section contains a body for every function in the module. The count of function declared in the function section and function bodies defined in this section must be the same and the ith declaration corresponds to the ith function body.

Field Type Description
count varuint32 count of function bodies to follow
bodies function_body* sequence of Function Bodies

Data section

The data section declares the initialized data that is loaded into the linear memory.

Field Type Description
count varuint32 count of data segments to follow
entries data_segment* repeated data segments as described below

a data_segment is:

Field Type Description
index varuint32 the linear memory index (0 in the MVP)
offset init_expr an i32 initializer expression that computes the offset at which to place the data
size varuint32 size of data (in bytes)
data bytes sequence of size bytes

Name section

Custom section name field: "name"

The name section is a custom section. It is therefore encoded with id 0 followed by the name string "name". Like all custom sections, this section being malformed does not cause the validation of the module to fail. It is up to the implementation how it handles a malformed or partially malformed name section. The WebAssembly implementation is also free to choose to read and process this section lazily, after the module has been instantiated, should debugging be required.

The name section may appear only once, and only after the Data section. The expectation is that, when a binary WebAssembly module is viewed in a browser or other development environment, the data in this section will be used as the names of functions and locals in the text format.

The name section contains a sequence of name subsections:

Field Type Description
name_type varuint7 code identifying type of name contained in this subsection
name_payload_len varuint32 size of this subsection in bytes
name_payload_data bytes content of this section, of length name_payload_len

Since name subsections have a given length, unknown or unwanted subsections can be skipped over by an engine. The current list of valid name_type codes are:

Name Type Code Description
Module 0 Assigns a name to the module
Function 1 Assigns names to functions
Local 2 Assigns names to locals in functions

When present, subsections must appear in this order and at most once. The end of the last subsection must coincide with the last byte of the name section to be a well-formed name section.

Module name

The module name subsection assigns a name to the module itself. It simply consists of a single string:

Field Type Description
name_len varuint32 length of name_str in bytes
name_str bytes UTF-8 encoding of the name

Name Map

In the following subsections, a name_map is encoded as:

Field Type Description
count varuint32 number of naming in names
names naming* sequence of naming sorted by index

where a naming is encoded as:

Field Type Description
index varuint32 the index which is being named
name_len varuint32 length of name_str in bytes
name_str bytes UTF-8 encoding of the name

Function names

The function names subsection is a name_map which assigns names to a subset of the function index space (both imports and module-defined).

Each function may be named at most once. Naming a function more than once results in the section being malformed.

However, names need not be unique. The same name may be given for multiple functions. This is common for C++ programs where the multiple compilation units that comprise a binary can contain local functions with the same name.

Local names

The local names subsection assigns name_maps to a subset of functions in the function index space (both imports and module-defined). The name_map for a given function assigns names to a subset of local variable indices.

Field Type Description
count varuint32 count of local_names in funcs
funcs local_names* sequence of local_names sorted by index

where a local_name is encoded as:

Field Type Description
index varuint32 the index of the function whose locals are being named
local_map name_map assignment of names to local indices

Function Bodies

Function bodies consist of a sequence of local variable declarations followed by bytecode instructions. Instructions are encoded as an opcode followed by zero or more immediates as defined by the tables below. Each function body must end with the end opcode.

Field Type Description
body_size varuint32 size of function body to follow, in bytes
local_count varuint32 number of local entries
locals local_entry* local variables
code byte* bytecode of the function
end byte 0x0b, indicating the end of the body

Local Entry

Each local entry declares a number of local variables of a given type. It is legal to have several entries with the same type.

Field Type Description
count varuint32 number of local variables of the following type
type value_type type of the variables

Control flow operators (described here)

Name Opcode Immediates Description
unreachable 0x00 trap immediately
nop 0x01 no operation
block 0x02 sig : block_type begin a sequence of expressions, yielding 0 or 1 values
loop 0x03 sig : block_type begin a block which can also form control flow loops
if 0x04 sig : block_type begin if expression
else 0x05 begin else expression of if
end 0x0b end a block, loop, or if
br 0x0c relative_depth : varuint32 break that targets an outer nested block
br_if 0x0d relative_depth : varuint32 conditional break that targets an outer nested block
br_table 0x0e see below branch table control flow construct
return 0x0f return zero or one value from this function

The sig fields of block and if operators specify function signatures which describe their use of the operand stack.

The br_table operator has an immediate operand which is encoded as follows:

Field Type Description
target_count varuint32 number of entries in the target_table
target_table varuint32* target entries that indicate an outer block or loop to which to break
default_target varuint32 an outer block or loop to which to break in the default case

The br_table operator implements an indirect branch. It accepts an optional value argument (like other branches) and an additional i32 expression as input, and branches to the block or loop at the given offset within the target_table. If the input value is out of range, br_table branches to the default target.

Note: Gaps in the opcode space, here and elsewhere, are reserved for future 🦄 extensions.

Call operators (described here)

Name Opcode Immediates Description
call 0x10 function_index : varuint32 call a function by its index
call_indirect 0x11 type_index : varuint32, reserved : varuint1 call a function indirect with an expected signature

The call_indirect operator takes a list of function arguments and as the last operand the index into the table. Its reserved immediate is for future 🦄 use and must be 0 in the MVP.

Parametric operators (described here)

Name Opcode Immediates Description
drop 0x1a ignore value
select 0x1b select one of two values based on condition

Variable access (described here)

Name Opcode Immediates Description
get_local 0x20 local_index : varuint32 read a local variable or parameter
set_local 0x21 local_index : varuint32 write a local variable or parameter
tee_local 0x22 local_index : varuint32 write a local variable or parameter and return the same value
get_global 0x23 global_index : varuint32 read a global variable
set_global 0x24 global_index : varuint32 write a global variable

Memory-related operators (described here)

Name Opcode Immediate Description
i32.load 0x28 memory_immediate load from memory
i64.load 0x29 memory_immediate load from memory
f32.load 0x2a memory_immediate load from memory
f64.load 0x2b memory_immediate load from memory
i32.load8_s 0x2c memory_immediate load from memory
i32.load8_u 0x2d memory_immediate load from memory
i32.load16_s 0x2e memory_immediate load from memory
i32.load16_u 0x2f memory_immediate load from memory
i64.load8_s 0x30 memory_immediate load from memory
i64.load8_u 0x31 memory_immediate load from memory
i64.load16_s 0x32 memory_immediate load from memory
i64.load16_u 0x33 memory_immediate load from memory
i64.load32_s 0x34 memory_immediate load from memory
i64.load32_u 0x35 memory_immediate load from memory
i32.store 0x36 memory_immediate store to memory
i64.store 0x37 memory_immediate store to memory
f32.store 0x38 memory_immediate store to memory
f64.store 0x39 memory_immediate store to memory
i32.store8 0x3a memory_immediate store to memory
i32.store16 0x3b memory_immediate store to memory
i64.store8 0x3c memory_immediate store to memory
i64.store16 0x3d memory_immediate store to memory
i64.store32 0x3e memory_immediate store to memory
current_memory 0x3f reserved : varuint1 query the size of memory
grow_memory 0x40 reserved : varuint1 grow the size of memory

The memory_immediate type is encoded as follows:

Name Type Description
flags varuint32 a bitfield which currently contains the alignment in the least significant bits, encoded as log2(alignment)
offset varuint32 the value of the offset

As implied by the log2(alignment) encoding, the alignment must be a power of 2. As an additional validation criteria, the alignment must be less or equal to natural alignment. The bits after the log(memory-access-size) least-significant bits must be set to 0. These bits are reserved for future 🦄 use (e.g., for shared memory ordering requirements).

The reserved immediate to the current_memory and grow_memory operators is for future 🦄 use and must be 0 in the MVP.

Constants (described here)

Name Opcode Immediates Description
i32.const 0x41 value : varint32 a constant value interpreted as i32
i64.const 0x42 value : varint64 a constant value interpreted as i64
f32.const 0x43 value : uint32 a constant value interpreted as f32
f64.const 0x44 value : uint64 a constant value interpreted as f64

Comparison operators (described here)

Name Opcode Immediate Description
i32.eqz 0x45
i32.eq 0x46
i32.ne 0x47
i32.lt_s 0x48
i32.lt_u 0x49
i32.gt_s 0x4a
i32.gt_u 0x4b
i32.le_s 0x4c
i32.le_u 0x4d
i32.ge_s 0x4e
i32.ge_u 0x4f
i64.eqz 0x50
i64.eq 0x51
i64.ne 0x52
i64.lt_s 0x53
i64.lt_u 0x54
i64.gt_s 0x55
i64.gt_u 0x56
i64.le_s 0x57
i64.le_u 0x58
i64.ge_s 0x59
i64.ge_u 0x5a
f32.eq 0x5b
f32.ne 0x5c
f32.lt 0x5d
f32.gt 0x5e
f32.le 0x5f
f32.ge 0x60
f64.eq 0x61
f64.ne 0x62
f64.lt 0x63
f64.gt 0x64
f64.le 0x65
f64.ge 0x66

Numeric operators (described here)

Name Opcode Immediate Description
i32.clz 0x67
i32.ctz 0x68
i32.popcnt 0x69
i32.add 0x6a
i32.sub 0x6b
i32.mul 0x6c
i32.div_s 0x6d
i32.div_u 0x6e
i32.rem_s 0x6f
i32.rem_u 0x70
i32.and 0x71
i32.or 0x72
i32.xor 0x73
i32.shl 0x74
i32.shr_s 0x75
i32.shr_u 0x76
i32.rotl 0x77
i32.rotr 0x78
i64.clz 0x79
i64.ctz 0x7a
i64.popcnt 0x7b
i64.add 0x7c
i64.sub 0x7d
i64.mul 0x7e
i64.div_s 0x7f
i64.div_u 0x80
i64.rem_s 0x81
i64.rem_u 0x82
i64.and 0x83
i64.or 0x84
i64.xor 0x85
i64.shl 0x86
i64.shr_s 0x87
i64.shr_u 0x88
i64.rotl 0x89
i64.rotr 0x8a
f32.abs 0x8b
f32.neg 0x8c
f32.ceil 0x8d
f32.floor 0x8e
f32.trunc 0x8f
f32.nearest 0x90
f32.sqrt 0x91
f32.add 0x92
f32.sub 0x93
f32.mul 0x94
f32.div 0x95
f32.min 0x96
f32.max 0x97
f32.copysign 0x98
f64.abs 0x99
f64.neg 0x9a
f64.ceil 0x9b
f64.floor 0x9c
f64.trunc 0x9d
f64.nearest 0x9e
f64.sqrt 0x9f
f64.add 0xa0
f64.sub 0xa1
f64.mul 0xa2
f64.div 0xa3
f64.min 0xa4
f64.max 0xa5
f64.copysign 0xa6

Conversions (described here)

Name Opcode Immediate Description
i32.wrap/i64 0xa7
i32.trunc_s/f32 0xa8
i32.trunc_u/f32 0xa9
i32.trunc_s/f64 0xaa
i32.trunc_u/f64 0xab
i64.extend_s/i32 0xac
i64.extend_u/i32 0xad
i64.trunc_s/f32 0xae
i64.trunc_u/f32 0xaf
i64.trunc_s/f64 0xb0
i64.trunc_u/f64 0xb1
f32.convert_s/i32 0xb2
f32.convert_u/i32 0xb3
f32.convert_s/i64 0xb4
f32.convert_u/i64 0xb5
f32.demote/f64 0xb6
f64.convert_s/i32 0xb7
f64.convert_u/i32 0xb8
f64.convert_s/i64 0xb9
f64.convert_u/i64 0xba
f64.promote/f32 0xbb

Reinterpretations (described here)

Name Opcode Immediate Description
i32.reinterpret/f32 0xbc
i64.reinterpret/f64 0xbd
f32.reinterpret/i32 0xbe
f64.reinterpret/i64 0xbf