diff --git a/proposals/simd/Overview.md b/proposals/simd/Overview.md index 1333ed77b..2b9233bd0 100644 --- a/proposals/simd/Overview.md +++ b/proposals/simd/Overview.md @@ -1 +1,9 @@ -TODO +# SIMD support for WebAssembly + +This proposal describes how 128-bit packed SIMD types and operations can be +added to WebAssembly. It is based on [previous work on SIMD.js in the Ecma TC39 +ECMAScript committee](https://github.com/tc39/ecmascript_simd) and the +[portable SIMD specification](https://github.com/stoklund/portable-simd) that +resulted. + +The [proposed specification](SIMD.md) has the details. diff --git a/proposals/simd/SIMD.md b/proposals/simd/SIMD.md new file mode 100644 index 000000000..d0b4d2a17 --- /dev/null +++ b/proposals/simd/SIMD.md @@ -0,0 +1,839 @@ +# WebAssembly 128-bit packed SIMD Extension + +This specification describes a 128-bit packed *Single Instruction Multiple +Data* (SIMD) extension to WebAssembly that can be implemented efficiently on +current popular instruction set architectures. + +# Types + +WebAssembly is extended with five new value types and a number of new kinds of +immediate operands used by the SIMD instructions. + +## SIMD value types + +The `v128` type has a concrete mapping to a 128-bit representation. The boolean +types do not have a bit-pattern representation. + +* `v128`: A 128-bit SIMD vector. Bits are numbered 0–127. +* `b8x16`: A vector of 16 `boolean` lanes numbered 0–15. +* `b16x8`: A vector of 8 `boolean` lanes numbered 0–7. +* `b32x4`: A vector of 4 `boolean` lanes numbered 0–3. +* `b64x2`: A vector of 2 `boolean` lanes numbered 0–1. + +The `v128` type corresponds to a vector register in a typical SIMD ISA. The +interpretation of the 128 bits in the vector register is provided by the +individual instructions. When a `v128` value is represented as 16 bytes, bits +0-7 go in the first byte with bit 0 as the LSB, bits 8-15 go in the second +byte, etc. + +The abstract boolean vector types can be mapped to vector registers or predicate +registers by an implementation. They have a property `S.Lanes` which is used by +the pseudo-code below: + +| S | S.Lanes | +|---------|--------:| +| `b8x16` | 16 | +| `b16x8` | 8 | +| `b32x4` | 4 | +| `b64x2` | 2 | + +## Immediate operands + +Some of the new SIMD instructions defined here have immediate operands that are +encoded as individual bytes in the binary encoding. Many have a limited valid +range, and it is a validation error if the immediate operands are out of range. + +* `ImmBits2`: A byte with values in the range 0-3 used to initialize a `b64x2`. +* `ImmBits4`: A byte with values in the range 0-15 used to initialize a `b32x4`. +* `ImmByte`: A single unconstrained byte (0-255). +* `LaneIdx2`: A byte with values in the range 0–1 identifying a lane. +* `LaneIdx4`: A byte with values in the range 0–3 identifying a lane. +* `LaneIdx8`: A byte with values in the range 0–7 identifying a lane. +* `LaneIdx16`: A byte with values in the range 0–15 identifying a lane. +* `LaneIdx32`: A byte with values in the range 0–31 identifying a lane. + +## Interpreting SIMD value types + +The single `v128` SIMD type can represent packed data in multiple ways. +Instructions specify how the bits should be interpreted through a hierarchy of +*interpretations*. + +The boolean vector types only have the one interpretation given by their type. + +### Lane division interpretation + +The first level of interpretations of the `v128` type imposes a lane structure on +the bits: + +* `v8x16 : v128`: 8-bit lanes numbered 0–15. 
Lane n corresponds to bits 8n – 8n+7.
+* `v16x8 : v128`: 16-bit lanes numbered 0–7. Lane n corresponds to bits 16n – 16n+15.
+* `v32x4 : v128`: 32-bit lanes numbered 0–3. Lane n corresponds to bits 32n – 32n+31.
+* `v64x2 : v128`: 64-bit lanes numbered 0–1. Lane n corresponds to bits 64n – 64n+63.
+
+The lane-dividing interpretations don't say anything about the semantics of the
+bits in each lane. The interpretations have *properties* used by the semantic
+specification pseudo-code below:
+
+| S       | S.LaneBits | S.Lanes | S.BoolType |
+|---------|-----------:|--------:|:----------:|
+| `v8x16` | 8          | 16      | `b8x16`    |
+| `v16x8` | 16         | 8       | `b16x8`    |
+| `v32x4` | 32         | 4       | `b32x4`    |
+| `v64x2` | 64         | 2       | `b64x2`    |
+
+Since WebAssembly is little-endian, the least significant bit in each lane is
+the bit with the lowest number.
+
+### Modulo integer interpretations
+
+The bits in a lane can be interpreted as integers with modulo arithmetic
+semantics. Many arithmetic operations can be defined on these types without
+imposing a signed or unsigned integer interpretation.
+
+* `i8x16 : v8x16`: Each lane is an `i8`.
+* `i16x8 : v16x8`: Each lane is an `i16`.
+* `i32x4 : v32x4`: Each lane is an `i32`.
+* `i64x2 : v64x2`: Each lane is an `i64`.
+
+Additional properties:
+
+| S       | S.Smin | S.Smax | S.Umax |
+|---------|-------:|-------:|-------:|
+| `i8x16` | -2^7   | 2^7-1  | 2^8-1  |
+| `i16x8` | -2^15  | 2^15-1 | 2^16-1 |
+| `i32x4` | -2^31  | 2^31-1 | 2^32-1 |
+| `i64x2` | -2^63  | 2^63-1 | 2^64-1 |
+
+Some operations interpret each lane specifically as a signed or unsigned
+integer. These operations have `_s` and `_u` suffixes, as is the convention in
+WebAssembly.
+
+### Floating-point interpretations
+
+Each lane is interpreted as an IEEE floating-point number.
+
+* `f32x4 : v32x4`: Each lane is an `f32`.
+* `f64x2 : v64x2`: Each lane is an `f64`.
+
+The floating-point operations in this specification aim to be compatible with
+WebAssembly's scalar floating-point operations. In particular, the rules about
+NaN propagation and default NaN values are the same, and all operations use the
+default *roundTiesToEven* rounding mode.
+
+An implementation is allowed to flush subnormals in arithmetic floating-point
+operations. This means that any subnormal operand is treated as 0, and any
+subnormal result is rounded to 0. Note that this differs from WebAssembly
+scalar floating-point semantics, which require correct subnormal handling.
+
+# Operations
+
+The SIMD operations described in this section are generally named
+`S.Op`, where `S` is either a SIMD type or one of the interpretations
+of a SIMD type.
+
+Many operations are simply the lane-wise application of a scalar operation:
+
+```python
+def S.lanewise_unary(func, a):
+    result = S.New()
+    for i in range(S.Lanes):
+        result[i] = func(a[i])
+    return result
+
+def S.lanewise_binary(func, a, b):
+    result = S.New()
+    for i in range(S.Lanes):
+        result[i] = func(a[i], b[i])
+    return result
+```
+
+Comparison operators produce a boolean vector:
+
+```python
+def S.lanewise_comparison(func, a, b):
+    result = S.BoolType.New()
+    for i in range(S.Lanes):
+        result[i] = func(a[i], b[i])
+    return result
+```
+
+## Constructing SIMD values
+
+### Constants
+* `v128.const(imm: ImmByte[16]) -> v128`
+* `b8x16.const(imm: ImmByte[2]) -> b8x16`
+* `b16x8.const(imm: ImmByte) -> b16x8`
+* `b32x4.const(imm: ImmBits4) -> b32x4`
+* `b64x2.const(imm: ImmBits2) -> b64x2`
+
+Materialize a constant SIMD value from the immediate operands.
The `v128.const` +instruction is encoded with 16 immediate bytes which provide the bits of the +vector directly. The boolean constants are encoded with one bit per lane such +that lane 0 is the LSB of the first immediate byte. + +### Build vector from individual lanes +* `b8x16.build(x: i32[16]) -> b8x16` +* `b16x8.build(x: i32[8]) -> b16x8` +* `b32x4.build(x: i32[4]) -> b32x4` +* `b64x2.build(x: i32[2]) -> b64x2` +* `i8x16.build(x: i32[16]) -> v128` +* `i16x8.build(x: i32[8]) -> v128` +* `i32x4.build(x: i32[4]) -> v128` +* `i64x2.build(x: i64[2]) -> v128` +* `f32x4.build(x: f32[4]) -> v128` +* `f64x2.build(x: f64[2]) -> v128` + +Construct a vector from an array of individual lane values. + +```python +def S.build(x): + result = S.New() + for i in range(S.Lanes): + result[i] = x[i] + return result +``` + +The `i32[16]` array notation is a shorthand for a sequence of identically typed +arguments. So `b8x16.build` takes 16 `i32` arguments where a non-zero value is +interpreted as true. + +### Create vector with identical lanes +* `b8x16.splat(x: i32) -> b8x16` +* `b16x8.splat(x: i32) -> b16x8` +* `b32x4.splat(x: i32) -> b32x4` +* `b64x2.splat(x: i32) -> b64x2` +* `i8x16.splat(x: i32) -> v128` +* `i16x8.splat(x: i32) -> v128` +* `i32x4.splat(x: i32) -> v128` +* `i64x2.splat(x: i64) -> v128` +* `f32x4.splat(x: f32) -> v128` +* `f64x2.splat(x: f64) -> v128` + +Construct a vector with `x` replicated to all lanes: + +```python +def S.splat(x): + result = S.New() + for i in range(S.Lanes): + result[i] = x + return result +``` + +The boolean vector splats will create a vector with all false lanes if `x` is +zero, all true lanes otherwise. The `i8x16.splat` and `i16x8.splat` +instructions ignore the high bits of `x`. + +## Accessing lanes + +### Extract lane as a scalar +* `b8x16.extractLane(a: b8x16, i: LaneIdx16) -> i32` +* `b16x8.extractLane(a: b16x8, i: LaneIdx8) -> i32` +* `b32x4.extractLane(a: b32x4, i: LaneIdx4) -> i32` +* `b64x2.extractLane(a: b64x2, i: LaneIdx2) -> i32` +* `i8x16.extractLane_s(a: v128, i: LaneIdx16) -> i32` +* `i8x16.extractLane_u(a: v128, i: LaneIdx16) -> i32` +* `i16x8.extractLane_s(a: v128, i: LaneIdx8) -> i32` +* `i16x8.extractLane_u(a: v128, i: LaneIdx8) -> i32` +* `i32x4.extractLane(a: v128, i: LaneIdx4) -> i32` +* `i64x2.extractLane(a: v128, i: LaneIdx2) -> i64` +* `f32x4.extractLane(a: v128, i: LaneIdx4) -> f32` +* `f64x2.extractLane(a: v128, i: LaneIdx2) -> f64` + +Extract the value of lane `i` in `a`. + +```python +def S.extractLane(a, i): + return a[i] +``` + +The `_s` and `_u` variants will sign-extend or zero-extend the lane value to +`i32` respectively. Boolean lanes are returned as an `i32` with the value 0 or +1. + +### Replace lane value +* `b8x16.replaceLane(a: b8x16, i: LaneIdx16, x: i32) -> b8x16` +* `b16x8.replaceLane(a: b16x8, i: LaneIdx8, x: i32) -> b16x8` +* `b32x4.replaceLane(a: b32x4, i: LaneIdx4, x: i32) -> b32x4` +* `b64x2.replaceLane(a: b64x2, i: LaneIdx2, x: i32) -> b64x2` +* `i8x16.replaceLane(a: v128, i: LaneIdx16, x: i32) -> v128` +* `i16x8.replaceLane(a: v128, i: LaneIdx8, x: i32) -> v128` +* `i32x4.replaceLane(a: v128, i: LaneIdx4, x: i32) -> v128` +* `i64x2.replaceLane(a: v128, i: LaneIdx2, x: i64) -> v128` +* `f32x4.replaceLane(a: v128, i: LaneIdx4, x: f32) -> v128` +* `f64x2.replaceLane(a: v128, i: LaneIdx2, x: f64) -> v128` + +Return a new vector with lanes identical to `a`, except for lane `i` which has +the value `x`. 
+
+```python
+def S.replaceLane(a, i, x):
+    result = S.New()
+    for j in range(S.Lanes):
+        result[j] = a[j]
+    result[i] = x
+    return result
+```
+
+The input lane value, `x`, is interpreted the same way as for the splat
+instructions. For the boolean vectors, non-zero means true; for the `i8` and
+`i16` lanes, the high bits of `x` are ignored.
+
+### Lane-wise select
+* `v8x16.select(s: b8x16, t: v128, f: v128) -> v128`
+* `v16x8.select(s: b16x8, t: v128, f: v128) -> v128`
+* `v32x4.select(s: b32x4, t: v128, f: v128) -> v128`
+* `v64x2.select(s: b64x2, t: v128, f: v128) -> v128`
+
+Use a boolean vector to select lanes from two numerical vectors.
+
+```python
+def S.select(s, t, f):
+    result = S.New()
+    for i in range(S.Lanes):
+        if s[i]:
+            result[i] = t[i]
+        else:
+            result[i] = f[i]
+    return result
+```
+
+Note that the normal WebAssembly `select` instruction also works with vector
+types. It selects between two whole vectors controlled by a scalar value,
+rather than selecting lanes controlled by a boolean vector.
+
+### Swizzle lanes
+* `v8x16.swizzle(a: v128, s: LaneIdx16[16]) -> v128`
+* `v16x8.swizzle(a: v128, s: LaneIdx8[8]) -> v128`
+* `v32x4.swizzle(a: v128, s: LaneIdx4[4]) -> v128`
+* `v64x2.swizzle(a: v128, s: LaneIdx2[2]) -> v128`
+
+Create a vector with lanes rearranged:
+
+```python
+def S.swizzle(a, s):
+    result = S.New()
+    for i in range(S.Lanes):
+        result[i] = a[s[i]]
+    return result
+```
+
+### Shuffle lanes
+* `v8x16.shuffle(a: v128, b: v128, s: LaneIdx32[16]) -> v128`
+* `v16x8.shuffle(a: v128, b: v128, s: LaneIdx16[8]) -> v128`
+* `v32x4.shuffle(a: v128, b: v128, s: LaneIdx8[4]) -> v128`
+* `v64x2.shuffle(a: v128, b: v128, s: LaneIdx4[2]) -> v128`
+
+Create a vector with lanes selected from the lanes of two input vectors:
+
+```python
+def S.shuffle(a, b, s):
+    result = S.New()
+    for i in range(S.Lanes):
+        if s[i] < S.Lanes:
+            result[i] = a[s[i]]
+        else:
+            result[i] = b[s[i] - S.Lanes]
+    return result
+```
+
+## Integer arithmetic
+
+Wrapping integer arithmetic discards the high bits of the result.
+
+```python
+def S.Reduce(x):
+    bitmask = (1 << S.LaneBits) - 1
+    return x & bitmask
+```
+
+There is no integer division operation provided here. This operation is not
+commonly part of 128-bit SIMD ISAs.
+
+### Integer addition
+* `i8x16.add(a: v128, b: v128) -> v128`
+* `i16x8.add(a: v128, b: v128) -> v128`
+* `i32x4.add(a: v128, b: v128) -> v128`
+* `i64x2.add(a: v128, b: v128) -> v128`
+
+Lane-wise wrapping integer addition:
+
+```python
+def S.add(a, b):
+    def add(x, y):
+        return S.Reduce(x + y)
+    return S.lanewise_binary(add, a, b)
+```
+
+### Integer subtraction
+* `i8x16.sub(a: v128, b: v128) -> v128`
+* `i16x8.sub(a: v128, b: v128) -> v128`
+* `i32x4.sub(a: v128, b: v128) -> v128`
+* `i64x2.sub(a: v128, b: v128) -> v128`
+
+Lane-wise wrapping integer subtraction:
+
+```python
+def S.sub(a, b):
+    def sub(x, y):
+        return S.Reduce(x - y)
+    return S.lanewise_binary(sub, a, b)
+```
+
+### Integer multiplication
+* `i8x16.mul(a: v128, b: v128) -> v128`
+* `i16x8.mul(a: v128, b: v128) -> v128`
+* `i32x4.mul(a: v128, b: v128) -> v128`
+* `i64x2.mul(a: v128, b: v128) -> v128`
+
+Lane-wise wrapping integer multiplication:
+
+```python
+def S.mul(a, b):
+    def mul(x, y):
+        return S.Reduce(x * y)
+    return S.lanewise_binary(mul, a, b)
+```
+
+### Integer negation
+* `i8x16.neg(a: v128) -> v128`
+* `i16x8.neg(a: v128) -> v128`
+* `i32x4.neg(a: v128) -> v128`
+* `i64x2.neg(a: v128) -> v128`
+
+Lane-wise wrapping integer negation. In wrapping arithmetic, `y = -x` is the
+unique value such that `x + y == 0`.
+
+```python
+def S.neg(a):
+    def neg(x):
+        return S.Reduce(-x)
+    return S.lanewise_unary(neg, a)
+```
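+
+As an illustration of the wrapping semantics used throughout this section, the
+following self-contained Python sketch models lane-wise wrapping addition for
+8-bit lanes. The helper names `reduce8` and `i8x16_add` are illustrative only
+and are not part of the proposed instruction set:
+
+```python
+def reduce8(x):
+    # Keep only the low 8 bits, as S.Reduce does when S.LaneBits is 8.
+    return x & 0xFF
+
+def i8x16_add(a, b):
+    # a and b are lists of 16 lane values, each in the range 0..255.
+    return [reduce8(x + y) for x, y in zip(a, b)]
+
+# 200 + 100 = 300, which wraps to 300 mod 256 = 44.
+assert i8x16_add([200] + [0] * 15, [100] + [0] * 15)[0] == 44
+```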
+
+## Saturating integer arithmetic
+
+Saturating integer arithmetic behaves differently on signed and unsigned lanes.
+It is only defined here for 8-bit and 16-bit integer lanes.
+
+```python
+def S.SignedSaturate(x):
+    if x < S.Smin:
+        return S.Smin
+    if x > S.Smax:
+        return S.Smax
+    return x
+
+def S.UnsignedSaturate(x):
+    if x > S.Umax:
+        return S.Umax
+    return x
+```
+
+### Saturating integer addition
+* `i8x16.add_saturate_s(a: v128, b: v128) -> v128`
+* `i8x16.add_saturate_u(a: v128, b: v128) -> v128`
+* `i16x8.add_saturate_s(a: v128, b: v128) -> v128`
+* `i16x8.add_saturate_u(a: v128, b: v128) -> v128`
+
+Lane-wise saturating addition:
+
+```python
+def S.add_saturate_s(a, b):
+    def addsat(x, y):
+        return S.SignedSaturate(x + y)
+    return S.lanewise_binary(addsat, S.AsSigned(a), S.AsSigned(b))
+
+def S.add_saturate_u(a, b):
+    def addsat(x, y):
+        return S.UnsignedSaturate(x + y)
+    return S.lanewise_binary(addsat, S.AsUnsigned(a), S.AsUnsigned(b))
+```
+
+### Saturating integer subtraction
+* `i8x16.sub_saturate_s(a: v128, b: v128) -> v128`
+* `i8x16.sub_saturate_u(a: v128, b: v128) -> v128`
+* `i16x8.sub_saturate_s(a: v128, b: v128) -> v128`
+* `i16x8.sub_saturate_u(a: v128, b: v128) -> v128`
+
+Lane-wise saturating subtraction:
+
+```python
+def S.sub_saturate_s(a, b):
+    def subsat(x, y):
+        return S.SignedSaturate(x - y)
+    return S.lanewise_binary(subsat, S.AsSigned(a), S.AsSigned(b))
+
+def S.sub_saturate_u(a, b):
+    def subsat(x, y):
+        return S.UnsignedSaturate(x - y)
+    return S.lanewise_binary(subsat, S.AsUnsigned(a), S.AsUnsigned(b))
+```
+
+## Bit shifts
+
+### Left shift by scalar
+* `i8x16.shl(a: v128, y: i32) -> v128`
+* `i16x8.shl(a: v128, y: i32) -> v128`
+* `i32x4.shl(a: v128, y: i32) -> v128`
+* `i64x2.shl(a: v128, y: i32) -> v128`
+
+Shift the bits in each lane to the left by the same amount. Only the low bits
+of the shift amount are used:
+
+```python
+def S.shl(a, y):
+    # Number of bits to shift: 0 .. S.LaneBits - 1.
+    amount = y % S.LaneBits
+    def shift(x):
+        return S.Reduce(x << amount)
+    return S.lanewise_unary(shift, a)
+```
+
+### Right shift by scalar
+* `i8x16.shr_s(a: v128, y: i32) -> v128`
+* `i8x16.shr_u(a: v128, y: i32) -> v128`
+* `i16x8.shr_s(a: v128, y: i32) -> v128`
+* `i16x8.shr_u(a: v128, y: i32) -> v128`
+* `i32x4.shr_s(a: v128, y: i32) -> v128`
+* `i32x4.shr_u(a: v128, y: i32) -> v128`
+* `i64x2.shr_s(a: v128, y: i32) -> v128`
+* `i64x2.shr_u(a: v128, y: i32) -> v128`
+
+Shift the bits in each lane to the right by the same amount. This is an
+arithmetic right shift for the `_s` variants and a logical right shift for the
+`_u` variants.
+
+```python
+def S.shr_s(a, y):
+    # Number of bits to shift: 0 .. S.LaneBits - 1.
+    amount = y % S.LaneBits
+    def shift(x):
+        return x >> amount
+    return S.lanewise_unary(shift, S.AsSigned(a))
+
+def S.shr_u(a, y):
+    # Number of bits to shift: 0 .. S.LaneBits - 1.
+    amount = y % S.LaneBits
+    def shift(x):
+        return x >> amount
+    return S.lanewise_unary(shift, S.AsUnsigned(a))
+```
+
+## Logical operations
+
+The logical operations are defined on the boolean SIMD types. See also the
+[Bitwise operations](#bitwise-operations) below.
+ +### Logical and +* `b8x16.and(a: b8x16, b: b8x16) -> b8x16` +* `b16x8.and(a: b16x8, b: b16x8) -> b16x8` +* `b32x4.and(a: b32x4, b: b32x4) -> b32x4` +* `b64x2.and(a: b64x2, b: b64x2) -> b64x2` + +```python +def S.and(a, b): + def logical_and(x, y): + return x and y + return S.lanewise_binary(logical_and, a, b) +``` + +### Logical or +* `b8x16.or(a: b8x16, b: b8x16) -> b8x16` +* `b16x8.or(a: b16x8, b: b16x8) -> b16x8` +* `b32x4.or(a: b32x4, b: b32x4) -> b32x4` +* `b64x2.or(a: b64x2, b: b64x2) -> b64x2` + +```python +def S.or(a, b): + def logical_or(x, y): + return x or y + return S.lanewise_binary(logical_or, a, b) +``` + +### Logical xor +* `b8x16.xor(a: b8x16, b: b8x16) -> b8x16` +* `b16x8.xor(a: b16x8, b: b16x8) -> b16x8` +* `b32x4.xor(a: b32x4, b: b32x4) -> b32x4` +* `b64x2.xor(a: b64x2, b: b64x2) -> b64x2` + +```python +def S.xor(a, b): + def logical_xor(x, y): + return x xor y + return S.lanewise_binary(logical_xor, a, b) +``` + +### Logical not +* `b8x16.not(a: b8x16) -> b8x16` +* `b16x8.not(a: b16x8) -> b16x8` +* `b32x4.not(a: b32x4) -> b32x4` +* `b64x2.not(a: b64x2) -> b64x2` + +```python +def S.not(a): + def logical_not(x): + return not x + return S.lanewise_unary(logical_not, a) +``` + +## Bitwise operations + +The same logical operations defined on the boolean types are also available on +the `v128` type where they operate bitwise the same way C's `&`, `|`, `^`, and +`~` operators work on an `unsigned` type. + +* `v128.and(a: v128, b: v128) -> v128` +* `v128.or(a: v128, b: v128) -> v128` +* `v128.xor(a: v128, b: v128) -> v128` +* `v128.not(a: v128) -> v128` + +## Boolean horizontal reductions + +These operations reduce all the lanes of a boolean vector to a single scalar +boolean value. + +### Any lane true +* `b8x16.any_true(a: b8x16) -> i32` +* `b16x8.any_true(a: b16x8) -> i32` +* `b32x4.any_true(a: b32x4) -> i32` +* `b64x2.any_true(a: b64x2) -> i32` + +These functions return 1 if any lane in `a` is true, 0 otherwise. + +```python +def S.any_true(a): + for i in range(S.Lanes): + if a[i]: + return 1 + return 0 +``` + +### All lanes true +* `b8x16.all_true(a: b8x16) -> i32` +* `b16x8.all_true(a: b16x8) -> i32` +* `b32x4.all_true(a: b32x4) -> i32` +* `b64x2.all_true(a: b64x2) -> i32` + +These functions return 1 if all lanes in `a` are true, 0 otherwise. + +```python +def S.all_true(a): + for i in range(S.Lanes): + if not a[i]: + return 0 + return 1 +``` + +## Comparisons + +The comparison operations all compare two vectors lane-wise, and produce a +boolean vector with the same number of lanes as the input interpretation. + +### Equality +* `i8x16.eq(a: v128, b: v128) -> b8x16` +* `i16x8.eq(a: v128, b: v128) -> b16x8` +* `i32x4.eq(a: v128, b: v128) -> b32x4` +* `i64x2.eq(a: v128, b: v128) -> b64x2` +* `f32x4.eq(a: v128, b: v128) -> b32x4` +* `f64x2.eq(a: v128, b: v128) -> b64x2` + +Integer equality is independent of the signed/unsigned interpretation. 
Floating-point equality follows IEEE semantics, so a NaN lane compares not
+equal to anything, including itself, and +0.0 is equal to -0.0:
+
+```python
+def S.eq(a, b):
+    def eq(x, y):
+        return x == y
+    return S.lanewise_comparison(eq, a, b)
+```
+
+### Non-equality
+* `i8x16.ne(a: v128, b: v128) -> b8x16`
+* `i16x8.ne(a: v128, b: v128) -> b16x8`
+* `i32x4.ne(a: v128, b: v128) -> b32x4`
+* `i64x2.ne(a: v128, b: v128) -> b64x2`
+* `f32x4.ne(a: v128, b: v128) -> b32x4`
+* `f64x2.ne(a: v128, b: v128) -> b64x2`
+
+The `ne` operations produce the inverse of their `eq` counterparts:
+
+```python
+def S.ne(a, b):
+    def ne(x, y):
+        return x != y
+    return S.lanewise_comparison(ne, a, b)
+```
+
+### Less than
+* `i8x16.lt_s(a: v128, b: v128) -> b8x16`
+* `i8x16.lt_u(a: v128, b: v128) -> b8x16`
+* `i16x8.lt_s(a: v128, b: v128) -> b16x8`
+* `i16x8.lt_u(a: v128, b: v128) -> b16x8`
+* `i32x4.lt_s(a: v128, b: v128) -> b32x4`
+* `i32x4.lt_u(a: v128, b: v128) -> b32x4`
+* `i64x2.lt_s(a: v128, b: v128) -> b64x2`
+* `i64x2.lt_u(a: v128, b: v128) -> b64x2`
+* `f32x4.lt(a: v128, b: v128) -> b32x4`
+* `f64x2.lt(a: v128, b: v128) -> b64x2`
+
+### Less than or equal
+* `i8x16.le_s(a: v128, b: v128) -> b8x16`
+* `i8x16.le_u(a: v128, b: v128) -> b8x16`
+* `i16x8.le_s(a: v128, b: v128) -> b16x8`
+* `i16x8.le_u(a: v128, b: v128) -> b16x8`
+* `i32x4.le_s(a: v128, b: v128) -> b32x4`
+* `i32x4.le_u(a: v128, b: v128) -> b32x4`
+* `i64x2.le_s(a: v128, b: v128) -> b64x2`
+* `i64x2.le_u(a: v128, b: v128) -> b64x2`
+* `f32x4.le(a: v128, b: v128) -> b32x4`
+* `f64x2.le(a: v128, b: v128) -> b64x2`
+
+### Greater than
+* `i8x16.gt_s(a: v128, b: v128) -> b8x16`
+* `i8x16.gt_u(a: v128, b: v128) -> b8x16`
+* `i16x8.gt_s(a: v128, b: v128) -> b16x8`
+* `i16x8.gt_u(a: v128, b: v128) -> b16x8`
+* `i32x4.gt_s(a: v128, b: v128) -> b32x4`
+* `i32x4.gt_u(a: v128, b: v128) -> b32x4`
+* `i64x2.gt_s(a: v128, b: v128) -> b64x2`
+* `i64x2.gt_u(a: v128, b: v128) -> b64x2`
+* `f32x4.gt(a: v128, b: v128) -> b32x4`
+* `f64x2.gt(a: v128, b: v128) -> b64x2`
+
+### Greater than or equal
+* `i8x16.ge_s(a: v128, b: v128) -> b8x16`
+* `i8x16.ge_u(a: v128, b: v128) -> b8x16`
+* `i16x8.ge_s(a: v128, b: v128) -> b16x8`
+* `i16x8.ge_u(a: v128, b: v128) -> b16x8`
+* `i32x4.ge_s(a: v128, b: v128) -> b32x4`
+* `i32x4.ge_u(a: v128, b: v128) -> b32x4`
+* `i64x2.ge_s(a: v128, b: v128) -> b64x2`
+* `i64x2.ge_u(a: v128, b: v128) -> b64x2`
+* `f32x4.ge(a: v128, b: v128) -> b32x4`
+* `f64x2.ge(a: v128, b: v128) -> b64x2`
+
+## Load and store
+
+Load and store operations are provided for `v128` vectors, but not for the
+boolean vectors; we don't want to prescribe a bitwise representation of the
+boolean vectors.
+
+The memory operations take the same arguments and have the same semantics as
+the existing scalar WebAssembly load and store instructions. The difference is
+that the memory access size is 16 bytes, which is also the natural alignment.
+
+### Load
+
+* `v128.load(memarg) -> v128`
+
+Load a `v128` vector from the given heap address.
+
+### Store
+
+* `v128.store(memarg, data: v128)`
+
+Store a `v128` vector to the given heap address.
+
+## Floating-point sign bit operations
+
+These floating-point operations are simple manipulations of the sign bit. No
+changes are made to the exponent or trailing significand bits, even for NaN
+inputs.
+
+### Negation
+* `f32x4.neg(a: v128) -> v128`
+* `f64x2.neg(a: v128) -> v128`
+
+Apply the IEEE `negate(x)` function to each lane. This simply inverts the sign
+bit, preserving all other bits.
+ +```python +def S.neg(a): + return S.lanewise_unary(ieee.negate, a) +``` + +### Absolute value +* `f32x4.abs(a: v128) -> v128` +* `f64x2.abs(a: v128) -> v128` + +Apply the IEEE `abs(x)` function to each lane. This simply clears the sign bit, +preserving all other bits. + +```python +def S.abs(a): + return S.lanewise_unary(ieee.abs, a) +``` + +## Floating-point min and max + +These operations are not part of the IEEE 754-2008 standard. They are lane-wise +versions of the existing scalar WebAssembly operations. + +### NaN-propagating minimum +* `f32x4.min(a: v128, b: v128) -> v128` +* `f64x2.min(a: v128, b: v128) -> v128` + +Lane-wise minimum value, propagating NaNs. + +### NaN-propagating maximum +* `f32x4.max(a: v128, b: v128) -> v128` +* `f64x2.max(a: v128, b: v128) -> v128` + +Lane-wise maximum value, propagating NaNs. + +## Floating-point arithmetic + +The floating-point arithmetic operations are all lane-wise versions of the +existing scalar WebAssembly operations. + +### Addition +* `f32x4.add(a: v128, b: v128) -> v128` +* `f64x2.add(a: v128, b: v128) -> v128` + +Lane-wise IEEE `addition`. + +### Subtraction +* `f32x4.sub(a: v128, b: v128) -> v128` +* `f64x2.sub(a: v128, b: v128) -> v128` + +Lane-wise IEEE `subtraction`. + +### Division +* `f32x4.div(a: v128, b: v128) -> v128` +* `f64x2.div(a: v128, b: v128) -> v128` + +Lane-wise IEEE `division`. + +### Multiplication +* `f32x4.mul(a: v128, b: v128) -> v128` +* `f64x2.mul(a: v128, b: v128) -> v128` + +Lane-wise IEEE `multiplication`. + +### Square root +* `f32x4.sqrt(a: v128) -> v128` +* `f64x2.sqrt(a: v128) -> v128` + +Lane-wise IEEE `squareRoot`. + +## Conversions +### Integer to floating point +* `f32x4.convert_s/i32x4(a: v128) -> v128` +* `f32x4.convert_u/i32x4(a: v128) -> v128` +* `f64x2.convert_s/i64x2(a: v128) -> v128` +* `f64x2.convert_u/i64x2(a: v128) -> v128` + +Lane-wise conversion from integer to floating point. Some integer values will be +rounded. + +### Floating point to integer +* `i32x4.trunc_s/f32x4(a: v128) -> v128` +* `i32x4.trunc_u/f32x4(a: v128) -> v128` +* `i64x2.trunc_s/f64x2(a: v128) -> v128` +* `i64x2.trunc_u/f64x2(a: v128) -> v128` + +Lane-wise conversion from floating point to integer using the IEEE +`convertToIntegerTowardZero` function. If any lane is a NaN or the rounded +integer value is outside the range of the destination type, these instructions +trap.
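+
+The trapping behaviour of these conversions can be modelled with a small,
+self-contained Python sketch. The names `i32x4_trunc_s_f32x4` and `Trap` are
+illustrative only and do not appear in the proposal; the sketch uses Python
+floats standing in for `f32` lane values:
+
+```python
+import math
+
+class Trap(Exception):
+    # Models a WebAssembly trap.
+    pass
+
+def i32x4_trunc_s_f32x4(a):
+    # a is a list of 4 lane values.
+    result = []
+    for x in a:
+        if math.isnan(x) or math.isinf(x):
+            raise Trap("invalid conversion to integer")
+        r = math.trunc(x)  # convertToIntegerTowardZero
+        if not -2**31 <= r <= 2**31 - 1:
+            raise Trap("integer overflow")
+        result.append(r)
+    return result
+
+# 3.9 and -2.5 both truncate toward zero.
+assert i32x4_trunc_s_f32x4([3.9, -2.5, 0.0, 1.0]) == [3, -2, 0, 1]
+```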