WebAssembly · lemaitre · Apr 19, 2018 · Apr 19, 2018 · Apr 19, 2018 · Aug 8, 2018
diff --git a/proposals/simd/BinarySIMD.md b/proposals/simd/BinarySIMD.md
@@ -167,4 +167,7 @@ The `v8x16.shuffle2_imm` instruction has 16 bytes after `simdop`.
 | `f64x2.convert_s/i64x2`   |    `0xb1`| -                  |
 | `f64x2.convert_u/i64x2`   |    `0xb2`| -                  |
 | `v8x16.shuffle1`          |    `0xc0`| -                  |
-| `v8x16.shuffle2_imm`      |    `0xc1`| s:LaneIdx32[16]    |
+| `v8x16.shuffle2_imm`      |    `0xcc`| s:LaneIdx32[16]    |
+| `v16x8.shuffle2_imm`      |    `0xcd`| s:LaneIdx16[8]     |
+| `v32x4.shuffle2_imm`      |    `0xce`| s:LaneIdx8[4]      |
+| `v64x2.shuffle2_imm`      |    `0xcf`| s:LaneIdx4[2]      |
diff --git a/proposals/simd/SIMD.md b/proposals/simd/SIMD.md
@@ -293,6 +293,30 @@ instruction is encoded with 16 bytes providing the indices of the elements to
 return. The indices `i` in range `[0, 15]` select the `i`-th element of `a`. The
 indices in range `[16, 31]` select the `i - 16`-th element of `b`.
 
+* `v16x8.shuffle2_imm(a: v128, b: v128, imm: ImmLaneIdx16[8]) -> v128`
+
+Returns a new vector with lanes selected from the lanes of the two input vectors
+`a` and `b` specified in the 8 byte wide immediate mode operand `imm`. This
+instruction is encoded with 8 bytes providing the indices of the elements to
+return. The indices `i` in range `[0, 7]` select the `i`-th element of `a`. The
+indices in range `[8, 15]` select the `i - 8`-th element of `b`.
+
+* `v32x4.shuffle2_imm(a: v128, b: v128, imm: ImmLaneIdx8[4]) -> v128`
+
+Returns a new vector with lanes selected from the lanes of the two input vectors
+`a` and `b` specified in the 4 byte wide immediate mode operand `imm`. This
+instruction is encoded with 4 bytes providing the indices of the elements to
+return. The indices `i` in range `[0, 3]` select the `i`-th element of `a`. The
+indices in range `[4, 7]` select the `i - 4`-th element of `b`.
+
+* `v64x2.shuffle2_imm(a: v128, b: v128, imm: ImmLaneIdx4[2]) -> v128`
+
+Returns a new vector with lanes selected from the lanes of the two input vectors
+`a` and `b` specified in the 2 byte wide immediate mode operand `imm`. This
+instruction is encoded with 2 bytes providing the indices of the elements to
+return. The indices `i` in range `[0, 1]` select the `i`-th element of `a`. The
+indices in range `[2, 3]` select the `i - 2`-th element of `b`.
+
 ```python
 def S.shuffle2_imm(a, b, s):
     result = S.New()
@@ -775,3 +799,34 @@ Lane-wise saturating conversion from floating point to integer using the IEEE
 resulting lane is 0. If the rounded integer value of a lane is outside the
 range of the destination type, the result is saturated to the nearest
 representable integer value.
+
+
+## Reductions
+
+There is no dedicated reduction instruction.
+Instead, shuffles can be combined with lane-wise operations such as `add`, `min`, `max`, `and`, and `or` to reduce a vector to a single value.
+
+The following example reduces an f32x4 vector held in local 0 with `add`:
+```
+get_local 0
+get_local 0
+get_local 0
+v64x2.shuffle2_imm 1 0  ;; swap the lower and upper halves of the vector
+f32x4.add  ;; every lane now holds the sum of two input lanes
+tee_local 0  ;; keep the partial sums in local 0 for the second step
+get_local 0
+get_local 0
+v32x4.shuffle2_imm 1 0 3 2  ;; swap lanes 0 and 1, and lanes 2 and 3
+f32x4.add
+f32x4.extract_lane 0  ;; extract the first element
+```
+
+The following example reduces an f64x2 vector held in local 0 with `add`:
+```
+get_local 0
+get_local 0
+get_local 0
+v64x2.shuffle2_imm 1 0  ;; swap the lower and upper halves of the vector
+f64x2.add
+f64x2.extract_lane 0  ;; extract the first element
+```
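
The Reductions section can be checked with a small Python model. This is only a sketch: the function names `shuffle2_imm` and `reduce_add_f32x4` are hypothetical, vectors are modelled as plain Python lists, and Python floats stand in for f32 lanes, so rounding behaviour is not represented. The shuffle follows the rule stated in SIMD.md (indices below the lane count select from `a`, the rest select from `b`), and the reduction feeds the partial sums of the first step into the second step.

```python
def shuffle2_imm(a, b, imm):
    """Model of the two-input immediate shuffle for any lane count.

    a and b are equal-length lists of lane values; imm holds one index per
    output lane. Indices below len(a) select from a, the rest select from b.
    """
    lanes = len(a)
    if len(b) != lanes or len(imm) != lanes:
        raise ValueError("a, b and imm must have the same number of lanes")
    result = []
    for i in imm:
        if not 0 <= i < 2 * lanes:
            raise ValueError(f"lane index {i} out of range")
        result.append(a[i] if i < lanes else b[i - lanes])
    return result


def reduce_add_f32x4(v):
    """Horizontal add of four lanes using two shuffle/add steps."""
    # Step 1: swap the 64-bit halves. On four 32-bit lanes, the v64x2
    # indices "1 0" correspond to the v32x4 indices "2 3 0 1".
    swapped = shuffle2_imm(v, v, [2, 3, 0, 1])
    partial = [x + y for x, y in zip(v, swapped)]
    # Step 2: swap neighbouring lanes of the partial sums (not of the input).
    swapped = shuffle2_imm(partial, partial, [1, 0, 3, 2])
    total = [x + y for x, y in zip(partial, swapped)]
    return total[0]
```

Replacing `+` with `min`, `max`, `&`, or `|` in the two list comprehensions gives the corresponding reduction, since the shuffle pattern does not depend on the operation.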
diff --git a/proposals/simd/TextSIMD.md b/proposals/simd/TextSIMD.md
@@ -20,8 +20,11 @@ The canonical text format used for printing `v128.const` instructions is
 v128.const i32x4 0xNNNNNNNN 0xNNNNNNNN 0xNNNNNNNN 0xNNNNNNNN
 ```
 
-### v8x16.shuffle2_imm
+### Shuffling using immediate indices
 
 ```
 v8x16.shuffle2_imm i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5
+v16x8.shuffle2_imm i4 i4 i4 i4 i4 i4 i4 i4
+v32x4.shuffle2_imm i3 i3 i3 i3
+v64x2.shuffle2_imm i2 i2
 ```
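
As a rough illustration of how the text-format immediates relate to the binary encoding, the sketch below validates a list of lane indices and packs them one byte per index, as SIMD.md describes ("encoded with 8 bytes", "4 bytes", "2 bytes"). The helper `encode_shuffle_imm` and the `LANES` table are hypothetical names, and the sketch assumes that `i5`, `i4`, `i3`, and `i2` denote unsigned indices below 32, 16, 8, and 4 respectively; it does not emit the `simdop` prefix or opcode.

```python
# Lane count per shape; valid indices are 0 .. 2 * lanes - 1 because the
# shuffle selects from the concatenation of two input vectors.
LANES = {"v8x16": 16, "v16x8": 8, "v32x4": 4, "v64x2": 2}


def encode_shuffle_imm(shape, indices):
    """Return the immediate bytes for `<shape>.shuffle2_imm <indices...>`."""
    lanes = LANES[shape]
    if len(indices) != lanes:
        raise ValueError(f"{shape} expects {lanes} indices, got {len(indices)}")
    for i in indices:
        if not 0 <= i < 2 * lanes:
            raise ValueError(f"index {i} out of range for {shape}")
    # One byte per lane index (assumption based on the byte counts in SIMD.md).
    return bytes(indices)
```

For example, `encode_shuffle_imm("v32x4", [1, 0, 3, 2])` yields the four immediate bytes for the lane swap used in the f32x4 reduction.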