This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Reorganize opcodes by kind and type #48

Merged
merged 2 commits into from
Nov 5, 2018

307 changes: 167 additions & 140 deletions proposals/simd/BinarySIMD.md
@@ -27,143 +27,170 @@ The `v8x16.shuffle` instruction has 16 bytes after `simdop`.

| Instruction | `simdop` | Immediate operands |
| --------------------------|---------:|--------------------|
| `v128.const` | 0 | i:ImmByte[16] |
| `v128.load` | 1 | m:memarg |
| `v128.store` | 2 | m:memarg |
| `i8x16.splat` | 3 | - |
| `i16x8.splat` | 4 | - |
| `i32x4.splat` | 5 | - |
| `i64x2.splat` | 6 | - |
| `f32x4.splat` | 7 | - |
| `f64x2.splat` | 8 | - |
| `i8x16.extract_lane_s` | 9 | i:LaneIdx16 |
| `i8x16.extract_lane_u` | 10 | i:LaneIdx16 |
| `i16x8.extract_lane_s` | 11 | i:LaneIdx8 |
| `i16x8.extract_lane_u` | 12 | i:LaneIdx8 |
| `i32x4.extract_lane` | 13 | i:LaneIdx4 |
| `i64x2.extract_lane` | 14 | i:LaneIdx2 |
| `f32x4.extract_lane` | 15 | i:LaneIdx4 |
| `f64x2.extract_lane` | 16 | i:LaneIdx2 |
| `i8x16.replace_lane` | 17 | i:LaneIdx16 |
| `i16x8.replace_lane` | 18 | i:LaneIdx8 |
| `i32x4.replace_lane` | 19 | i:LaneIdx4 |
| `i64x2.replace_lane` | 20 | i:LaneIdx2 |
| `f32x4.replace_lane` | 21 | i:LaneIdx4 |
| `f64x2.replace_lane` | 22 | i:LaneIdx2 |
| `v8x16.shuffle` | 23 | s:LaneIdx32[16] |
| `i8x16.add` | 24 | - |
| `i16x8.add` | 25 | - |
| `i32x4.add` | 26 | - |
| `i64x2.add` | 27 | - |
| `i8x16.sub` | 28 | - |
| `i16x8.sub` | 29 | - |
| `i32x4.sub` | 30 | - |
| `i64x2.sub` | 31 | - |
| `i8x16.mul` | 32 | - |
| `i16x8.mul` | 33 | - |
| `i32x4.mul` | 34 | - |
| `i8x16.neg` | 36 | - |
| `i16x8.neg` | 37 | - |
| `i32x4.neg` | 38 | - |
| `i64x2.neg` | 39 | - |
| `i8x16.add_saturate_s` | 40 | - |
| `i8x16.add_saturate_u` | 41 | - |
| `i16x8.add_saturate_s` | 42 | - |
| `i16x8.add_saturate_u` | 43 | - |
| `i8x16.sub_saturate_s` | 44 | - |
| `i8x16.sub_saturate_u` | 45 | - |
| `i16x8.sub_saturate_s` | 46 | - |
| `i16x8.sub_saturate_u` | 47 | - |
| `i8x16.shl` | 48 | - |
| `i16x8.shl` | 49 | - |
| `i32x4.shl` | 50 | - |
| `i64x2.shl` | 51 | - |
| `i8x16.shr_s` | 52 | - |
| `i8x16.shr_u` | 53 | - |
| `i16x8.shr_s` | 54 | - |
| `i16x8.shr_u` | 55 | - |
| `i32x4.shr_s` | 56 | - |
| `i32x4.shr_u` | 57 | - |
| `i64x2.shr_s` | 58 | - |
| `i64x2.shr_u` | 59 | - |
| `v128.and` | 60 | - |
| `v128.or` | 61 | - |
| `v128.xor` | 62 | - |
| `v128.not` | 63 | - |
| `v128.bitselect` | 64 | - |
| `i8x16.any_true` | 65 | - |
| `i16x8.any_true` | 66 | - |
| `i32x4.any_true` | 67 | - |
| `i64x2.any_true` | 68 | - |
| `i8x16.all_true` | 69 | - |
| `i16x8.all_true` | 70 | - |
| `i32x4.all_true` | 71 | - |
| `i64x2.all_true` | 72 | - |
| `i8x16.eq` | 73 | - |
| `i16x8.eq` | 74 | - |
| `i32x4.eq` | 75 | - |
| `f32x4.eq` | 77 | - |
| `f64x2.eq` | 78 | - |
| `i8x16.ne` | 79 | - |
| `i16x8.ne` | 80 | - |
| `i32x4.ne` | 81 | - |
| `f32x4.ne` | 83 | - |
| `f64x2.ne` | 84 | - |
| `i8x16.lt_s` | 85 | - |
| `i8x16.lt_u` | 86 | - |
| `i16x8.lt_s` | 87 | - |
| `i16x8.lt_u` | 88 | - |
| `i32x4.lt_s` | 89 | - |
| `i32x4.lt_u` | 90 | - |
| `f32x4.lt` | 93 | - |
| `f64x2.lt` | 94 | - |
| `i8x16.le_s` | 95 | - |
| `i8x16.le_u` | 96 | - |
| `i16x8.le_s` | 97 | - |
| `i16x8.le_u` | 98 | - |
| `i32x4.le_s` | 99 | - |
| `i32x4.le_u` | 100 | - |
| `f32x4.le` | 103 | - |
| `f64x2.le` | 104 | - |
| `i8x16.gt_s` | 105 | - |
| `i8x16.gt_u` | 106 | - |
| `i16x8.gt_s` | 107 | - |
| `i16x8.gt_u` | 108 | - |
| `i32x4.gt_s` | 109 | - |
| `i32x4.gt_u` | 110 | - |
| `f32x4.gt` | 113 | - |
| `f64x2.gt` | 114 | - |
| `i8x16.ge_s` | 115 | - |
| `i8x16.ge_u` | 116 | - |
| `i16x8.ge_s` | 117 | - |
| `i16x8.ge_u` | 118 | - |
| `i32x4.ge_s` | 119 | - |
| `i32x4.ge_u` | 120 | - |
| `f32x4.ge` | 123 | - |
| `f64x2.ge` | 124 | - |
| `f32x4.neg` | 125 | - |
| `f64x2.neg` | 126 | - |
| `f32x4.abs` | 127 | - |
| `f64x2.abs` | 128 | - |
| `f32x4.min` | 129 | - |
| `f64x2.min` | 130 | - |
| `f32x4.max` | 131 | - |
| `f64x2.max` | 132 | - |
| `f32x4.add` | 133 | - |
| `f64x2.add` | 134 | - |
| `f32x4.sub` | 135 | - |
| `f64x2.sub` | 136 | - |
| `f32x4.div` | 137 | - |
| `f64x2.div` | 138 | - |
| `f32x4.mul` | 139 | - |
| `f64x2.mul` | 140 | - |
| `f32x4.sqrt` | 141 | - |
| `f64x2.sqrt` | 142 | - |
| `f32x4.convert_s/i32x4` | 143 | - |
| `f32x4.convert_u/i32x4` | 144 | - |
| `f64x2.convert_s/i64x2` | 145 | - |
| `f64x2.convert_u/i64x2` | 146 | - |
| `i32x4.trunc_s/f32x4:sat` | 147 | - |
| `i32x4.trunc_u/f32x4:sat` | 148 | - |
| `i64x2.trunc_s/f64x2:sat` | 149 | - |
| `i64x2.trunc_u/f64x2:sat` | 150 | - |
| `v128.load` | `0x00`| m:memarg |
| `v128.store` | `0x01`| m:memarg |
| `v128.const` | `0x02`| i:ImmByte[16] |
| `v8x16.shuffle` | `0x03`| s:LaneIdx32[16] |
| `i8x16.splat` | `0x04`| - |
| `i8x16.extract_lane_s` | `0x05`| i:LaneIdx16 |
| `i8x16.extract_lane_u` | `0x06`| i:LaneIdx16 |
| `i8x16.replace_lane` | `0x07`| i:LaneIdx16 |
| `i16x8.splat` | `0x08`| - |
| `i16x8.extract_lane_s` | `0x09`| i:LaneIdx8 |
| `i16x8.extract_lane_u` | `0x0a`| i:LaneIdx8 |
| `i16x8.replace_lane` | `0x0b`| i:LaneIdx8 |
| `i32x4.splat` | `0x0c`| - |
| `i32x4.extract_lane` | `0x0d`| i:LaneIdx4 |
| `i32x4.replace_lane` | `0x0e`| i:LaneIdx4 |
| `i64x2.splat` | `0x0f`| - |
| `i64x2.extract_lane` | `0x10`| i:LaneIdx2 |
| `i64x2.replace_lane` | `0x11`| i:LaneIdx2 |
| `f32x4.splat` | `0x12`| - |
| `f32x4.extract_lane` | `0x13`| i:LaneIdx4 |
| `f32x4.replace_lane` | `0x14`| i:LaneIdx4 |
| `f64x2.splat` | `0x15`| - |
| `f64x2.extract_lane` | `0x16`| i:LaneIdx2 |
| `f64x2.replace_lane` | `0x17`| i:LaneIdx2 |
| `i8x16.eq` | `0x18`| - |
| `i8x16.ne` | `0x19`| - |
| `i8x16.lt_s` | `0x1a`| - |
| `i8x16.lt_u` | `0x1b`| - |
| `i8x16.gt_s` | `0x1c`| - |
| `i8x16.gt_u` | `0x1d`| - |
| `i8x16.le_s` | `0x1e`| - |
| `i8x16.le_u` | `0x1f`| - |
| `i8x16.ge_s` | `0x20`| - |
| `i8x16.ge_u` | `0x21`| - |
| `i16x8.eq` | `0x22`| - |
| `i16x8.ne` | `0x23`| - |
| `i16x8.lt_s` | `0x24`| - |
| `i16x8.lt_u` | `0x25`| - |
| `i16x8.gt_s` | `0x26`| - |
| `i16x8.gt_u` | `0x27`| - |
| `i16x8.le_s` | `0x28`| - |
| `i16x8.le_u` | `0x29`| - |
| `i16x8.ge_s` | `0x2a`| - |
| `i16x8.ge_u` | `0x2b`| - |
| `i32x4.eq` | `0x2c`| - |
| `i32x4.ne` | `0x2d`| - |
| `i32x4.lt_s` | `0x2e`| - |
| `i32x4.lt_u` | `0x2f`| - |
| `i32x4.gt_s` | `0x30`| - |
| `i32x4.gt_u` | `0x31`| - |
| `i32x4.le_s` | `0x32`| - |
| `i32x4.le_u` | `0x33`| - |
| `i32x4.ge_s` | `0x34`| - |
| `i32x4.ge_u` | `0x35`| - |
| - | `0x36`| - |
| - | `0x37`| - |
| - | `0x38`| - |
| - | `0x39`| - |
| - | `0x3a`| - |
| - | `0x3b`| - |
| - | `0x3c`| - |
| - | `0x3d`| - |
| - | `0x3e`| - |
| - | `0x3f`| - |
| `f32x4.eq` | `0x40`| - |
| `f32x4.ne` | `0x41`| - |
| `f32x4.lt` | `0x42`| - |
| `f32x4.gt` | `0x43`| - |
| `f32x4.le` | `0x44`| - |
| `f32x4.ge` | `0x45`| - |
| `f64x2.eq` | `0x46`| - |
| `f64x2.ne` | `0x47`| - |
| `f64x2.lt` | `0x48`| - |
| `f64x2.gt` | `0x49`| - |
| `f64x2.le` | `0x4a`| - |
| `f64x2.ge` | `0x4b`| - |
| `v128.and` | `0x4c`| - |
| `v128.or` | `0x4d`| - |
| `v128.xor` | `0x4e`| - |
| `v128.not` | `0x4f`| - |
| `v128.bitselect` | `0x50`| - |
| `i8x16.add` | `0x51`| - |
| `i8x16.add_saturate_s` | `0x52`| - |
| `i8x16.add_saturate_u` | `0x53`| - |
| `i8x16.sub` | `0x54`| - |
| `i8x16.sub_saturate_s` | `0x55`| - |
| `i8x16.sub_saturate_u` | `0x56`| - |
| `i8x16.mul` | `0x57`| - |
| - | `0x58`| - |
| - | `0x59`| - |
Member Author

Reserved for min/max.

| `i8x16.shl` | `0x5a`| - |
| `i8x16.shr_s` | `0x5b`| - |
| `i8x16.shr_u` | `0x5c`| - |
| `i8x16.neg` | `0x5d`| - |
Member

Just noticed this while updating wabt: the MVP instructions group unary and binary ops together by type, but have unary before binary, e.g.:

| opcode | name         |
| ------ | ------------ |
| `0x67` | `i32.clz`    |
| `0x68` | `i32.ctz`    |
| `0x69` | `i32.popcnt` |
| `0x6a` | `i32.add`    |
| ...    | ...          |

| `i8x16.any_true` | `0x5e`| - |
Member

It makes sense to me to consistently use `v8x16` for sign-agnostic operations; it's a little weird to me that we do this only for `v8x16.shuffle`. Thoughts?

Member Author

I also think that using `v8x16` only for shuffles is weird (see #38), but I think keeping `i8x16` and friends for sign-agnostic operations that still only make sense for integer lanes is reasonable. To counter, what do you think of changing shuffle to be `v128.shuffle`?

Member

Using `v8x16.shuffle` makes sense because it's a generic byte shuffle, so the name is more indicative than `v128.shuffle`, but I don't feel strongly about which way we go here.
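
To make "generic byte shuffle" concrete: each of the 16 `LaneIdx32` immediates picks one byte out of the 32 bytes formed by the two input vectors. Below is a minimal Python sketch under that reading of the proposal. The function name `v8x16_shuffle` and the use of `bytes` to stand in for a `v128` value are illustrative assumptions, not something this PR defines.

```python
from typing import Sequence


def v8x16_shuffle(a: bytes, b: bytes, lanes: Sequence[int]) -> bytes:
    """Select 16 bytes from the 32-byte concatenation of a and b."""
    if len(a) != 16 or len(b) != 16:
        raise ValueError("both inputs must be 16 bytes (one v128 each)")
    if len(lanes) != 16 or any(not 0 <= i < 32 for i in lanes):
        raise ValueError("need 16 lane indices, each in the range 0..31")
    combined = a + b  # indices 0..15 address a, 16..31 address b
    return bytes(combined[i] for i in lanes)
```

The sketch never interprets the bytes as integers or floats, which is the property the `v8x16` prefix is meant to signal.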

| `i8x16.all_true` | `0x5f`| - |
| `i16x8.add` | `0x60`| - |
| `i16x8.add_saturate_s` | `0x61`| - |
| `i16x8.add_saturate_u` | `0x62`| - |
| `i16x8.sub` | `0x63`| - |
| `i16x8.sub_saturate_s` | `0x64`| - |
| `i16x8.sub_saturate_u` | `0x65`| - |
| `i16x8.mul` | `0x66`| - |
| - | `0x67`| - |
| - | `0x68`| - |
Member Author

Reserved for min/max.

| `i16x8.shl` | `0x69`| - |
| `i16x8.shr_s` | `0x6a`| - |
| `i16x8.shr_u` | `0x6b`| - |
| `i16x8.neg` | `0x6c`| - |
| `i16x8.any_true` | `0x6d`| - |
| `i16x8.all_true` | `0x6e`| - |
| `i32x4.add` | `0x6f`| - |
| - | `0x70`| - |
| - | `0x71`| - |
Member Author

Reserved for saturating adds.

Member

As far as I know, no hardware instructions map to saturating arithmetic for the `i32x4` and `i64x2` types, so I'm unsure whether this will ever be useful, but the opcode space is large enough that this looks OK to me. The same goes for the other saturating operations below.
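
For reference, "saturating" here means each lane result is clamped to the lane type's range instead of wrapping. Below is a minimal Python sketch of signed, lane-wise saturating addition for an arbitrary lane width. The helper name `add_saturate_s` and the list-of-ints vector representation are assumptions made for illustration. With `bits=8` it models `i8x16.add_saturate_s`; with `bits=32` it shows what a hypothetical `i32x4` variant in the reserved slots would compute.

```python
def add_saturate_s(a, b, bits):
    """Lane-wise signed saturating add; a and b are equal-length int lists."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of lanes")
    # Signed range for the given lane width, e.g. -128..127 for 8 bits.
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    # Add with unbounded precision, then clamp each lane into [lo, hi].
    return [max(lo, min(hi, x + y)) for x, y in zip(a, b)]
```

Without a matching hardware instruction, an engine would have to emulate this widen-and-clamp per lane, which is the cost the comment above is weighing.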

| `i32x4.sub` | `0x72`| - |
| - | `0x73`| - |
| - | `0x74`| - |
Member Author

Reserved for saturating subtractions.

| `i32x4.mul` | `0x75`| - |
| - | `0x76`| - |
| - | `0x77`| - |
Member Author

Reserved for min/max.

| `i32x4.shl` | `0x78`| - |
| `i32x4.shr_s` | `0x79`| - |
| `i32x4.shr_u` | `0x7a`| - |
| `i32x4.neg` | `0x7b`| - |
| `i32x4.any_true` | `0x7c`| - |
| `i32x4.all_true` | `0x7d`| - |
| `i64x2.add` | `0x7e`| - |
| - | `0x7f`| - |
| - | `0x80`| - |
Member Author

Reserved for saturating adds.

| `i64x2.sub` | `0x81`| - |
| - | `0x82`| - |
| - | `0x83`| - |
| - | `0x84`| - |
| - | `0x85`| - |
| - | `0x86`| - |
Member Author

Reserved for saturating subtractions, multiply, and min/max.

| `i64x2.shl` | `0x87`| - |
| `i64x2.shr_s` | `0x88`| - |
| `i64x2.shr_u` | `0x89`| - |
| `i64x2.neg` | `0x8a`| - |
| `i64x2.any_true` | `0x8b`| - |
| `i64x2.all_true` | `0x8c`| - |
| `f32x4.add` | `0x8d`| - |
| `f32x4.sub` | `0x8e`| - |
| `f32x4.mul` | `0x8f`| - |
| `f32x4.div` | `0x90`| - |
| `f32x4.min` | `0x91`| - |
| `f32x4.max` | `0x92`| - |
| `f32x4.neg` | `0x93`| - |
Member

The MVP instructions use the order `abs`, `neg`, `sqrt`.

| `f32x4.abs` | `0x94`| - |
| `f32x4.sqrt` | `0x95`| - |
| `f64x2.add` | `0x96`| - |
| `f64x2.sub` | `0x97`| - |
| `f64x2.mul` | `0x98`| - |
| `f64x2.div` | `0x99`| - |
| `f64x2.min` | `0x9a`| - |
| `f64x2.max` | `0x9b`| - |
| `f64x2.neg` | `0x9c`| - |
| `f64x2.abs` | `0x9d`| - |
| `f64x2.sqrt` | `0x9e`| - |
| `i32x4.trunc_s/f32x4:sat` | `0x9f`| - |
| `i32x4.trunc_u/f32x4:sat` | `0xa0`| - |
| `i64x2.trunc_s/f64x2:sat` | `0xa1`| - |
| `i64x2.trunc_u/f64x2:sat` | `0xa2`| - |
| `f32x4.convert_s/i32x4` | `0xa3`| - |
| `f32x4.convert_u/i32x4` | `0xa4`| - |
| `f64x2.convert_s/i64x2` | `0xa5`| - |
| `f64x2.convert_u/i64x2` | `0xa6`| - |
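
One way to read the new numbering: the integer arithmetic opcodes form four 15-slot blocks starting at `0x51`, one per lane type, with the same operation at the same offset in each block (reserved slots included). Below is a minimal Python sketch of that pattern, checked against rows of the table above. The names `INT_TYPES`, `INT_OP_OFFSETS`, and `int_arith_opcode` are illustrative and not part of the proposal.

```python
# Lane types in the order their blocks appear in the table above.
INT_TYPES = ["i8x16", "i16x8", "i32x4", "i64x2"]

# Offset of each operation inside a block. Only operations defined for all
# four lane types are listed. Offsets 1-2 and 4-5 hold the saturating
# add/sub variants (or their reserved slots), offset 6 holds mul (reserved
# for i64x2), and offsets 7-8 are reserved for min/max.
INT_OP_OFFSETS = {
    "add": 0, "sub": 3, "shl": 9, "shr_s": 10, "shr_u": 11,
    "neg": 12, "any_true": 13, "all_true": 14,
}

INT_BLOCK_BASE = 0x51   # simdop of i8x16.add
INT_BLOCK_SIZE = 15     # 0x51..0x5f, 0x60..0x6e, 0x6f..0x7d, 0x7e..0x8c


def int_arith_opcode(lane_type: str, op: str) -> int:
    """Return the simdop for an op that exists for all four integer types."""
    block = INT_TYPES.index(lane_type)
    return INT_BLOCK_BASE + INT_BLOCK_SIZE * block + INT_OP_OFFSETS[op]
```

For example, `int_arith_opcode("i16x8", "shl")` gives `0x69` and `int_arith_opcode("i64x2", "all_true")` gives `0x8c`, matching the table. The integer comparisons appear to follow the same idea with 10-slot blocks starting at `0x18`, leaving `0x36` to `0x3f` open where an `i64x2` block would fall.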