Too many raw_bitcasts in SIMD code

### What is the feature or code improvement you would like to do in Cranelift?

During translation from Wasm to CLIF, the combination of Wasm's untyped `v128` type and Cranelift's lane-typed vector types (`i8x16`, `i16x8`, etc.) forces us to add many `raw_bitcast` instructions between operations. For example, this Wasm code:
```
  (func (export "add-sub") (param v128 v128 v128) (result v128)
    (i16x8.add (i16x8.sub (local.get 0) (local.get 1)) (local.get 2)))
```
translates to the following CLIF, shown after x86-64 encoding and register allocation:
```
function u0:4(i64 vmctx [%rdi], i8x16 [%xmm0], i8x16 [%xmm1], i64 fp [%rbp]) -> i8x16 [%xmm0], i64 fp [%rbp] system_v {
    ss0 = incoming_arg 16, offset -16

                                ebb0(v0: i64 [%rdi], v1: i8x16 [%xmm0], v2: i8x16 [%xmm1], v12: i64 [%rbp]):
[RexOp1pushq#50]                    x86_push v12
[RexOp1copysp#8089]                 copy_special %rsp -> %rbp
@00a6 [null_fpr#00,%xmm0]           v4 = raw_bitcast.i16x8 v1
@00a6 [Mp2vconst_optimized#5ef,%xmm2] v11 = vconst.i16x8 0x00
@00a6 [Mp2fa#5f9,%xmm2]             v5 = isub v11, v4
@00a6 [null_fpr#00,%xmm2]           v6 = raw_bitcast.i8x16 v5
@00aa [null_fpr#00,%xmm2]           v7 = raw_bitcast.i16x8 v6
@00aa [null_fpr#00,%xmm1]           v8 = raw_bitcast.i16x8 v2
@00aa [Mp2fa#5fd,%xmm2]             v9 = iadd v7, v8
@00aa [null_fpr#00,%xmm2]           v10 = raw_bitcast.i8x16 v9
@00ac [-]                           fallthrough ebb1(v10)

                                ebb1(v3: i8x16 [%xmm2]):
@00ac [Op2frmov#428]                regmove v3, %xmm2 -> %xmm0
[RexOp1popq#58,%rbp]                v13 = x86_pop.i64 
@00ac [Op1ret#c3]                   return v3, v13
}
```
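To make the source of these casts concrete, here is a minimal Rust model of the translation pattern. It assumes (as the `i8x16` parameter types above suggest) that `v128` values are kept as `i8x16` between operations and are cast to a lane type around each lane-typed instruction. `Ty`, `Builder`, `bitcast_if_needed`, and `translate_add_sub` are hypothetical names for illustration; this is a sketch, not Cranelift's translator code.

```rust
// Hypothetical, simplified model of Wasm-to-CLIF translation; not Cranelift code.
#[derive(Clone, Copy, PartialEq, Eq)]
enum Ty {
    I8x16, // assumed default representation of a `v128` value
    I16x8,
}

impl Ty {
    fn name(self) -> &'static str {
        match self {
            Ty::I8x16 => "i8x16",
            Ty::I16x8 => "i16x8",
        }
    }
}

#[derive(Default)]
struct Builder {
    insts: Vec<String>, // emitted instructions, as text
}

impl Builder {
    /// Emit a `raw_bitcast` only when the value's type differs from the one needed.
    fn bitcast_if_needed(&mut self, have: Ty, need: Ty) -> Ty {
        if have != need {
            self.insts.push(format!("raw_bitcast.{}", need.name()));
        }
        need
    }

    /// Translate a lane-typed binary op: cast both operands to `ty`, emit the op,
    /// then cast the result back to the default `v128` representation.
    fn binary(&mut self, op: &str, ty: Ty, a: Ty, b: Ty) -> Ty {
        self.bitcast_if_needed(a, ty);
        self.bitcast_if_needed(b, ty);
        self.insts.push(format!("{}.{}", op, ty.name()));
        self.bitcast_if_needed(ty, Ty::I8x16)
    }
}

/// (i16x8.add (i16x8.sub (local.get 0) (local.get 1)) (local.get 2))
fn translate_add_sub() -> Vec<String> {
    let mut b = Builder::default();
    // All three `v128` parameters arrive as i8x16.
    let (p0, p1, p2) = (Ty::I8x16, Ty::I8x16, Ty::I8x16);
    let diff = b.binary("isub", Ty::I16x8, p0, p1);
    b.binary("iadd", Ty::I16x8, diff, p2);
    b.insts
}

fn main() {
    let insts = translate_add_sub();
    let casts = insts.iter().filter(|s| s.starts_with("raw_bitcast")).count();
    println!("{}", insts.join("\n"));
    println!("{} of {} instructions are raw_bitcasts", casts, insts.len());
}
```

In this model 6 of the 8 emitted instructions are `raw_bitcast`s. Chaining two `i16x8` operations produces a redundant `raw_bitcast.i8x16` immediately followed by `raw_bitcast.i16x8`, like `v6`/`v7` in the listing above.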

This issue is to discuss whether and how to remove these extra bitcasts.

### What is the value of adding this in Cranelift?

The extra `raw_bitcast`s emit no machine code, but they make the IR confusing to read when troubleshooting and add memory and processing overhead during compilation.

### Do you have an implementation plan, and/or ideas for data structures or algorithms to use?

Some options:

 1. add types to `load` and `const`: https://github.com/WebAssembly/simd/issues/125 was discussed in the Wasm SIMD sync meeting (https://github.com/WebAssembly/simd/issues/121), where someone pointed out that typed `load` and `const` instructions (e.g. `f32x4.load`) would let compilers attach the correct types to values and retain them through the weakly typed `v128` operations (e.g. `v128.xor`). https://github.com/WebAssembly/simd/issues/125 discusses this from a performance point of view, but the same addition would also solve this issue.

 2. examine the DFG: another approach would be to look at the DFG to infer a value's type from the instructions that produce it, as mentioned in https://github.com/WebAssembly/simd/pull/1#issuecomment-295331508. This would, however, have to be extended to function signatures: Cranelift would have to look at the instructions in a function to figure out how its `v128` parameters are used. In the `add-sub` function above, with signature `(param v128 v128 v128)`, the addition and subtraction make this clear, but in other functions (e.g. ones whose parameters only flow into calls or untyped operations such as `v128.xor`) this analysis is impossible.

 3. add a `V128` type to Cranelift: Cranelift's type system could be extended with a `V128` type that stands in for all of the 128-bit `INxN`, `FNxN`, and `BNxN` vector types. Instruction types would stay the same (e.g. `iadd` should still only accept integers), but type-checking could be relaxed so that a `V128` value is accepted wherever one of its valid subtypes is expected. This opens a way around type-checking, but arguably `raw_bitcast` already provides one. Code that knows its types would remain as-is, while Wasm-to-CLIF translation could use `V128` values more naturally than chains of `raw_bitcast`s.

 4. do nothing: I brought this up a long time ago with @sunfishcode, and doing nothing seemed like the best option then; I'm opening this issue to discuss whether that is still the case.
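Options 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions: `Ty`, `Use`, `infer_param_type`, and `type_accepts` are hypothetical names, not Cranelift APIs, and a real version would walk Cranelift's DFG and extend its verifier rather than take a list of uses. The sketch shows why inference (option 2) fails for parameters that only reach untyped operations, and how a `V128` type (option 3) could be accepted by lane-typed instructions without letting, say, an `f32x4` operand through to an integer op.

```rust
// Hypothetical sketches of options 2 and 3; names are illustrative, not Cranelift APIs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Ty {
    I16x8,
    F32x4,
    V128, // option 3: stands for "any 128-bit vector type"
}

/// How an instruction uses a `v128` value: lane-typed ops (e.g. `i16x8.add`)
/// reveal a type, untyped ops (e.g. `v128.xor`) do not.
#[derive(Clone, Copy)]
enum Use {
    Typed(Ty),
    Untyped,
}

/// Option 2: infer a parameter's type from its uses. Returns `None` when no use
/// reveals a type, or when uses disagree (a bitcast would be needed either way,
/// so this sketch simply gives up).
fn infer_param_type(uses: &[Use]) -> Option<Ty> {
    let mut found: Option<Ty> = None;
    for &u in uses {
        if let Use::Typed(t) = u {
            if let Some(prev) = found {
                if prev != t {
                    return None; // conflicting lane types
                }
            } else {
                found = Some(t);
            }
        }
    }
    found
}

/// Option 3: relaxed check where a `V128` operand is accepted wherever a concrete
/// vector type is expected; otherwise the types must match exactly.
fn type_accepts(expected: Ty, actual: Ty) -> bool {
    expected == actual || actual == Ty::V128
}

fn main() {
    // Like parameter 0 of `add-sub`, used by an i16x8 op: inferable.
    println!("{:?}", infer_param_type(&[Use::Typed(Ty::I16x8), Use::Untyped]));
    // Only used by an untyped op such as `v128.xor`: nothing to infer.
    println!("{:?}", infer_param_type(&[Use::Untyped]));
    // Used as both i16x8 and f32x4: no single type.
    println!("{:?}", infer_param_type(&[Use::Typed(Ty::I16x8), Use::Typed(Ty::F32x4)]));
    // An i16x8 op accepts a V128 operand but still rejects an f32x4 one.
    println!("{} {}", type_accepts(Ty::I16x8, Ty::V128), type_accepts(Ty::I16x8, Ty::F32x4));
}
```

Note that option 3 moves the "unknown type" case into the type system instead of failing, which is why it does not depend on every function being analyzable the way option 2 does.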

### Have you considered alternative implementations? If so, how are they better or worse than your proposal?

See above.

Too many raw_bitcasts in SIMD code #1147