Description
What is the feature or code improvement you would like to do in Cranelift?
During translation from Wasm to CLIF, a combination of Wasm's v128
type and Cranelift's current type system forces us to add many raw_bitcast
instructions between operations. For example, this Wasm code:
(func (export "add-sub") (param v128 v128 v128) (result v128)
(i16x8.add (i16x8.sub (local.get 0) (local.get 1))(local.get 2)))
Translates to this CLIF code:
function u0:4(i64 vmctx [%rdi], i8x16 [%xmm0], i8x16 [%xmm1], i64 fp [%rbp]) -> i8x16 [%xmm0], i64 fp [%rbp] system_v {
ss0 = incoming_arg 16, offset -16
ebb0(v0: i64 [%rdi], v1: i8x16 [%xmm0], v2: i8x16 [%xmm1], v12: i64 [%rbp]):
[RexOp1pushq#50] x86_push v12
[RexOp1copysp#8089] copy_special %rsp -> %rbp
@00a6 [null_fpr#00,%xmm0] v4 = raw_bitcast.i16x8 v1
@00a6 [Mp2vconst_optimized#5ef,%xmm2] v11 = vconst.i16x8 0x00
@00a6 [Mp2fa#5f9,%xmm2] v5 = isub v11, v4
@00a6 [null_fpr#00,%xmm2] v6 = raw_bitcast.i8x16 v5
@00aa [null_fpr#00,%xmm2] v7 = raw_bitcast.i16x8 v6
@00aa [null_fpr#00,%xmm1] v8 = raw_bitcast.i16x8 v2
@00aa [Mp2fa#5fd,%xmm2] v9 = iadd v7, v8
@00aa [null_fpr#00,%xmm2] v10 = raw_bitcast.i8x16 v9
@00ac [-] fallthrough ebb1(v10)
ebb1(v3: i8x16 [%xmm2]):
@00ac [Op2frmov#428] regmove v3, %xmm2 -> %xmm0
[RexOp1popq#58,%rbp] v13 = x86_pop.i64
@00ac [Op1ret#c3] return v3, v13
}
This issue is to discuss if and how to remove these extra bitcasts.
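To make the source of the bitcasts concrete, here is a minimal sketch in Rust of a translator that gives every Wasm v128 value a single default type (i8x16) and casts around each typed operation. It is a toy model under stated assumptions: `Ty`, `Inst`, `Builder`, and the helper functions are hypothetical names invented for illustration and are not Cranelift's API. It models the Wasm `add-sub` snippet, not the exact CLIF listing above.

```rust
// Toy model only: `Ty`, `Inst`, and `Builder` are hypothetical stand-ins,
// not Cranelift's real data structures.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty {
    I8x16, // default type given to every Wasm v128 value
    I16x8, // type required by i16x8.add / i16x8.sub
}

#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Inst {
    RawBitcast { to: Ty, arg: usize },
    Isub { lhs: usize, rhs: usize },
    Iadd { lhs: usize, rhs: usize },
}

struct Builder {
    types: Vec<Ty>,   // type of each value, indexed by value number
    insts: Vec<Inst>, // instructions in emission order
}

impl Builder {
    // All v128 parameters start out with the default type.
    fn new(num_params: usize) -> Self {
        Builder { types: vec![Ty::I8x16; num_params], insts: Vec::new() }
    }

    // Record an instruction and return the number of the value it defines.
    fn def(&mut self, ty: Ty, inst: Inst) -> usize {
        self.insts.push(inst);
        self.types.push(ty);
        self.types.len() - 1
    }

    // Emit a bitcast only when the value does not already have the wanted type.
    fn cast(&mut self, value: usize, to: Ty) -> usize {
        if self.types[value] == to {
            value
        } else {
            self.def(to, Inst::RawBitcast { to, arg: value })
        }
    }

    // Translate i16x8.add or i16x8.sub: cast operands in, cast the result back out.
    fn i16x8_binop(&mut self, is_add: bool, a: usize, b: usize) -> usize {
        let lhs = self.cast(a, Ty::I16x8);
        let rhs = self.cast(b, Ty::I16x8);
        let inst = if is_add { Inst::Iadd { lhs, rhs } } else { Inst::Isub { lhs, rhs } };
        let result = self.def(Ty::I16x8, inst);
        self.cast(result, Ty::I8x16)
    }
}

// (i16x8.add (i16x8.sub (local.get 0) (local.get 1)) (local.get 2))
fn translate_add_sub() -> Builder {
    let mut b = Builder::new(3);
    let diff = b.i16x8_binop(false, 0, 1);
    b.i16x8_binop(true, diff, 2);
    b
}

fn count_bitcasts(b: &Builder) -> usize {
    b.insts.iter().filter(|i| matches!(i, Inst::RawBitcast { .. })).count()
}
```

In this model, `translate_add_sub()` emits eight instructions, of which only two do arithmetic; `count_bitcasts` reports the remaining six. The result of the subtraction is cast back to i8x16 and then immediately cast to i16x8 again, which is the same back-to-back pattern visible at v6 and v7 in the listing.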
What is the value of adding this in Cranelift?
The extra `raw_bitcast`s emit no machine code, but they are confusing when troubleshooting and add extra memory and processing overhead during compilation.
Do you have an implementation plan, and/or ideas for data structures or algorithms to use?
Some options:
- add types to `load` and `const`: WebAssembly/simd#125 ("Concerns about integer vs floating-point instructions on x86") was discussed in the Wasm SIMD Sync meeting (WebAssembly/simd#121, "SIMD Sync meeting 10/22/2019 Agenda"), and someone brought up that making `load` and `const` typed (e.g. `f32x4.load`) would allow compilers to attach the correct types to values and retain them through the less strongly typed `v128` operations (e.g. `xor`). WebAssembly/simd#125 discusses this from a performance point of view, but that addition would also solve this issue.
- examine the DFG: another approach would be to look at the DFG to figure out the types of predecessors, as mentioned in a comment on WebAssembly/simd#1 ("Initial 128-bit SIMD proposal"). This, however, would have to be extended to type signatures: Cranelift would have to look at the instructions in a function to figure out how the `v128` parameters are used. In the function `add-sub` above, with signature `(param v128 v128 v128)`, the addition and subtraction make this clear, but some functions will make this analysis impossible.
- add a `V128` type to Cranelift: Cranelift's type system could be extended with a `V128` type that would include all `INxN`, `FNxN`, and `BNxN` types. The instruction types would stay the same (e.g. `iadd` should still only accept integers), but type-checking could be relaxed to allow a `V128` value to be used as any of its valid subtypes. This opens up a mechanism to get around the type-checking, but arguably that already exists with `raw_bitcast`. Code that knows its types would remain as-is, but Wasm-to-CLIF translated code could use `V128` a bit more naturally than the `raw_bitcast`s.
- do nothing: I brought this up a long time ago when talking to @sunfishcode and that seemed the best thing to do then. I'm opening this issue to discuss whether that is still the case.
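The relaxed type-checking in the `V128` option could look roughly like the sketch below. This is only an illustration under assumptions: `VecTy`, `operand_ok`, and `check_iadd` are hypothetical names, the enum lists only a few of the 128-bit vector types, and the choice to give the result the instruction's controlling type is an assumption, not something the proposal specifies.

```rust
// Hypothetical model of the proposed rule; `VecTy` is not Cranelift's type enum.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum VecTy {
    V128, // the proposed untyped 128-bit vector
    I8x16,
    I16x8,
    I32x4,
    F32x4,
    B16x8,
}

impl VecTy {
    fn is_int(self) -> bool {
        matches!(self, VecTy::I8x16 | VecTy::I16x8 | VecTy::I32x4)
    }
}

// Strict rule: the operand type must equal the expected type.
// Relaxed rule: a V128 operand is additionally accepted as any subtype.
fn operand_ok(expected: VecTy, actual: VecTy, relaxed: bool) -> bool {
    expected == actual || (relaxed && actual == VecTy::V128)
}

// Type-check `iadd.<ctrl> a, b`. The instruction itself still only accepts
// integer vectors; only the operand check is relaxed.
fn check_iadd(ctrl: VecTy, a: VecTy, b: VecTy, relaxed: bool) -> Result<VecTy, String> {
    if !ctrl.is_int() {
        return Err(format!("iadd requires an integer vector type, got {:?}", ctrl));
    }
    for actual in [a, b] {
        if !operand_ok(ctrl, actual, relaxed) {
            return Err(format!("operand has type {:?}, expected {:?}", actual, ctrl));
        }
    }
    // Assumption: the result takes the controlling type.
    Ok(ctrl)
}
```

With `relaxed` set, `check_iadd(VecTy::I16x8, VecTy::V128, VecTy::V128, true)` succeeds without any bitcast, while a mismatch between two concrete types (e.g. an `I8x16` operand to an `I16x8` add) is still rejected, so code that knows its types keeps the existing checks.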
Have you considered alternative implementations? If so, how are they better or worse than your proposal?
See above.