Add extend-add-pairwise instructions x64 #3031

jlb6740 · 2021-06-24T23:11:22Z

No description provided.

akirilov-arm · 2021-06-30T16:25:42Z

cranelift/codegen/meta/src/shared/instructions.rs

+
+    ig.push(
+        Inst::new(
+            "extended_pairwise_add_signed",


An alternative approach is to introduce a binary pairwise addition IR operation, e.g. iadd_pairwise, which sounds like a more useful primitive operation - it could be used both by other existing Wasm instructions like i32x4.dot_i16x8_s and possible future additions (some of which have already been discussed - see here for an example); in that case extended_pairwise_add_signed would become iadd_pairwise(swiden_low, swiden_high). We need to do some pattern-matching for optimal AArch64 code generation (which is a single instruction), but I am not sure about other architectures - any thoughts from the people working on the backends?
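The proposed equivalence can be checked on a scalar model. The sketch below is a hypothetical lane-by-lane model, not Cranelift's implementation. It assumes plain arrays, little-endian lane numbering, and wrapping `i16` arithmetic. `swiden_low`/`swiden_high` sign-extend lanes 0..8 and 8..16. `iadd_pairwise` puts the pairwise sums of its first operand in the low half of the result and those of its second operand in the high half.

```rust
/// Hypothetical scalar model of CLIF `swiden_low` (i8x16 -> i16x8):
/// sign-extends input lanes 0..8.
fn swiden_low(a: [i8; 16]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        out[i] = a[i] as i16;
    }
    out
}

/// Hypothetical scalar model of CLIF `swiden_high`: sign-extends lanes 8..16.
fn swiden_high(a: [i8; 16]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        out[i] = a[i + 8] as i16;
    }
    out
}

/// Assumed `iadd_pairwise` semantics: low half = adjacent-lane sums of `x`,
/// high half = adjacent-lane sums of `y`, with wrapping addition.
fn iadd_pairwise(x: [i16; 8], y: [i16; 8]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..4 {
        out[i] = x[2 * i].wrapping_add(x[2 * i + 1]);
        out[i + 4] = y[2 * i].wrapping_add(y[2 * i + 1]);
    }
    out
}

/// Reference semantics of Wasm `i16x8.extadd_pairwise_i8x16_s`.
fn extadd_pairwise_s(a: [i8; 16]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        out[i] = a[2 * i] as i16 + a[2 * i + 1] as i16;
    }
    out
}
```

With that lane assignment, the low half of the result covers input lanes 0..8 and the high half covers lanes 8..16. The result therefore lines up with the Wasm instruction lane for lane.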

cc @abrown @cfallin @uweigand

P.S. Actually even considering the extended pairwise additions in isolation - the iadd_pairwise way introduces only one IR operation, while the approach in this PR adds two.

A general comment: in the past we were implicitly following the "make a new CLIF instruction for each Wasm SIMD instruction" paradigm so a lot of existing CLIF instructions could be split up in the same way you describe above. I think this is a thing that should be done (it just seems to make sense to simplify CLIF even at the risk of a few extra cycles during lowering) but I don't feel very strongly about it so I would be fine with either:

- we do the pattern matching now AND we start a review of the existing SIMD instructions in a separate issue to clean up CLIF; or

- we just start the review of existing SIMD instructions now and eventually remove the unnecessary CLIF instructions in favor of more pattern matching; this would mean continuing to follow the "make a new CLIF instruction for each Wasm SIMD instruction" paradigm for the last few outstanding instructions

In summary, I don't mind if we do iadd_pairwise now or later, but I do think we need to commit to doing the same to the rest of SIMD CLIF.

I forgot to link to the discussion in #2982 for context.

@abrown While I don't have any strong feelings either, in #3044 I have already started with the first of your suggested approaches. In fact, the only Wasm SIMD operations that are not already covered by merged code or pull requests are the family of i*.extmul_*_i* operations, for which the specification text itself provides a simple way to express them in terms of other basic operations.
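Here is an illustration of the kind of identity meant here. The signed low extended multiply can be written as an ordinary lane-wise multiply of two sign-extended halves: i16x8.extmul_low_i8x16_s(a, b) = i16x8.mul(i16x8.extend_low_i8x16_s(a), i16x8.extend_low_i8x16_s(b)). The high variant uses the high halves. The scalar model below checks that identity against a direct definition. The helper names are hypothetical, and the CLIF-style `swiden_*`/`imul` naming is assumed.

```rust
/// Hypothetical scalar models of the CLIF widening ops (i8x16 -> i16x8).
fn swiden_low(a: [i8; 16]) -> [i16; 8] {
    std::array::from_fn(|i| a[i] as i16)
}

fn swiden_high(a: [i8; 16]) -> [i16; 8] {
    std::array::from_fn(|i| a[i + 8] as i16)
}

/// Lane-wise multiply with modular (wrapping) semantics, like CLIF `imul`.
/// Products of two sign-extended i8 values always fit in i16 anyway.
fn imul(x: [i16; 8], y: [i16; 8]) -> [i16; 8] {
    std::array::from_fn(|i| x[i].wrapping_mul(y[i]))
}

/// Direct definitions: multiply the low/high byte lanes at 16-bit width.
fn extmul_low_s_ref(a: [i8; 16], b: [i8; 16]) -> [i16; 8] {
    std::array::from_fn(|i| a[i] as i16 * b[i] as i16)
}

fn extmul_high_s_ref(a: [i8; 16], b: [i8; 16]) -> [i16; 8] {
    std::array::from_fn(|i| a[i + 8] as i16 * b[i + 8] as i16)
}
```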

I agree that opening an issue for CLIF clean-up is definitely one of the steps forward, no matter what we decide to do.

Opened #3045 for this.

Just commented over in #3045 with thoughts on the general question; for this specific case I think it does seem reasonable to split into the pairwise add with extends on the inputs. But let's see what we come to over there!

@akirilov-arm @cfallin Hi, so the suggestion here is to remove the new IR instructions extended_pairwise_add_signed and extended_pairwise_add_unsigned and instead create one called iadd_pairwise. When lowering, would we keep the lowering code that is there now and just pattern-match on the instruction and type to get the input? Or is the suggestion to scrap that lowering code too: actually lower swiden_low and swiden_high, then feed their outputs into iadd_pairwise, which would then need a new implementation different from what is there now? I looked at the change made in #2982, and it really just avoids creating another IR instruction, adding a little matching logic while keeping the original sequence. I am not sure whether the suggestion here is to do the same or to go further.

IMHO the handling should be equivalent to the one for the fcvt_from_uint operation in #2982 - that is, there should be a code path that handles iadd_pairwise irrespective of its inputs, to serve as a fallback. Then right before that path there should be a check if the inputs match the pattern; if they do, execution should proceed through your current lowering code for extended_pairwise_add_*.
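A toy sketch makes the suggested structure concrete. `Op`, `Lowering`, and `lower_iadd_pairwise` are hypothetical names. They do not model Cranelift's real `DataFlowGraph` or lowering context; instruction results here are just indices into a slice. The point is the ordering. The specialized extadd_pairwise path is tried first, and only when both inputs are the matching low/high widen of the same value. Everything else falls through to a generic `iadd_pairwise` path.

```rust
/// Toy stand-in for the defining instruction of each SSA value; operands are
/// indices into the same table. Hypothetical, not Cranelift's DataFlowGraph.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    Param,
    SwidenLow(usize),
    SwidenHigh(usize),
    UwidenLow(usize),
    UwidenHigh(usize),
}

/// Which code path the lowering takes for one `iadd_pairwise x, y`.
#[derive(Debug, PartialEq)]
enum Lowering {
    /// Fused sequence for Wasm extadd_pairwise of a single source vector.
    FusedExtAddPairwise { src: usize, signed: bool },
    /// Generic path that handles iadd_pairwise on arbitrary inputs.
    Generic { x: usize, y: usize },
}

fn lower_iadd_pairwise(defs: &[Op], x: usize, y: usize) -> Lowering {
    match (defs[x], defs[y]) {
        // Specialized paths: both inputs widen the *same* value, low then high,
        // and the widen kind tells us the signedness.
        (Op::SwidenLow(a), Op::SwidenHigh(b)) if a == b => {
            Lowering::FusedExtAddPairwise { src: a, signed: true }
        }
        (Op::UwidenLow(a), Op::UwidenHigh(b)) if a == b => {
            Lowering::FusedExtAddPairwise { src: a, signed: false }
        }
        // Fallback: different sources, mixed signedness, or non-widen
        // producers all go through the generic lowering.
        _ => Lowering::Generic { x, y },
    }
}
```

Because the fallback is always present, a CLIF producer that pairs the widens of two different values still gets correct code, just not the fused sequence.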

BTW @sparker-arm already has the common iadd_pairwise implementation and the AArch64-specific bits, but he is waiting for this pull request to get merged, so that you do not step on each other's toes. In the meantime, he may be able to offer some advice.

Yeah, I don't mind posting this up if people are interested.

Hi @sparker-arm, can you post here?

@akirilov-arm I am still not sure I understand what changes you have in mind. Currently, in code_translator.rs, the new instruction extended_pairwise_add_signed is called here:

Operator::I16x8ExtAddPairwiseI8x16S => {
    let a = pop1_with_bitcast(state, I8X16, builder);
    state.push1(builder.ins().extended_pairwise_add_signed(a))
}

and new instruction extended_pairwise_add_unsigned called here:

Operator::I16x8ExtAddPairwiseI8x16U => {
    let a = pop1_with_bitcast(state, I8X16, builder);
    state.push1(builder.ins().extended_pairwise_add_unsigned(a))
}

You are saying to instead call one instruction for both (iadd_pairwise) and then lower from there? I think that is what you are saying, but if so, how do we know whether the instruction started as signed or unsigned? Also, should that be extended_iadd_pairwise instead of iadd_pairwise?

Sure, I'll post the whole thing later today. Just to point out, though: we'll know the signedness from the [u|s]widen inputs, for example:

Operator::I16x8ExtAddPairwiseI8x16S => {
    let a = pop1_with_bitcast(state, I8X16, builder);
    let widen_low = builder.ins().swiden_low(a);
    let widen_high = builder.ins().swiden_high(a);
    state.push1(builder.ins().iadd_pairwise(widen_low, widen_high));
}

This creates a bit more work in the AArch64 backend, but it also means that we can fall back to an ADDP when we don't successfully match the extending inputs. As a side question: unless I'm mistaken, Wasm doesn't have a horizontal add. Does anyone know why?
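The point about signedness can be seen in a small model. The same sixteen bytes produce different sums depending only on which widen feeds `iadd_pairwise`. The sketch below is hypothetical: lanes are raw bytes, `swiden` sign-extends, and `uwiden` zero-extends. The unsigned Wasm variant is assumed to be translated analogously with uwiden_low/uwiden_high.

```rust
/// Hypothetical scalar models: the input is 16 raw byte lanes, and the
/// signedness lives only in which widen op is applied to them.
fn swiden(lanes: &[u8]) -> Vec<i16> {
    lanes.iter().map(|&b| b as i8 as i16).collect() // sign-extend
}

fn uwiden(lanes: &[u8]) -> Vec<i16> {
    lanes.iter().map(|&b| b as i16).collect() // zero-extend
}

/// Pairwise sums of `x` followed by pairwise sums of `y`.
fn iadd_pairwise(x: &[i16], y: &[i16]) -> Vec<i16> {
    x.chunks(2)
        .chain(y.chunks(2))
        .map(|p| p[0].wrapping_add(p[1]))
        .collect()
}

/// The same iadd_pairwise, fed by either swiden_* or uwiden_* of the
/// low and high halves of the input.
fn extadd_pairwise(a: [u8; 16], signed: bool) -> Vec<i16> {
    let (low, high) = a.split_at(8);
    if signed {
        iadd_pairwise(&swiden(low), &swiden(high))
    } else {
        iadd_pairwise(&uwiden(low), &uwiden(high))
    }
}
```

For a vector of all 0xFF bytes, the signed form sums -1 + -1 per pair and the unsigned form sums 255 + 255. The lowering can therefore recover the original Wasm instruction just by looking at the input opcodes.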

jlb6740 · 2021-07-30T04:16:09Z

This is a refactoring based on earlier comments. The lowering itself remains intact; the difference is that the operands of the CLIF instruction are now the results of other instructions (the [u|s]widen_low and [u|s]widen_high inputs). Matching during lowering has changed accordingly, so that it correctly identifies the Wasm instruction we intend to lower.

The CI failures appear to be from an unrelated issue. I wish I knew how to resolve them, but I can see the build stopping on a dead-code warning.

#3123: This should be the last PR needed for full x64 Wasm SIMD support. Of course, there are some refactoring and clean-up TODOs that should be done as follow-ups.

sparker-arm

I personally wouldn't rely only on the conformance test suite, we've already had problems with it. Pretty sure you've got some bugs here and I guess the existing test suite isn't exercising them.

sparker-arm · 2021-07-30T07:59:22Z

cranelift/codegen/meta/src/shared/instructions.rs

+        Inst::new(
+            "iadd_pairwise",
+            r#"
+        Does Lane-wise integer pairwise addition on two operands, putting the


I think it would be good to have a clearer definition of the semantics, this currently doesn't tell me which elements make a pair.
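One way to pin the pairing down is to state it lane by lane. The sketch below assumes three things, consistent with the iadd_pairwise(swiden_low, swiden_high) decomposition discussed in this thread. A pair is lanes 2i and 2i+1 of the same operand. The pairwise sums of the first operand fill the low half of the result, and those of the second operand fill the high half. Addition wraps. The `i32` lanes and const generic lane count are purely illustrative.

```rust
/// Assumed lane-level definition of `iadd_pairwise` (illustrative only):
///   out[i]        = x[2i] + x[2i + 1]   for i in 0..N/2
///   out[N/2 + i]  = y[2i] + y[2i + 1]   for i in 0..N/2
/// with wrapping addition.
fn iadd_pairwise<const N: usize>(x: [i32; N], y: [i32; N]) -> [i32; N] {
    assert!(N % 2 == 0, "lane count must be even");
    let half = N / 2;
    let mut out = [0i32; N];
    for i in 0..half {
        out[i] = x[2 * i].wrapping_add(x[2 * i + 1]);
        out[half + i] = y[2 * i].wrapping_add(y[2 * i + 1]);
    }
    out
}
```

With distinct lane values such as x = [1, 2, 3, 4] and y = [10, 20, 30, 40], this definition yields [3, 7, 30, 70], which makes the pairing and the half assignment visible at a glance.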

sparker-arm · 2021-07-30T08:08:04Z

cranelift/codegen/src/isa/x64/lower.rs

+                                RegMem::reg(mul_const_reg.to_reg()),
+                                dst,
+                            ));
+                            ctx.emit(Inst::xmm_rm_r(SseOpcode::Pmaddubsw, RegMem::reg(src), dst));
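For readers unfamiliar with the ISA: PMADDUBSW multiplies each unsigned byte of its destination operand by the corresponding signed byte of its source operand. It then adds adjacent products into a 16-bit lane with signed saturation. The excerpt appears to load a constant (`mul_const_reg`) into `dst` and then issue `pmaddubsw dst, src`. On that reading, the natural choice for the signed case is a constant of sixteen 1 bytes. The constant's actual value is not shown here, so that is an assumption. The scalar model below (hypothetical helper names) shows why that computes the signed pairwise sum, and why the result depends only on the single register `src`.

```rust
/// Scalar model of SSSE3 PMADDUBSW (hypothetical helper): unsigned bytes of
/// `dst` times signed bytes of `src`, adjacent products summed and saturated
/// to the i16 range.
fn pmaddubsw(dst: [u8; 16], src: [i8; 16]) -> [i16; 8] {
    std::array::from_fn(|i| {
        let p0 = dst[2 * i] as i32 * src[2 * i] as i32;
        let p1 = dst[2 * i + 1] as i32 * src[2 * i + 1] as i32;
        (p0 + p1).clamp(i16::MIN as i32, i16::MAX as i32) as i16
    })
}

/// Signed extadd_pairwise under the assumed constant: sixteen 1 bytes in `dst`.
fn extadd_pairwise_s_via_pmaddubsw(src: [i8; 16]) -> [i16; 8] {
    pmaddubsw([1u8; 16], src)
}
```

With a multiplier of 1, the sums stay within -256..=254, so the saturation step never triggers in this use.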


Sorry, I don't know anything about the ISA... but since you're using 'src' here, shouldn't you be checking that the input to the swiden_low and swiden_high is the same?

+1, this needs to be part of the condition above (where we check input opcodes) such that we fall back to generic codegen otherwise, as it's perfectly legal to pairwise-add extends of two different values.

@sparker-arm @cfallin This is exactly what I was thinking about. It is the tension between treating a CLIF instruction as a 1:1 mapping of a Wasm instruction versus the decomposition approach. I didn't add the check because, as noted, per the definition of iadd_pairwise it is perfectly legal for the inputs of swiden_low and swiden_high to differ. Here we are not lowering ext_addpairwise, which would require the inputs to be the same; we are lowering iadd_pairwise. I was concerned that some test, fuzzing, or other code that lowers just this CLIF instruction could be incorrectly rejected. I can add the check, since I want this to merge, and because I only look at one operand I think it is correct to add. But I do think we need to think about, or even audit, the consistency of similar checks, and whether they match the CLIF instruction we are actually lowering or the Wasm instruction we intend to lower.

Here, we are not lowering ext_addpairwise which would require the condition that the input be the same ... we are lowering iadd_pairwise

Well, that's true in the outer scope, but the instructions that this case is emitting are specifically for the same-input case, no?

IMHO it's fine for now to fall back to unimplemented!() if we know for sure that we won't generate an iadd_pairwise with other (arbitrary) inputs, but at some point we should fill in that case too, for completeness.
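A small numeric model shows why the same-input condition matters. A fused sequence that reads only one source computes something different from `iadd_pairwise(swiden_low(x), swiden_high(y))` when `x != y`. The helper names below are hypothetical scalar models, not Cranelift APIs.

```rust
/// Hypothetical scalar models of the CLIF ops (i8x16 -> i16x8).
fn swiden_low(a: [i8; 16]) -> [i16; 8] {
    std::array::from_fn(|i| a[i] as i16)
}

fn swiden_high(a: [i8; 16]) -> [i16; 8] {
    std::array::from_fn(|i| a[i + 8] as i16)
}

/// Low half: pairwise sums of `x`; high half: pairwise sums of `y`.
fn iadd_pairwise(x: [i16; 8], y: [i16; 8]) -> [i16; 8] {
    std::array::from_fn(|i| {
        let v = if i < 4 { x } else { y };
        let j = 2 * (i % 4);
        v[j].wrapping_add(v[j + 1])
    })
}

/// What a single-source fused sequence computes: it only ever reads `src`.
fn fused_extadd_pairwise_s(src: [i8; 16]) -> [i16; 8] {
    std::array::from_fn(|i| src[2 * i] as i16 + src[2 * i + 1] as i16)
}
```

With x = [1; 16] and y = [2; 16], the generic semantics give [2, 2, 2, 2, 4, 4, 4, 4], while the fused sequence applied to x gives [2; 8]. The two agree only when both widens take the same input, which is why the check belongs in the match condition.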

should audit ...

I agree! I have lots of thoughts on verification that this text box is too small to contain :-) IMHO this is the next big effort after getting current projects (including some sort of isel DSL) landed.

cfallin

Thanks @jlb6740 -- we're really close now! A few comments below.

build.rs

cranelift/codegen/src/isa/x64/lower.rs

cfallin · 2021-07-30T16:30:25Z

cranelift/codegen/src/isa/x64/lower.rs


cfallin · 2021-07-30T16:32:03Z

cranelift/interpreter/src/step.rs

@@ -630,6 +630,8 @@ where
        Opcode::Fence => unimplemented!("Fence"),
        Opcode::WideningPairwiseDotProductS => unimplemented!("WideningPairwiseDotProductS"),
        Opcode::SqmulRoundSat => unimplemented!("SqmulRoundSat"),
+        Opcode::ExtendedPairwiseAddSigned => unimplemented!("ExtendedPairwiseAddSigned"),


Are these still the old opcodes?

Glad you caught this. Yes, I was going to do one more once-over but must have missed this. I'm surprised it doesn't warn.

Me too; @abrown I guess we're not building the CLIF interpreter by default in any CI target?

I thought we were: there are test interpret filetests that should use it and @afonso360 has a fuzz target that should use it.

jlb6740 · 2021-07-30T19:59:13Z

I personally wouldn't rely only on the conformance test suite, we've already had problems with it. Pretty sure you've got some bugs here and I guess the existing test suite isn't exercising them.

Hi @sparker-arm, I was planning to check and compare the lowering after getting comments on this first update, but do you see something specific, in case I miss it? I agree, the spec tests have sometimes turned out to be cut-and-paste exercises that aren't tailored to the specific instructions. I'm not sure of the best way to provide an answer key for this lowering sequence, though. I did not derive the lowering myself. It is based on the discussion during the instruction proposal and cross-checked against what V8 appears to be doing. Finally, the spec tests are there to catch errors, though they are not always successful.
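One possible answer key is a differential check: model the emitted instruction with scalar code and compare it against the Wasm semantics for every input. Each 16-bit output lane depends only on one pair of input bytes, so checking all 65,536 byte pairs for a single lane covers every lane. The sketch below is hypothetical. `pmaddubsw_lane` is a scalar model of one lane of SSSE3 PMADDUBSW: unsigned bytes from the first operand are multiplied by signed bytes from the second, and adjacent products are summed with signed saturation. The unsigned variant's operand order is an assumption, not taken from this PR: the input is the unsigned operand and a vector of 1s is the signed operand. This validates a model of the sequence, not the code Cranelift actually emits; a run-time test on real hardware would still be needed for that.

```rust
/// Scalar model of one 16-bit output lane of SSSE3 PMADDUBSW: unsigned bytes
/// (d0, d1) times signed bytes (s0, s1), the two products summed and
/// saturated to the i16 range.
fn pmaddubsw_lane(d0: u8, d1: u8, s0: i8, s1: i8) -> i16 {
    let sum = d0 as i32 * s0 as i32 + d1 as i32 * s1 as i32;
    sum.clamp(i16::MIN as i32, i16::MAX as i32) as i16
}

/// Exhaustively compares both candidate sequences with the Wasm reference
/// semantics; returns the first (b0, b1, signed) input pair that disagrees.
fn first_mismatch() -> Option<(u8, u8, bool)> {
    for b0 in 0..=255u8 {
        for b1 in 0..=255u8 {
            // Signed variant: constant 1s as the unsigned operand, input as signed.
            let signed_ref = b0 as i8 as i16 + b1 as i8 as i16;
            if pmaddubsw_lane(1, 1, b0 as i8, b1 as i8) != signed_ref {
                return Some((b0, b1, true));
            }
            // Unsigned variant (assumed operand order): input as the unsigned
            // operand, constant 1s as the signed operand.
            let unsigned_ref = b0 as i16 + b1 as i16;
            if pmaddubsw_lane(b0, b1, 1, 1) != unsigned_ref {
                return Some((b0, b1, false));
            }
        }
    }
    None
}
```

Operand order matters for the unsigned case. With the operands swapped, a 0xFF input byte would be read as -1 rather than 255.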

jlb6740 · 2021-07-31T02:18:09Z

This is intended to be ready. If there is anything else let me know.

cfallin

Thanks -- almost there! A comment below on the assert, though -- I think it needs to be slightly different. Once that's updated I'm happy to see this merged.

cranelift/codegen/meta/src/shared/instructions.rs

cranelift/codegen/src/isa/x64/lower.rs

cfallin

LGTM! Very excited to have the last bit of Wasm-SIMD on x64 in -- thanks @jlb6740 and @abrown and great work!

jlb6740 force-pushed the extend-add-pairwise-x64 branch from 2f4c653 to 4c97443 June 24, 2021 23:17

github-actions bot added the cranelift, cranelift:area:aarch64, cranelift:area:x64, cranelift:meta, and cranelift:wasm labels Jun 24, 2021

jlb6740 force-pushed the extend-add-pairwise-x64 branch 10 times, most recently from f4d1c26 to 9eaab93 June 30, 2021 06:16

akirilov-arm reviewed Jun 30, 2021


abrown mentioned this pull request Jun 30, 2021

clif: pattern-match to remove specialized SIMD instructions #3045

Closed

jlb6740 force-pushed the extend-add-pairwise-x64 branch 2 times, most recently from 577c582 to 57ee118 July 7, 2021 06:34

jlb6740 force-pushed the extend-add-pairwise-x64 branch 4 times, most recently from 495e570 to 5b1d23d July 10, 2021 01:13

jlb6740 mentioned this pull request Jul 29, 2021

Bump to Wasmtime v0.29.0 and Cranelift 0.76.0. #3123

Merged

jlb6740 force-pushed the extend-add-pairwise-x64 branch 2 times, most recently from 2f1ece9 to b9f9d1d July 30, 2021 00:45

jlb6740 requested review from akirilov-arm and cfallin July 30, 2021 04:09

sparker-arm reviewed Jul 30, 2021


cfallin reviewed Jul 30, 2021


Add extend-add-pairwise instructions x64

e373ddf

jlb6740 force-pushed the extend-add-pairwise-x64 branch 7 times, most recently from b1dc255 to 6378d1b July 31, 2021 01:40

jlb6740 requested a review from cfallin July 31, 2021 02:17

cfallin reviewed Jul 31, 2021


cranelift/codegen/meta/src/shared/instructions.rs (outdated, resolved)

cranelift/codegen/src/isa/x64/lower.rs (outdated, resolved)

Refactor and turn on lowering for extend-add-pairwise

e519fca

jlb6740 force-pushed the extend-add-pairwise-x64 branch 2 times, most recently from c383f2f to e519fca July 31, 2021 18:39

jlb6740 requested review from cfallin and abrown July 31, 2021 19:12

cfallin approved these changes Aug 1, 2021


cfallin merged commit 87fefd8 into bytecodealliance:main Aug 1, 2021


jlb6740 commented Jun 24, 2021
akirilov-arm Jun 30, 2021 (edited)
abrown Jun 30, 2021
akirilov-arm Jun 30, 2021
abrown Jun 30, 2021
cfallin Jun 30, 2021
jlb6740 Jul 9, 2021 (edited)
akirilov-arm Jul 13, 2021
sparker-arm Jul 13, 2021
jlb6740 Jul 14, 2021 (edited)
sparker-arm Jul 14, 2021
jlb6740 commented Jul 30, 2021 (edited)
sparker-arm left a comment
sparker-arm Jul 30, 2021
sparker-arm Jul 30, 2021
cfallin Jul 30, 2021
jlb6740 Jul 30, 2021 (edited)
cfallin Jul 30, 2021
cfallin left a comment
cfallin Jul 30, 2021
cfallin Jul 30, 2021
jlb6740 Jul 30, 2021
cfallin Jul 30, 2021
abrown Jul 30, 2021 (edited)
jlb6740 commented Jul 30, 2021 (edited)
jlb6740 commented Jul 31, 2021
cfallin left a comment
cfallin left a comment (edited)
