Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add extend-add-pairwise instructions x64 #3031

Merged
merged 2 commits into from
Aug 1, 2021

Conversation

jlb6740
Copy link
Contributor

@jlb6740 jlb6740 commented Jun 24, 2021

No description provided.

@jlb6740 jlb6740 force-pushed the extend-add-pairwise-x64 branch from 2f4c653 to 4c97443 Compare June 24, 2021 23:17
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen cranelift:meta Everything related to the meta-language. cranelift:wasm labels Jun 24, 2021
@jlb6740 jlb6740 force-pushed the extend-add-pairwise-x64 branch 10 times, most recently from f4d1c26 to 9eaab93 Compare June 30, 2021 06:16

ig.push(
Inst::new(
"extended_pairwise_add_signed",
Copy link
Contributor

@akirilov-arm akirilov-arm Jun 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative approach is to introduce a binary pairwise addition IR operation, e.g. iadd_pairwise, which sounds like a more useful primitive operation - it could be used both by other existing Wasm instructions like i32x4.dot_i16x8_s and possible future additions (some of which have already been discussed - see here for an example); in that case extended_pairwise_add_signed would become iadd_pairwise(swiden_low, swiden_high). We need to do some pattern-matching for optimal AArch64 code generation (which is a single instruction), but I am not sure about other architectures - any thoughts from the people working on the backends?

cc @abrown @cfallin @uweigand

P.S. Actually even considering the extended pairwise additions in isolation - the iadd_pairwise way introduces only one IR operation, while the approach in this PR adds two.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general comment: in the past we were implicitly following the "make a new CLIF instruction for each Wasm SIMD instruction" paradigm so a lot of existing CLIF instructions could be split up in the same way you describe above. I think this is a thing that should be done (it just seems to make sense to simplify CLIF even at the risk of a few extra cycles during lowering) but I don't feel very strongly about it so I would be fine with either:

  • we do the pattern matching now AND we start a review of the existing SIMD instructions in a separate issue to clean up CLIF
  • we just start the review of existing SIMD instructions now and eventually remove the unnecessary CLIF instructions and switch to more pattern matching; this would mean following the "make a new CLIF instruction for each Wasm SIMD instruction" for the last few outstanding instructions

In summary, I don't mind if we do iadd_pairwise now or later, but I do think we need to commit to doing the same to the rest of SIMD CLIF.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to link to the discussion in #2982 for context.

@abrown While I don't have any strong feelings either, in #3044 I have already started with the first of your suggested approaches. In fact, the only Wasm SIMD operations that are not already covered by merged code or pull requests are the family of i*.extmul_*_i* operations, for which the specification text itself provides a simple way to express them in terms of other basic operations.

I agree that opening an issue for CLIF clean-up is definitely one of the steps forward, no matter what we decide to do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #3045 for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just commented over in #3045 with thoughts on the general question; for this specific case I think it does seem reasonable to split into the pairwise add with extends on the inputs. But let's see what we come to over there!

Copy link
Contributor Author

@jlb6740 jlb6740 Jul 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akirilov-arm @cfallin Hi .. so the suggestion here is to remove the new IR instructions extended_pairwise_add_signed and extended_pairwise_add_unsigned and instead create one called iadd_pairwise. When lowering .. we would keep the lowering code that is there now, but pattern match for the instruction and type during lowering in order to get the input, or is the suggestion to scrape that lowering code too, actually lower swiden_low and swiden_high where we then are using the output of those particular instructions to be input to iadd_pairwise which needs to then be a new implementation from what is there now? I look at the change made in #2982 and it really is just avoiding creating another IR instruction while maybe adding a little logic of doing the matching, but keeps the original sequence. Here I am not sure if the suggestion is to do that or go further?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO the handling should be equivalent to the one for the fcvt_from_uint operation in #2982 - that is, there should be a code path that handles iadd_pairwise irrespective of its inputs, to serve as a fallback. Then right before that path there should be a check if the inputs match the pattern; if they do, execution should proceed through your current lowering code for extended_pairwise_add_*.

BTW @sparker-arm already has the common iadd_pairwise implementation and the AArch64-specific bits, but he is waiting for this pull request to get merged, so that you do not step on each other's toes. In the meantime, he may be able to offer some advice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't mind posting this up if people are interested.

Copy link
Contributor Author

@jlb6740 jlb6740 Jul 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sparker-arm .. can you post here?@akirilov-arm I still am not sure I understand what changes are in mind. Currently in code_translators there is this new instruction extended_pairwise_add_signed called here:

    Operator::I16x8ExtAddPairwiseI8x16S => {
        let a = pop1_with_bitcast(state, I8X16, builder);
        state.push1(builder.ins().extended_pairwise_add_signed(a))
    }

and new instruction extended_pairwise_add_unsigned called here:

    Operator::I16x8ExtAddPairwiseI8x16U => {
        let a = pop1_with_bitcast(state, I8X16, builder);
        state.push1(builder.ins().extended_pairwise_add_unsigned(a))
    }

You are saying to instead just call one instruction for both (iadd_pairwise) and then lower from there? I think this is what you are saying but if you are, how do we know if the instruction started as signed or unsigned? Also should that be extended_iadd_pairwise instead of iadd_pairwise?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll post the whole thing up later today. But just to point out that we'll know the signedness because of the [u|s]widen inputs, for example:

        Operator::I16x8ExtAddPairwiseI8x16S => {
            let a = pop1_with_bitcast(state, I8X16, builder);
            let widen_low = builder.ins().swiden_low(a);
            let widen_high = builder.ins().swiden_high(a);
            state.push1(builder.ins().iadd_pairwise(widen_low, widen_high));
        }

This creates a bit more work in the AArch64 backend, but it does also mean that we can fallback to an ADDP when we don't successfully match the extending inputs. As a side queston, unless I'm mistaken wasm doesn't have a horizontal add - does anyone know why?

@jlb6740 jlb6740 force-pushed the extend-add-pairwise-x64 branch 2 times, most recently from 577c582 to 57ee118 Compare July 7, 2021 06:34
@jlb6740 jlb6740 force-pushed the extend-add-pairwise-x64 branch 4 times, most recently from 495e570 to 5b1d23d Compare July 10, 2021 01:13
@jlb6740 jlb6740 force-pushed the extend-add-pairwise-x64 branch 2 times, most recently from 2f1ece9 to b9f9d1d Compare July 30, 2021 00:45
@jlb6740 jlb6740 requested review from akirilov-arm and cfallin July 30, 2021 04:09
@jlb6740
Copy link
Contributor Author

jlb6740 commented Jul 30, 2021

This is a refactoring based on earlier comments. The lowering remains intact it is just that now the clif instruction operands are more instructions. Matching in lowering is now different to properly match on the WASM instruction we are intending to lower.

The CI failures appear to be from an unrelated issue. Wish I knew how to resolve but I can see a stoppage on a dead code warning?

#3123 This should be the last PR for full x64 Wasm SIMD support. Of course there are some refactoring and clean-up TODOs that should be done as follow-up.

Copy link
Contributor

@sparker-arm sparker-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally wouldn't rely only on the conformance test suite, we've already had problems with it. Pretty sure you've got some bugs here and I guess the existing test suite isn't exercising them.

Inst::new(
"iadd_pairwise",
r#"
Does Lane-wise integer pairwise addition on two operands, putting the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to have a clearer definition of the semantics, this currently doesn't tell me which elements make a pair.

RegMem::reg(mul_const_reg.to_reg()),
dst,
));
ctx.emit(Inst::xmm_rm_r(SseOpcode::Pmaddubsw, RegMem::reg(src), dst));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't know anything about the ISA... but since you're using 'src' here, shouldn't you be checking that the input to the swiden_low and swiden_high is the same?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, this needs to be part of the condition above (where we check input opcodes) such that we fall back to generic codegen otherwise, as it's perfectly legal to pairwise-add extends of two different values.

Copy link
Contributor Author

@jlb6740 jlb6740 Jul 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sparker-arm @cfallin .. This is exactly what I was thinking about. This is the tension between treating a clif as a 1:1 wasm instruction mapping versus the decomposition approach. I didn't add the check because as noted, it is perfect legal for swiden_low and swiden_high to have to not be the same per the definition of iadd_pairwise. Here, we are not lowering ext_addpairwise which would require the condition that the input be the same ... we are lowering iadd_pairwise. I was concerned that some exercise or fuzz testing or something lowering just this cliff instruction could be invalidated incorrectly. I can add the check as I want this to merge, plus I only look at one operand so I think it is correct to add, but I do think we need to think about or even audit the consistency of similar checks and whether they map to the clif instruction that we are actually lowering or the wasm instruction that we are intending to lower.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, we are not lowering ext_addpairwise which would require the condition that the input be the same ... we are lowering iadd_pairwise

Well, that's true in the outer scope, but the instructions that this case is emitting are specifically for the same-input case, no?

IMHO it's fine for now to fall back to unimplemented!() if we know for sure that we won't generate an iadd_pairwise with other (arbitrary) inputs, but at some point we should fill in that case too, for completeness.

should audit ...

I agree! I have lots of thoughts on verification that this text box is too small to contain :-) IMHO this is the next big effort after getting current projects (including some sort of isel DSL) landed.

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jlb6740 -- we're really close now! A few comments below.

build.rs Show resolved Hide resolved
cranelift/codegen/src/isa/x64/lower.rs Outdated Show resolved Hide resolved
RegMem::reg(mul_const_reg.to_reg()),
dst,
));
ctx.emit(Inst::xmm_rm_r(SseOpcode::Pmaddubsw, RegMem::reg(src), dst));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, this needs to be part of the condition above (where we check input opcodes) such that we fall back to generic codegen otherwise, as it's perfectly legal to pairwise-add extends of two different values.

cranelift/codegen/src/isa/x64/lower.rs Show resolved Hide resolved
@@ -630,6 +630,8 @@ where
Opcode::Fence => unimplemented!("Fence"),
Opcode::WideningPairwiseDotProductS => unimplemented!("WideningPairwiseDotProductS"),
Opcode::SqmulRoundSat => unimplemented!("SqmulRoundSat"),
Opcode::ExtendedPairwiseAddSigned => unimplemented!("ExtendedPairwiseAddSigned"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these still the old opcodes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad you caught this. Yes, was going to do one more once through but may have missed this. Surprised it doesn't warn?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me too; @abrown I guess we're not building the CLIF interpreter by default in any CI target?

Copy link
Contributor

@abrown abrown Jul 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we were: there are test interpret filetests that should use it and @afonso360 has a fuzz target that should use it.

@jlb6740
Copy link
Contributor Author

jlb6740 commented Jul 30, 2021

I personally wouldn't rely only on the conformance test suite, we've already had problems with it. Pretty sure you've got some bugs here and I guess the existing test suite isn't exercising them.

Hi @sparker-arm .. I was planning to check/compare the lowering after comments from this first update, but do you see something specifically in case I miss it? I agree, the spec tests have shown to sometimes be cut-n-paste exercises that aren't thoughtful for the specific instructions. Not sure the best way to provide an answer key though for this lowering sequence. The lowering is not derived by me, but it's based on the discussion during instruction proposal and cross checked against what v8 appears to be doing, and finally of course spec tests which is there to catch errors though it is not always successful.

@jlb6740 jlb6740 force-pushed the extend-add-pairwise-x64 branch 7 times, most recently from b1dc255 to 6378d1b Compare July 31, 2021 01:40
@jlb6740 jlb6740 requested a review from cfallin July 31, 2021 02:17
@jlb6740
Copy link
Contributor Author

jlb6740 commented Jul 31, 2021

This is intended to be ready. If there is anything else let me know.

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks -- almost there! A comment below on the assert, though -- I think it needs to be slightly different. Once that's updated I'm happy to see this merged.

cranelift/codegen/meta/src/shared/instructions.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/x64/lower.rs Outdated Show resolved Hide resolved
@jlb6740 jlb6740 force-pushed the extend-add-pairwise-x64 branch 2 times, most recently from c383f2f to e519fca Compare July 31, 2021 18:39
@jlb6740 jlb6740 requested review from cfallin and abrown July 31, 2021 19:12
Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Very excited to have the last bit of Wasm-SIMD on x64 in -- thanks @jlb6740 and @abrown and great work!

@cfallin cfallin merged commit 87fefd8 into bytecodealliance:main Aug 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen cranelift:meta Everything related to the meta-language. cranelift:wasm cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants