-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add extend-add-pairwise instructions x64 #3031
Add extend-add-pairwise instructions x64 #3031
Conversation
2f4c653
to
4c97443
Compare
f4d1c26
to
9eaab93
Compare
|
||
ig.push( | ||
Inst::new( | ||
"extended_pairwise_add_signed", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative approach is to introduce a binary pairwise addition IR operation, e.g. iadd_pairwise
, which sounds like a more useful primitive operation - it could be used both by other existing Wasm instructions like i32x4.dot_i16x8_s
and possible future additions (some of which have already been discussed - see here for an example); in that case extended_pairwise_add_signed
would become iadd_pairwise(swiden_low, swiden_high)
. We need to do some pattern-matching for optimal AArch64 code generation (which is a single instruction), but I am not sure about other architectures - any thoughts from the people working on the backends?
P.S. Actually even considering the extended pairwise additions in isolation - the iadd_pairwise
way introduces only one IR operation, while the approach in this PR adds two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A general comment: in the past we were implicitly following the "make a new CLIF instruction for each Wasm SIMD instruction" paradigm so a lot of existing CLIF instructions could be split up in the same way you describe above. I think this is a thing that should be done (it just seems to make sense to simplify CLIF even at the risk of a few extra cycles during lowering) but I don't feel very strongly about it so I would be fine with either:
- we do the pattern matching now AND we start a review of the existing SIMD instructions in a separate issue to clean up CLIF
- we just start the review of existing SIMD instructions now and eventually remove the unnecessary CLIF instructions and switch to more pattern matching; this would mean following the "make a new CLIF instruction for each Wasm SIMD instruction" for the last few outstanding instructions
In summary, I don't mind if we do iadd_pairwise
now or later, but I do think we need to commit to doing the same to the rest of SIMD CLIF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to link to the discussion in #2982 for context.
@abrown While I don't have any strong feelings either, in #3044 I have already started with the first of your suggested approaches. In fact, the only Wasm SIMD operations that are not already covered by merged code or pull requests are the family of i*.extmul_*_i*
operations, for which the specification text itself provides a simple way to express them in terms of other basic operations.
I agree that opening an issue for CLIF clean-up is definitely one of the steps forward, no matter what we decide to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #3045 for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just commented over in #3045 with thoughts on the general question; for this specific case I think it does seem reasonable to split into the pairwise add with extends on the inputs. But let's see what we come to over there!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akirilov-arm @cfallin Hi .. so the suggestion here is to remove the new IR instructions extended_pairwise_add_signed and extended_pairwise_add_unsigned and instead create one called iadd_pairwise. When lowering .. we would keep the lowering code that is there now, but pattern match for the instruction and type during lowering in order to get the input, or is the suggestion to scrape that lowering code too, actually lower swiden_low and swiden_high where we then are using the output of those particular instructions to be input to iadd_pairwise which needs to then be a new implementation from what is there now? I look at the change made in #2982 and it really is just avoiding creating another IR instruction while maybe adding a little logic of doing the matching, but keeps the original sequence. Here I am not sure if the suggestion is to do that or go further?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO the handling should be equivalent to the one for the fcvt_from_uint
operation in #2982 - that is, there should be a code path that handles iadd_pairwise
irrespective of its inputs, to serve as a fallback. Then right before that path there should be a check if the inputs match the pattern; if they do, execution should proceed through your current lowering code for extended_pairwise_add_*
.
BTW @sparker-arm already has the common iadd_pairwise
implementation and the AArch64-specific bits, but he is waiting for this pull request to get merged, so that you do not step on each other's toes. In the meantime, he may be able to offer some advice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't mind posting this up if people are interested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sparker-arm .. can you post here?@akirilov-arm I still am not sure I understand what changes are in mind. Currently in code_translators there is this new instruction extended_pairwise_add_signed called here:
Operator::I16x8ExtAddPairwiseI8x16S => { let a = pop1_with_bitcast(state, I8X16, builder); state.push1(builder.ins().extended_pairwise_add_signed(a)) }
and new instruction extended_pairwise_add_unsigned called here:
Operator::I16x8ExtAddPairwiseI8x16U => { let a = pop1_with_bitcast(state, I8X16, builder); state.push1(builder.ins().extended_pairwise_add_unsigned(a)) }
You are saying to instead just call one instruction for both (iadd_pairwise) and then lower from there? I think this is what you are saying but if you are, how do we know if the instruction started as signed or unsigned? Also should that be extended_iadd_pairwise instead of iadd_pairwise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll post the whole thing up later today. But just to point out that we'll know the signedness because of the [u|s]widen inputs, for example:
Operator::I16x8ExtAddPairwiseI8x16S => {
let a = pop1_with_bitcast(state, I8X16, builder);
let widen_low = builder.ins().swiden_low(a);
let widen_high = builder.ins().swiden_high(a);
state.push1(builder.ins().iadd_pairwise(widen_low, widen_high));
}
This creates a bit more work in the AArch64 backend, but it does also mean that we can fallback to an ADDP when we don't successfully match the extending inputs. As a side queston, unless I'm mistaken wasm doesn't have a horizontal add - does anyone know why?
577c582
to
57ee118
Compare
495e570
to
5b1d23d
Compare
2f1ece9
to
b9f9d1d
Compare
This is a refactoring based on earlier comments. The lowering remains intact it is just that now the clif instruction operands are more instructions. Matching in lowering is now different to properly match on the WASM instruction we are intending to lower. The CI failures appear to be from an unrelated issue. Wish I knew how to resolve but I can see a stoppage on a dead code warning? #3123 This should be the last PR for full x64 Wasm SIMD support. Of course there are some refactoring and clean-up TODOs that should be done as follow-up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally wouldn't rely only on the conformance test suite, we've already had problems with it. Pretty sure you've got some bugs here and I guess the existing test suite isn't exercising them.
Inst::new( | ||
"iadd_pairwise", | ||
r#" | ||
Does Lane-wise integer pairwise addition on two operands, putting the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to have a clearer definition of the semantics, this currently doesn't tell me which elements make a pair.
RegMem::reg(mul_const_reg.to_reg()), | ||
dst, | ||
)); | ||
ctx.emit(Inst::xmm_rm_r(SseOpcode::Pmaddubsw, RegMem::reg(src), dst)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't know anything about the ISA... but since you're using 'src' here, shouldn't you be checking that the input to the swiden_low and swiden_high is the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, this needs to be part of the condition above (where we check input opcodes) such that we fall back to generic codegen otherwise, as it's perfectly legal to pairwise-add extends of two different values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sparker-arm @cfallin .. This is exactly what I was thinking about. This is the tension between treating a clif as a 1:1 wasm instruction mapping versus the decomposition approach. I didn't add the check because as noted, it is perfect legal for swiden_low and swiden_high to have to not be the same per the definition of iadd_pairwise. Here, we are not lowering ext_addpairwise which would require the condition that the input be the same ... we are lowering iadd_pairwise. I was concerned that some exercise or fuzz testing or something lowering just this cliff instruction could be invalidated incorrectly. I can add the check as I want this to merge, plus I only look at one operand so I think it is correct to add, but I do think we need to think about or even audit the consistency of similar checks and whether they map to the clif instruction that we are actually lowering or the wasm instruction that we are intending to lower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, we are not lowering ext_addpairwise which would require the condition that the input be the same ... we are lowering iadd_pairwise
Well, that's true in the outer scope, but the instructions that this case is emitting are specifically for the same-input case, no?
IMHO it's fine for now to fall back to unimplemented!()
if we know for sure that we won't generate an iadd_pairwise
with other (arbitrary) inputs, but at some point we should fill in that case too, for completeness.
should audit ...
I agree! I have lots of thoughts on verification that this text box is too small to contain :-) IMHO this is the next big effort after getting current projects (including some sort of isel DSL) landed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jlb6740 -- we're really close now! A few comments below.
RegMem::reg(mul_const_reg.to_reg()), | ||
dst, | ||
)); | ||
ctx.emit(Inst::xmm_rm_r(SseOpcode::Pmaddubsw, RegMem::reg(src), dst)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, this needs to be part of the condition above (where we check input opcodes) such that we fall back to generic codegen otherwise, as it's perfectly legal to pairwise-add extends of two different values.
cranelift/interpreter/src/step.rs
Outdated
@@ -630,6 +630,8 @@ where | |||
Opcode::Fence => unimplemented!("Fence"), | |||
Opcode::WideningPairwiseDotProductS => unimplemented!("WideningPairwiseDotProductS"), | |||
Opcode::SqmulRoundSat => unimplemented!("SqmulRoundSat"), | |||
Opcode::ExtendedPairwiseAddSigned => unimplemented!("ExtendedPairwiseAddSigned"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these still the old opcodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad you caught this. Yes, was going to do one more once through but may have missed this. Surprised it doesn't warn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me too; @abrown I guess we're not building the CLIF interpreter by default in any CI target?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we were: there are test interpret
filetests that should use it and @afonso360 has a fuzz target that should use it.
Hi @sparker-arm .. I was planning to check/compare the lowering after comments from this first update, but do you see something specifically in case I miss it? I agree, the spec tests have shown to sometimes be cut-n-paste exercises that aren't thoughtful for the specific instructions. Not sure the best way to provide an answer key though for this lowering sequence. The lowering is not derived by me, but it's based on the discussion during instruction proposal and cross checked against what v8 appears to be doing, and finally of course spec tests which is there to catch errors though it is not always successful. |
b1dc255
to
6378d1b
Compare
This is intended to be ready. If there is anything else let me know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks -- almost there! A comment below on the assert, though -- I think it needs to be slightly different. Once that's updated I'm happy to see this merged.
c383f2f
to
e519fca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No description provided.