-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize FMA codegen base on the overwritten #58196
Conversation
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsThis is for #12984. @kunalspathak @tannergooding, thanks!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions and suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments. Did you run the superpmi asmdiff?
No, I haven't. What is this for? |
Pretty much all Jit changes are run through diffs (and SPMI is probably the most convenient tool for getting them), so that we can asses the impact on the generated code and how much existing test coverage we have. |
@SingleAccretion, I ran the SuperPMI.py with asmdiffs and saw quite some errors, most of which are from "JIT.HardwareIntrinsics.Arm.Helpers:FPRSqrtStepFused(float,float):float" or other similar tests. How can I find these tests to debug? And, it's very confusing that my change is suppose to only work for xarch, why Arm tests are complaining. |
@weilinwa One of the nicest things with SPMI is that it makes debugging easy. When you encountered errors (I presume asserts), the tool should've printed a "reproduction command", with the path to the native SPMI executable and a list of
I am not sure why that is either. |
@SingleAccretion , I got the "Error: no baseline JIT found" when run the |
Yes. I believe we only have prebuilt Jits for the |
Correct way to use this is:
This will do |
@kunalspathak @tannergooding, I've modified the code logic to check different Asm diffsbenchmarks.run.windows.x64.checked.mch:
Detail diffs
coreclr_tests.pmi.windows.x64.checked.mch:
Detail diffs
libraries.pmi.windows.x64.checked.mch:
Detail diffs
libraries_tests.pmi.windows.x64.checked.mch:
Detail diffs
|
@tannergooding, I have a question about In instructions for FMA of scalar values like My questions is, do we need to ensure |
@weilinwa, that's a great question. The TL;DR; is that yes we do need to ensure Normally we provide two versions of the scalar function where this matters, such as:
When we do this, the upper bits come from For We'd need to expose |
|
||
srcCount += 1; | ||
srcCount += BuildDelayFreeUses(emitOp2, emitOp1); | ||
srcCount += emitOp3->isContained() ? BuildOperandUses(emitOp3) : BuildDelayFreeUses(emitOp3, emitOp1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a lot smaller and easier to follow now 🎉
src/coreclr/jit/lsraxarch.cpp
Outdated
if (containedOpNum == 1 && !copiesUpperBits) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the !copiesUpperBits
check needed? If we are copiesUpperBits
then containedOpNum
shouldn't be 1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should be true. Do we need to add an assert before to ensure that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you already have that assert a few lines up: https://github.com/dotnet/runtime/pull/58196/files/5ca658ebf53beee4d84446236f9391fba9e3e935#diff-9626112837daf480c93d401a587012b3e398dd90a953c870797d472cff36839dR2342
assert(!copiesUpperBits || !op1->isContained());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe replace it with assert(containedOpNum != 1 || !copiesUpperBits);
to also cover the regOptional
case
src/coreclr/jit/lsraxarch.cpp
Outdated
// Intrinsics with CopyUpperBits semantics must have op1 as target | ||
if (containedOpNum == 1 && !copiesUpperBits) | ||
{ | ||
if (resultOpNum != 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if this were the positive case and there were an assert that resultOpNum != containedOpNum
Therefore, if containedOpNum == 1
then resultOpNum
can only be 0
, 2
, or 3
If it's 3
, then swapping op1
/op3
is sufficient
If it's 2
, then swapping op2
/op3
is needed first
If it's 0
, then it doesn't matter what we do so its fine to not swap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible that containedOpNum ==0
and resultOpNum==0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If none of the operands are overwritten and none are last use, then containedOpNum == 0
.
I think we probably won't also get containedOpNum == 0
because VFMADD
should support general-purpose loads as well and so RegOptional
should probably be true for at least one case. But in general its better to check and account for possible future changes, scenarios, or nodes that are introduced
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because op lastUse
could be updated after lowering, there are cases that we have resultOpNum == containedOpNum
when they are not 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make multiple ops contained in lowering or change that in lsra?
src/coreclr/jit/lsraxarch.cpp
Outdated
} | ||
else | ||
{ | ||
assert(containedOpNum == 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its possible for containedOpNum
to be 0
and so we should check for this explicitly.
|
||
srcCount += op3->isContained() ? BuildOperandUses(op3) : BuildDelayFreeUses(op3, op1); | ||
if (resultOpNum == 3 && !copiesUpperBits) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just capturing a comment, I don't think we need to do anything in this PR.
I think the logic around copiesUpperBits
could be simplified a bit so we don't need these extra checks everywhere. That is, if copiesUpperBits
is true, then resultOpNum
doesn't matter if its not 1
so maybe we should be forcing resultOpNum
to be 0
in that case (that is if copiesUpperBits == true
and resultOpNum != 1
, then treat it as 0
, because no matter what we do, op1
cannot be swapped or moved about and op2
/op3
will be delay free or contained).
// op1 = (op1 * op2) + [op3] or op2 = (op1 * op2) + [op3] | ||
// ? = (op1 * op2) + [op3] or ? = (op1 * op2) + op3 | ||
// 213 form: XMM1 = (XMM2 * XMM1) + [XMM3] | ||
isCommutative = copiesUpperBits; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't isCommutative
be !copiesUpperBits
? We can't swap anything if copiesUpperBits == true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I used it inaccurately here to barely control if we should enter the branch.
} | ||
|
||
regNumber op1Reg = emitOp1->GetRegNum(); | ||
regNumber op2Reg = emitOp2->GetRegNum(); | ||
|
||
if (isCommutative && (op1Reg != targetReg) && (op2Reg == targetReg)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this block still needed given the above handling?
It feels like we should already be covering this under the last block, which is op3
or nothing
is contained/spilled so:
if (!copiesUpperBits && (targetReg == op2Reg))
{
std::swap(emitOp1, emitOp2);
}
Then everything should be in the right place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it if (!copiesUpperBits && (targetReg == op2Reg))
not if (copiesUpperBits && (targetReg == op2Reg))
? I thought we need to ensure targetReg
is op1Reg
only when copiesUpperBits
is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because emitOp1
is already op1
, so if copiesUpperBits == true
, then we don't want to change anything.
When its false, we only need to swap if the target reg is op2Reg.
@tannergooding, could you please take a look at the latest code when you have time? I resolved almost all of your comments except the resultOpNum and containedOpNum assertion. Thanks! |
op1Reg = op3->GetRegNum(); | ||
op2Reg = op2->GetRegNum(); | ||
op3 = op1; | ||
if (targetReg == op3NodeReg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be !copiesUpperBits && (targetReg == op3NodeReg)
Otherwise, copiesUpperBits
can be true since op1
is not Contained
or UsedFromSpillTemp
and therefore swapping emitOp1
isn't correct.
// op1 = (op1 * op2) + [op3] or op2 = (op1 * op2) + [op3] | ||
// ? = (op1 * op2) + [op3] or ? = (op1 * op2) + op3 | ||
// 213 form: XMM1 = (XMM2 * XMM1) + [XMM3] | ||
if (targetReg == op2NodeReg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, I think this needs to be if (!copiesUpperBits && (targetReg == op2NodeReg))
for the same reason.
I think we also don't need the below section doing if (!copiesUpperBits && (emitOp2->GetRegNum() == targetReg))
as it will have already been covered up here.
Everything looks good except for the two related callouts in codegen. Looks like there is also a merge conflict, like due to #59912. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all LGTM. CC. @kunalspathak or @echesakovMSFT could you give a second review and merge if everything looks good to you as well
@kunalspathak @echesakovMSFT, could you take a look when you have some time? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to uncomment the 2 asserts and run the test to make sure they are not hit.
if (containedOpNum == 1) | ||
{ | ||
// resultOpNum might change between lowering and lsra, comment out assertion for now. | ||
// assert(containedOpNum != resultOpNum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to uncomment this assert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion cannot be uncommented because the last use value could change after lowering step. I left them here for follow up work if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please create a issue for it and add the link to the issue in the comment here?
} | ||
else if (containedOpNum == 3) | ||
{ | ||
// assert(containedOpNum != resultOpNum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here?
Co-authored-by: Kunal Pathak <Kunal.Pathak@microsoft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @weilinwa for your patience and commitment. This looks good to me.
@weilinwa - I noticed superpmi.py replay failure on linux/x64. Can you double check if it is from your change?
|
This is for #12984. @kunalspathak @tannergooding, thanks!