-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARM64 - Always morph GT_MOD #68885
ARM64 - Always morph GT_MOD #68885
Conversation
…y lowered GT_MOD.
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsShould resolve these regressions:
Description Currently a draft. We only want to do the specific ARM64 mod optimization if the morphed The idea here is to find the But it gets more complicated. Acceptance Criteria
|
…s constant non-zero value
@kunalspathak - I think this is handling those regressions we saw earlier with the use of Looking at the asmdiffs now, it does show a lot of regressions, but they are caused by the additional divbyzero and overflow check: G_M60555_IG04: ; gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
bl CORINFO_HELP_OVERFLOW
;; size=4 bbWeight=0 PerfScore 0.00
G_M60555_IG05: ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
bl CORINFO_HELP_THROWDIVZERO
brk_windows #0
;; size=8 bbWeight=0 PerfScore 0.00 These were here before we merged |
I think recognizing patterns this large in lowering is too fragile and hard to get right. |
It might also be reasonable to introduce additional transformations as part of rationalization (or separately, with higher TP cost). We also have #68103 where it would be good to have. Thoughts @dotnet/jit-contrib? |
I definitely agree with this. I don't know if there is a better option (why can't it be done in morph?), though. |
I agree as well. Originally, I only wanted to find the This specific optimization shouldn't be done in morph, even if we had the constructs to do so. The reason why is because the transformation of let x = index / 16
let y = index % 16 which would turn into: let x = index / 16
let y = index - (index / 16) * 16 then: let cse = index / 16
let y = index - cse * 16 At the moment, we are not taking advantage of CSE because we merged in the What I really want to do is look for |
I would be interested to see a prototype of the suggestions I had above. I.e. try it as part of rationalization (in pre-order there), and try a new late phase (e.g. after range check). We can see how costly the full IR walk will be and evaluate whether having the optimization and future opportunity to do other things in that pass outweighs that cost. |
Note that we already have an "simple lowering" pass in our phase order that is a (mostly "empty") full IR walk. |
It's after rationalization however, so folding transformations still require interference checking here (or otherwise coupling it to rationalization). |
Yep, the idea would be to move it to before rationalization. Tree walks are still costlier than linear traversals, but not "a new full IR walk" costlier at least. |
i % 2
src/coreclr/jit/lowerarmarch.cpp
Outdated
@@ -104,6 +104,7 @@ bool Lowering::IsContainableImmed(GenTree* parentNode, GenTree* childNode) const | |||
case GT_LE: | |||
case GT_GE: | |||
case GT_GT: | |||
case GT_CMP: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed to add this so that GT_CMP
's second operand can be properly contained.
@dotnet/jit-contrib this is ready again, CI is green now |
I'd suggest to split the new optimization into a separate PR so we can review it and track its impact separately. |
This reverts commit 80635b0.
i % 2
@jakobbotsch I split them out |
|
||
return cc->gtNext; | ||
ContainCheckNode(mod); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these changes necessary or should they be done in the new PR too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of? I got rid of the use
. New PR extends this path anyway with the new instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is unrelated to the fix, right?
I am fine with leaving it to avoid more churn on this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unrelated, been trying different ways to shape this function. The next PR is the shape and impl I'm happy with.
Should resolve these regressions:
Assertion failed '!"Shouldn't see an integer typed GT_MOD node in ARM64"' during 'Linear scan register alloc' #68470Seems to not reproduce in main anymore.Description
We only want to do the specific ARM64 mod optimization if the morphed
Mod
toSubMulDiv
did not take advantage of CSE. The only phase, that I know of, to do any kind of transformation that occurs after CSE is 'lowering'.The idea here is to find the
SubMulDiv
in lowering, turn it back into aMod
, and then callLowerModPow2
to do the specific optimization.But it gets more complicated.
SubMulDiv
itself gets optimized in lowering as well, so the shape ofSubMulDiv
is lost. Therefore, we need to find other possible shapes that get lowered fromSubMulDiv
.Acceptance Criteria