-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimise long multiply + add/sub/neg on arm64. #91886
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsThis PR optimises an extending multiply of two integers out to a long followed by an addition, subtraction or negation operation down to a single The Minor regressions seen here are due to additional
|
In the diffs, I see |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems adding GenTreeTernOp
can be avoided. In the past, we directly recognized the pattern in codegen e.g. madd
, msub
. Did you try that approach and faced any problem?
@dotnet/jit-contrib @TIHan |
I guess that would be possible, but it feels like it should be done during lowering (I have no overly strong opinion though). My thought for |
Rather than We can just introduce an |
@tannergooding Thank you for the comment, I have reworked the patch to utilise Updated spmidiff results are here (re-run on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to resolve merge conflict and also fix the test failure:
Assert failure(PID 275 [0x00000113], Thread: 275 [0x0113]): Assertion failed 'compiler->compIsaSupportedDebugOnly(HWIntrinsicInfo::lookupIsa(intrin.id))' in 'System.Collections.Generic.Dictionary`2[System.__Canon,System.__Canon]:TryInsert(System.__Canon,System.__Canon,ubyte):ubyte:this' during 'Generate code' (IL size 668; hash 0x26ab038f; FullOpts)
File: /__w/1/s/src/coreclr/jit/hwintrinsiccodegenarm64.cpp Line: 210
Image: /root/helix/work/correlation/corerun
This patch optimises an extending multiply of two integers out to a long followed by an addition, subtraction or negation operation down to a single [s/u]maddl, [s/u]subl or smnegl instruction, represented as a GT_HWINTRINSIC.
Change-Id: Id0befd1a643620a1f4e0dcb0298f70cee296445d
Change-Id: Iffcd54d6f2a3a25474e31f3a3cbf4072a8efcff8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks correct to me. It should get sign off from someone from @dotnet/jit-contrib before being merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR optimises an extending multiply of two integers out to a long followed by an addition, subtraction or negation operation down to a single
[s/u]maddl
,[s/u]subl
orsmnegl
instruction. It also adds a generic ternary operator which can be used in the future to support other three argument nodes.The
spmidiff
results for this patch can be found at the below gist (performed onwin-arm64
sincelinux-arm64
is currently broken as per #91257):https://gist.github.com/c272/944721d2d083660e0ba7ec9fddb24de4
Minor regressions seen here are due to additional
mov
instructions changing the register allocation in cases where the add value is containable, preventing some previously performedldp
optimisations. Eliminating all cases of containable adds also removes many valid optimisations, so I have currently left it as is.Contributes to #68028