Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise long multiply + add/sub/neg on arm64. #91886

Merged
merged 3 commits into from
Sep 20, 2023
Merged

Conversation

c272
Copy link
Contributor

@c272 c272 commented Sep 11, 2023

This PR optimises an extending multiply of two integers out to a long followed by an addition, subtraction or negation operation down to a single [s/u]maddl, [s/u]subl or smnegl instruction. It also adds a generic ternary operator which can be used in the future to support other three argument nodes.

The spmidiff results for this patch can be found at the below gist (performed on win-arm64 since linux-arm64 is currently broken as per #91257):
https://gist.github.com/c272/944721d2d083660e0ba7ec9fddb24de4

Minor regressions seen here are due to additional mov instructions changing the register allocation in cases where the add value is containable, preventing some previously performed ldp optimisations. Eliminating all cases of containable adds also removes many valid optimisations, so I have currently left it as is.

Contributes to #68028

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Sep 11, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 11, 2023
@ghost
Copy link

ghost commented Sep 11, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR optimises an extending multiply of two integers out to a long followed by an addition, subtraction or negation operation down to a single [s/u]maddl, [s/u]subl or smnegl instruction. It also adds a generic ternary operator which can be used in the future to support other three argument nodes.

The spmidiff results for this patch can be found at the below gist (performed on win-arm64 since linux-arm64 is currently broken as per #91257):
https://gist.github.com/c272/944721d2d083660e0ba7ec9fddb24de4

Minor regressions seen here are due to additional mov instructions changing the register allocation in cases where the add value is containable, preventing some previously performed ldp optimisations. Eliminating all cases of containable adds also removes many valid optimisations, so I have currently left it as is.

Author: c272
Assignees: -
Labels:

area-CodeGen-coreclr, community-contribution

Milestone: -

@kunalspathak
Copy link
Member

In the diffs, I see smull+add is combined into umaddl instead of smaddl. Also, can you please fix the build errors?

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems adding GenTreeTernOp can be avoided. In the past, we directly recognized the pattern in codegen e.g. madd, msub. Did you try that approach and faced any problem?

@kunalspathak
Copy link
Member

@dotnet/jit-contrib @TIHan

@a74nh
Copy link
Contributor

a74nh commented Sep 12, 2023

It seems adding GenTreeTernOp can be avoided. In the past, we directly recognized the pattern in codegen e.g. madd, msub. Did you try that approach and faced any problem?

I guess that would be possible, but it feels like it should be done during lowering (I have no overly strong opinion though).

My thought for GenTreeTernOp is that a follow on patch could move SELECT, CMPXCHG over to use it, which should simplify things a little.

@tannergooding
Copy link
Member

Rather than GenTreeTernOp, we already have GenTreeHWIntrinsic which supports multi-op and which is really oriented around these types of specialized instructions for both SIMD and Scalar code.

We can just introduce an ArmBase_LongMultiplyAdd or similar for internal use only and have lowering generate that node. The existing metadata table will make this very simple to support and likely require less overall code changes.

@c272
Copy link
Contributor Author

c272 commented Sep 13, 2023

@tannergooding Thank you for the comment, I have reworked the patch to utilise GT_HWINTRINSIC instead of creating a new ternary node. The updated patch should also resolve the testing failures.

Updated spmidiff results are here (re-run on win-arm64):
https://gist.github.com/c272/5f7275fdff6b829cdc6d002b29f5bb9e

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to resolve merge conflict and also fix the test failure:

Assert failure(PID 275 [0x00000113], Thread: 275 [0x0113]): Assertion failed 'compiler->compIsaSupportedDebugOnly(HWIntrinsicInfo::lookupIsa(intrin.id))' in 'System.Collections.Generic.Dictionary`2[System.__Canon,System.__Canon]:TryInsert(System.__Canon,System.__Canon,ubyte):ubyte:this' during 'Generate code' (IL size 668; hash 0x26ab038f; FullOpts)

    File: /__w/1/s/src/coreclr/jit/hwintrinsiccodegenarm64.cpp Line: 210
    Image: /root/helix/work/correlation/corerun

@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Sep 17, 2023
This patch optimises an extending multiply of two integers
out to a long followed by an addition, subtraction or negation
operation down to a single [s/u]maddl, [s/u]subl or smnegl
instruction, represented as a GT_HWINTRINSIC.
@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Sep 19, 2023
Change-Id: Id0befd1a643620a1f4e0dcb0298f70cee296445d
Change-Id: Iffcd54d6f2a3a25474e31f3a3cbf4072a8efcff8
Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct to me. It should get sign off from someone from @dotnet/jit-contrib before being merged

@c272 c272 requested a review from kunalspathak September 20, 2023 12:20
Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants