
[JIT] Add ccmp and enable conditional compares for X64 #110826

Closed
wants to merge 77 commits into from


anthonycanino
Contributor

Overview


This PR is built on top of #108796.

This PR adds the new APX `ccmp` instruction to the x86 backend and enables some of the existing if-conversion functionality for x86.

Design

Currently, the if-conversion optimization is gated behind the `DOTNET_JitEnableApxIfConv` flag, which defaults to 0 (disabled).

For reference, there is a unique extended evex encoding for ccmp:

[Image: extended EVEX encoding of `ccmp`]

where SC0-SC3 encode the source condition under which `ccmp` executes (see SDM Vol. 1, Appendix B). If the current status flags do not satisfy the condition encoded by SC0-SC3, no compare is performed, and the OF, SF, ZF, and CF flags are instead set to the default flag values (DFV) given by the of, sf, zf, and cf fields.
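To make the conditional-execution rule concrete, here is a minimal Python model of `ccmp` on 64-bit operands. It is a sketch, not the JIT's implementation: the names `Flags`, `cmp64`, `cond_holds`, and `ccmp64` are hypothetical, condition codes use the standard 4-bit x86 `cc` numbering, and treating codes `0b1010`/`0b1011` as always-true/always-false is an assumption based on the APX specification (which repurposes the parity codes for `ccmp`) that should be checked against the spec.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63


@dataclass(frozen=True)
class Flags:
    """The four status flags that ccmp writes."""
    of: bool = False
    sf: bool = False
    zf: bool = False
    cf: bool = False


def cmp64(a: int, b: int) -> Flags:
    """Flags produced by a 64-bit `cmp a, b` (computes a - b, discards it)."""
    a &= MASK64
    b &= MASK64
    res = (a - b) & MASK64
    return Flags(
        of=bool((a ^ b) & (a ^ res) & SIGN64),  # signed overflow
        sf=bool(res & SIGN64),                   # sign of the result
        zf=res == 0,
        cf=a < b,                                # unsigned borrow
    )


def cond_holds(scc: int, f: Flags) -> bool:
    """Evaluate a 4-bit condition code; odd codes negate the even one."""
    base = scc >> 1
    if base == 0:
        r = f.of                      # O
    elif base == 1:
        r = f.cf                      # B
    elif base == 2:
        r = f.zf                      # E
    elif base == 3:
        r = f.cf or f.zf              # BE
    elif base == 4:
        r = f.sf                      # S
    elif base == 5:
        r = True                      # assumed APX: 0b1010 = T, 0b1011 = F
    elif base == 6:
        r = f.sf != f.of              # L
    else:
        r = f.zf or (f.sf != f.of)    # LE
    return (not r) if scc & 1 else r


def ccmp64(flags: Flags, scc: int, dfv: Flags, a: int, b: int) -> Flags:
    """If the incoming flags satisfy scc, compare a and b; else load the DFV."""
    if cond_holds(scc, flags):
        return cmp64(a, b)
    return dfv
```

For example, `ccmp64(cmp64(x, 0), 0b0100, Flags(), y, 5)` models `cmp x, 0` followed by `ccmpe y, 5` with all DFV bits clear: if `x != 0`, the second compare is skipped and every flag reads as 0.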

Testing

Note: The testing plan for APX work was discussed in #106557; please refer to that PR for details. Only results and comments are posted in this PR. Results are posted below.

Update comments.

Merge the REX2 changes into the original legacy emit path

bug fix: Set REX2.W with correct mask code.

Register encoding and prefix emitting logic.

Add REX2 prefix emit logic

bug fixes

Add Stress mode for REX2 encoding and some bug fixes

resolve comments:
1. add assertion check for UD opcodes.
2. add checks for EGPRs.

Add REX2 to emitOutputAM, and make LEA REX2-compatible.

Add REX2.X encoding for SIB byte

Bug fixes: add REX2 prefix on the path in RI where MOV is specially handled.

Enable REX2 encoding for `movups`

fixed bugs in REX2 prefix emitting logic when working with map 1 instructions, and enabled REX2 for POPCNT

legacy map index-er

bug fixes

some clean-up

Adding initial APX unit testing path.

Adding a coredistools dll that has LLVM APX disasm capability.

It must be copied into CORE_ROOT manually.

clean up work for REX2

narrow the REX2 scope to `sub` only

some clean up based on the comments.

bug fix

resolve comment
 - SV path is mostly for debugging purposes

Added encoding unit tests for instructions with immediates
Code refactoring: AddX86PrefixIfNeeded.
… missing in JIT, may indicate these instructions are not being used in JIT, drop them for now.
Refactor REX2 encoding stress logic.
(This has the side effect that the estimated code size goes up and no longer matches the actual code size.)
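Several of the commits above deal with the REX2 prefix (REX2.W, REX2.X for the SIB byte, map 1 instructions, EGPRs). As a hedged illustration, the sketch below assembles a REX2 prefix for register indices 0-31, assuming the `0xD5` escape byte followed by a payload laid out as `M0 R4 X4 B4 W R3 X3 B3` (bit 7 down to bit 0), as understood from the Intel APX specification. `rex2_prefix` is a hypothetical helper, not the JIT emitter's API; the low three bits of each register index still go into ModRM/SIB as usual.

```python
REX2_ESCAPE = 0xD5  # first byte of the two-byte REX2 prefix


def rex2_prefix(map1: bool, w: bool, reg: int, index: int, base: int) -> bytes:
    """Build a REX2 prefix (assumed payload layout M0 R4 X4 B4 W R3 X3 B3)."""
    for r in (reg, index, base):
        if not 0 <= r < 32:
            raise ValueError("register index must be in 0..31")
    payload = (
        (int(map1) << 7)              # M0: legacy map 0 vs. map 1 (0F)
        | (((reg >> 4) & 1) << 6)     # R4: bit 4 of ModRM.reg register
        | (((index >> 4) & 1) << 5)   # X4: bit 4 of SIB.index register
        | (((base >> 4) & 1) << 4)    # B4: bit 4 of ModRM.rm / SIB.base register
        | (int(w) << 3)               # W: 64-bit operand size
        | (((reg >> 3) & 1) << 2)     # R3: same role as REX.R
        | (((index >> 3) & 1) << 1)   # X3: same role as REX.X
        | ((base >> 3) & 1)           # B3: same role as REX.B
    )
    return bytes([REX2_ESCAPE, payload])
```

For instance, a 64-bit operation whose ModRM.reg operand is r16 would, under this layout, need `D5 48`: R4 is set because 16 has bit 4 set, and W selects 64-bit operand size.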
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 18, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Dec 18, 2024
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@anthonycanino
Contributor Author

anthonycanino commented Dec 18, 2024

1. Emitter unit tests

The left is output from JitDisasm, and right from JitLateDisasm.

[Images: emitter unit test output, JitDisasm vs. JitLateDisasm]

2. Intel SDE testing

Run the suite in SDE with JitEnableApxIfConv off:

base

Run the suite in SDE with JitEnableApxIfConv on:

diff

3. SuperPMI results

Base is with JitEnableApxIfConv off. Diff is with JitEnableApxIfConv on.

Diffs are based on 2,623,457 contexts (1,043,127 MinOpts, 1,580,330 FullOpts).

MISSED contexts: 2,983 (0.11%)

Base JIT options: JitBypassApxCheck=1

Diff JIT options: JitBypassApxCheck=1;JitEnableApxIfConv=1

Overall (-165,684 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---:|---:|---:|
| aspnet.run.windows.x64.checked.mch | 42,216,437 | -6,257 | -9.13% |
| benchmarks.run.windows.x64.checked.mch | 8,860,704 | -21,878 | -4.99% |
| benchmarks.run_pgo.windows.x64.checked.mch | 35,294,983 | -25,089 | -8.82% |
| benchmarks.run_tiered.windows.x64.checked.mch | 12,613,813 | -20,816 | -4.88% |
| coreclr_tests.run.windows.x64.checked.mch | 388,693,881 | -10,675 | -8.48% |
| libraries.crossgen2.windows.x64.checked.mch | 44,941,069 | +1,343 | -8.64% |
| libraries.pmi.windows.x64.checked.mch | 60,078,628 | -7,782 | -9.54% |
| libraries_tests.run.windows.x64.Release.mch | 317,477,436 | -54,130 | -9.82% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 147,744,535 | -20,077 | -6.96% |
| realworld.run.windows.x64.checked.mch | 10,242,976 | -451 | -6.47% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 4,496,305 | +128 | -6.51% |

FullOpts (-165,684 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---:|---:|---:|
| aspnet.run.windows.x64.checked.mch | 23,294,229 | -6,257 | -9.13% |
| benchmarks.run.windows.x64.checked.mch | 8,860,282 | -21,878 | -4.99% |
| benchmarks.run_pgo.windows.x64.checked.mch | 20,646,767 | -25,089 | -8.82% |
| benchmarks.run_tiered.windows.x64.checked.mch | 3,214,524 | -20,816 | -4.88% |
| coreclr_tests.run.windows.x64.checked.mch | 117,503,708 | -10,675 | -8.48% |
| libraries.crossgen2.windows.x64.checked.mch | 44,939,354 | +1,343 | -8.64% |
| libraries.pmi.windows.x64.checked.mch | 59,965,735 | -7,782 | -9.54% |
| libraries_tests.run.windows.x64.Release.mch | 128,808,465 | -54,130 | -9.82% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 137,094,840 | -20,077 | -6.96% |
| realworld.run.windows.x64.checked.mch | 10,018,142 | -451 | -6.47% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 4,495,216 | +128 | -6.51% |
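As a quick consistency check on the reported SuperPMI numbers, the per-collection size deltas sum to the overall -165,684 bytes, and the MinOpts and FullOpts context counts add up to the stated total, with the missed contexts coming to about 0.11%. Since the Overall and FullOpts deltas are identical, all of the size change comes from FullOpts contexts. The snippet below only re-does this arithmetic on the figures posted above.

```python
# Per-collection code size deltas in bytes, copied from the Overall table.
diff_bytes = {
    "aspnet.run.windows.x64.checked.mch": -6_257,
    "benchmarks.run.windows.x64.checked.mch": -21_878,
    "benchmarks.run_pgo.windows.x64.checked.mch": -25_089,
    "benchmarks.run_tiered.windows.x64.checked.mch": -20_816,
    "coreclr_tests.run.windows.x64.checked.mch": -10_675,
    "libraries.crossgen2.windows.x64.checked.mch": 1_343,
    "libraries.pmi.windows.x64.checked.mch": -7_782,
    "libraries_tests.run.windows.x64.Release.mch": -54_130,
    "libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch": -20_077,
    "realworld.run.windows.x64.checked.mch": -451,
    "smoke_tests.nativeaot.windows.x64.checked.mch": 128,
}
total_delta = sum(diff_bytes.values())

# Context counts reported for the run.
minopts_contexts = 1_043_127
fullopts_contexts = 1_580_330
contexts = minopts_contexts + fullopts_contexts
missed_pct = 100 * 2_983 / contexts

print(f"total delta: {total_delta:,} bytes")
print(f"contexts: {contexts:,}, missed: {missed_pct:.2f}%")
```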

@jakobbotsch
Member

jakobbotsch commented Dec 19, 2024

Happy to see this.

I have one request. Can you please split this PR into two? A first PR that adds the necessary support to codegen/emit to generate the new instructions. The second PR to enable the existing support for emitting conditional compares in the middle-end/lowering? All the work should be in the former PR, which should be free of diffs, while the latter PR should mostly be unifying and enabling the existing support that exists for arm64.

Also, the title is inaccurate. If-conversion is enabled for x64 already. The use of conditional compares is an orthogonal optimization to if-conversion.

@anthonycanino anthonycanino changed the title [JIT] Add ccmp and enable if-conversion for X86 [JIT] Add ccmp and enable conditional compares for X64 Dec 19, 2024
@anthonycanino
Contributor Author

Will do. I plan to get this PR into a clean build/run state and then split it.

Major changes include:

1. Adding `ccmp` (and promoted `cmp`) logic to the emitter backend.
2. Hooking `TryLowerAndOrToCCMP` from the ARM pathway into the Intel pathway.
3. Hooking `optimizeCompareChainCondBlock` from the ARM pathway into the Intel
   pathway.
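To show what lowering an `&&`/`||` of compares to `ccmp` buys (the idea behind hooking `TryLowerAndOrToCCMP`), here is a hedged model restricted to ZF and equality tests. The pseudo-assembly in the comments is illustrative only, not actual JIT output, and the helpers `cmp_zf`, `ccmp_zf`, `and_chain`, and `or_chain` are hypothetical. The key point is the choice of DFV: when the first condition already decides the result, `ccmp` skips its compare and loads a ZF value that makes the final `sete` produce the correct answer without a branch.

```python
# Standard x86 condition codes for "equal" and "not equal".
CC_E, CC_NE = 0b0100, 0b0101


def cmp_zf(a: int, b: int) -> bool:
    """ZF after `cmp a, b` (set when the operands are equal)."""
    return a == b


def ccmp_zf(zf: bool, scc: int, dfv_zf: bool, a: int, b: int) -> bool:
    """Model of ccmp tracking only ZF, for conditions E and NE."""
    if scc not in (CC_E, CC_NE):
        raise ValueError("this model only handles E and NE")
    cond = zf if scc == CC_E else not zf
    return cmp_zf(a, b) if cond else dfv_zf


def and_chain(x: int, y: int) -> bool:
    # x == 1 && y == 2:  cmp x, 1 ; ccmpe {dfv: zf=0} y, 2 ; sete
    zf = cmp_zf(x, 1)
    zf = ccmp_zf(zf, CC_E, False, y, 2)  # x != 1: skip, force ZF=0 (false)
    return zf


def or_chain(x: int, y: int) -> bool:
    # x == 1 || y == 2:  cmp x, 1 ; ccmpne {dfv: zf=1} y, 2 ; sete
    zf = cmp_zf(x, 1)
    zf = ccmp_zf(zf, CC_NE, True, y, 2)  # x == 1: skip, force ZF=1 (true)
    return zf
```

The same pattern extends to longer chains: each additional `ccmp` is predicated on the flags left by the previous one, so the whole condition is evaluated in straight-line code.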
@anthonycanino
Contributor Author

I have split the PR into two PRs, #110881 and #111072.

Both are currently in draft because they depend on #108796, for which we are working to fix some test failures. I will close this PR, and we can continue the relevant discussion on each of the sub-PRs.

@BruceForstall BruceForstall added the apx Related to the Intel Advanced Performance Extensions (APX) label Jan 7, 2025
Labels
apx Related to the Intel Advanced Performance Extensions (APX) area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
4 participants