-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8278868: Add x86 vectorization support for Long.bitCount() #6857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Welcome back vamsi-parasa! A progress list of the required criteria for merging this PR into |
|
@vamsi-parasa The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
|
/label hotspot-compiler |
|
@vamsi-parasa |
Webrevs
|
|
This patch shows 2.57x improvement in performance on a JMH micro benchmark due to x86 vectorization. |
|
Please also update copywrite headers of modified files. |
| case Op_PopCountVL: | ||
| if (!UsePopCountInstruction || !VM_Version::supports_avx512_vpopcntdq()) { | ||
| return false; | ||
| } | ||
| break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case could be combined with case Op_PopCountVI and duplication removed. The check is the same for both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated code as per your suggestion to avoid duplication in the latest commit...
|
|
||
| @Test // needs to be run in (fast) debug mode | ||
| @Warmup(10000) | ||
| @IR(counts = {"PopCountVL", "9"}) //9 PopCountVL nodes are generated for a long[] of LEN=1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be a failOn check instead of counts check? The number of PopCountVL nodes is dependent on loop unrolling which keeps changing with loop optimizations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we can use the regex (">= ") which checks for atleast one PopCountVL node. Please see the updated code...
Updated the year to 2022 in copyright headers... |
jatin-bhateja
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updates.
|
@vamsi-parasa The patch looks good to me. You will need another review. |
|
@vamsi-parasa This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 50 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@jatin-bhateja, @sviswa7, @vnkozlov) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
|
@vnkozlov Could you please review this and run it through your testing? |
|
Please update branch. Latest changes #6893 touched same files. |
|
Thank you Jatin and Sandhya for the review! |
|
Thank you Vladimir for looking into the patch. Will update the branch and let you know... |
|
Sorry, closed this issue accidentally... |
|
Hi Vladimir (@vnkozlov), tried to replicate the build errors on my IceLake machine but they did not occur for both release and debug builds. Both builds completed successfully... |
Build failure is on MacOSX x86 If you look on code it is really bug - missing 'opc =='. |
|
Fixed the 'opc == ' error. Thanks for identifying it! (gcc on Linux should have caught it) |
|
Tests failed on aarch64 systems and avx2 x86. |
Could you please let me know if the failing test is test/hotspot/jtreg/compiler/vectorization/TestPopCountVectorLong.java ?
This test exits gracefully on a Skylake machine which doesn't have AVX3. |
|
|
||
| int vlen_enc = vector_length_encoding(this, $src); | ||
| __ vpopcntq($dst$$XMMRegister, $src$$XMMRegister, vlen_enc); | ||
| __ evpmovqd($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
Should this cast be introduced at the middle-end instead? Popcount is a lane-wise operation and forcing the node to do a shape-changing operation seems not so reasonable.
Thanks.
As I posted in my comment next tests failed on Aarch64 and x86 with avx2 only (AMD): |
|
|
Thank you Vladimir! |
|
Hi Vladimir (@vnkozlov) |
vnkozlov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest version passed tier1-3 testing. Good.
Thank you Vladimir! |
|
/integrate |
|
@vamsi-parasa |
|
/integrate |
|
@sviswa7 Only the author (@vamsi-parasa) is allowed to issue the |
|
/sponsor |
|
Going to push as commit c4518e2.
Your commit was automatically rebased without conflicts. |
|
@sviswa7 @vamsi-parasa Pushed as commit c4518e2. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Vectorization support of Integer.bitCount() already exists but currently the same support is lacking for Long.bitCount(). Similar to the C2 PopCountVI node, we created a C2 PopCountVL node and used vpopcntq x86 instruction to enable vectorized Long.bitCount(). This patch shows 2.57x improvement in performance on a JMH micro benchmark due to x86 vectorization.
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6857/head:pull/6857$ git checkout pull/6857Update a local copy of the PR:
$ git checkout pull/6857$ git pull https://git.openjdk.java.net/jdk pull/6857/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 6857View PR using the GUI difftool:
$ git pr show -t 6857Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6857.diff