
@vamsi-parasa
Contributor

@vamsi-parasa vamsi-parasa commented Dec 15, 2021

Vectorization support for Integer.bitCount() already exists, but the same support is currently lacking for Long.bitCount(). Similar to the C2 PopCountVI node, we created a C2 PopCountVL node and used the x86 vpopcntq instruction to enable vectorized Long.bitCount(). This patch shows a 2.57x performance improvement on a JMH microbenchmark due to x86 vectorization.
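For illustration, the kind of loop this change targets is a plain element-wise Long.bitCount() over a long[]. The sketch below is an assumption about typical usage, not the JMH benchmark from this PR, and the class and method names are hypothetical. Whether C2 actually emits PopCountVL for it depends on the CPU supporting AVX-512 VPOPCNTDQ and on UsePopCountInstruction, the conditions checked for Op_PopCountVL in the review below.

```java
import java.util.Random;

// Hypothetical example class; not part of the PR.
final class LongBitCountKernel {
    // Element-wise population count; Long.bitCount returns an int, so the output is int[].
    static void bitCountLoop(long[] in, int[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Long.bitCount(in[i]);
        }
    }

    public static void main(String[] args) {
        final int len = 1024;
        long[] in = new long[len];
        int[] out = new int[len];
        Random rnd = new Random(42);
        for (int i = 0; i < len; i++) {
            in[i] = rnd.nextLong();
        }
        // Call the loop many times so that C2 compiles it (and may vectorize it).
        for (int iter = 0; iter < 20_000; iter++) {
            bitCountLoop(in, out);
        }
        // Check against a simple count that clears the lowest set bit each step.
        for (int i = 0; i < len; i++) {
            int expected = 0;
            for (long v = in[i]; v != 0; v &= v - 1) {
                expected++;
            }
            if (out[i] != expected) {
                throw new AssertionError("mismatch at index " + i);
            }
        }
        System.out.println("ok");
    }
}
```

Because the result array is int[] while the input is long[], a vectorized version has to turn 64-bit lanes into 32-bit lanes somewhere, which is what the later review comment about evpmovqd is about.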


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8278868: Add x86 vectorization support for Long.bitCount()

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6857/head:pull/6857
$ git checkout pull/6857

Update a local copy of the PR:
$ git checkout pull/6857
$ git pull https://git.openjdk.java.net/jdk pull/6857/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6857

View PR using the GUI difftool:
$ git pr show -t 6857

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6857.diff

@bridgekeeper

bridgekeeper bot commented Dec 15, 2021

👋 Welcome back vamsi-parasa! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Dec 15, 2021

@vamsi-parasa The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@vamsi-parasa vamsi-parasa changed the title 8278868:Add x86 vectorization support for Long.bitCount() 8278868: Add x86 vectorization support for Long.bitCount() Dec 15, 2021
@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Dec 15, 2021
@vamsi-parasa
Contributor Author

/label hotspot-compiler

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Dec 15, 2021
@openjdk

openjdk bot commented Dec 15, 2021

@vamsi-parasa
The hotspot-compiler label was successfully added.

@vamsi-parasa vamsi-parasa marked this pull request as ready for review December 15, 2021 23:57
@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 15, 2021
@mlbridge

mlbridge bot commented Dec 16, 2021

@vamsi-parasa
Contributor Author

This patch shows a 2.57x performance improvement on a JMH microbenchmark due to x86 vectorization.

@jatin-bhateja
Member

Please also update the copyright headers of modified files.

Comment on lines 1412 to 1416
case Op_PopCountVL:
  if (!UsePopCountInstruction || !VM_Version::supports_avx512_vpopcntdq()) {
    return false;
  }
  break;

This case could be combined with case Op_PopCountVI to remove the duplication; the check is the same for both.

Contributor Author


Updated the code in the latest commit to remove the duplication, as you suggested.


@Test // needs to be run in (fast) debug mode
@Warmup(10000)
@IR(counts = {"PopCountVL", "9"}) //9 PopCountVL nodes are generated for a long[] of LEN=1024

Could this be a failOn check instead of a counts check? The number of PopCountVL nodes depends on loop unrolling, which keeps changing with loop optimizations.

Contributor Author


It looks like we can use the count constraint (">= ") to check for at least one PopCountVL node. Please see the updated code.

@vamsi-parasa
Contributor Author


Updated the year to 2022 in copyright headers...

Member

@jatin-bhateja jatin-bhateja left a comment


Thanks for the updates.

@sviswa7

sviswa7 commented Jan 6, 2022

@vamsi-parasa The patch looks good to me. You will need another review.

@openjdk

openjdk bot commented Jan 6, 2022

@vamsi-parasa This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8278868: Add x86 vectorization support for Long.bitCount()

Reviewed-by: jbhateja, sviswanathan, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 50 new commits pushed to the master branch:

  • 6714184: 8279700: Parallel: Simplify ScavengeRootsTask constructor API
  • cfee451: 8273914: Indy string concat changes order of operations
  • c3d0a94: 8279833: Loop optimization issue in String.encodeUTF8_UTF16
  • 9e02447: 8279834: Alpine Linux fails to build when --with-source-date enabled
  • 08e14c6: 8278207: G1: Tighten verification in G1ResetSkipCompactingClosure
  • c08b2ac: 8225093: Special property jdk.boot.class.path.append should not default to empty string
  • 4c52eb3: 8279669: test/jdk/com/sun/jdi/TestScaffold.java uses wrong condition
  • d46410c: 8279785: JFR: 'jfr configure' should show default values
  • 2bbeae3: 8279668: x86: AVX2 versions of vpxor should be asserted
  • 3121898: 8279703: G1: Remove unused force_not_compacted local in G1CalculatePointersClosure::do_heap_region
  • ... and 40 more: https://git.openjdk.java.net/jdk/compare/b3dbfc645283cb315016ec531ec41570ab3f75f1...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@jatin-bhateja, @sviswa7, @vnkozlov) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@sviswa7

sviswa7 commented Jan 6, 2022

@vnkozlov Could you please review this and run it through your testing?

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 6, 2022
@vnkozlov
Contributor

vnkozlov commented Jan 6, 2022

Please update the branch. The latest changes in #6893 touched the same files.

@vamsi-parasa
Contributor Author

Thank you Jatin and Sandhya for the review!

@vamsi-parasa
Contributor Author

Thank you Vladimir for looking into the patch. I will update the branch and let you know.

@vamsi-parasa
Contributor Author

Sorry, I closed this pull request accidentally.

@vamsi-parasa
Contributor Author

Hi Vladimir (@vnkozlov), I tried to reproduce the build errors on my Ice Lake machine, but they did not occur in either the release or the debug build. Both builds completed successfully.

@vnkozlov
Contributor

vnkozlov commented Jan 6, 2022


The build failure is on MacOSX x86.

If you look at the code, it is a real bug: a missing 'opc =='.

@vamsi-parasa
Contributor Author

vamsi-parasa commented Jan 7, 2022

Fixed the 'opc == ' error. Thanks for identifying it! (gcc on Linux should have caught it)
Will try to replicate the VectorCastI2X and VectorCastL2X errors in compiler/codegen/TestLongDoubleVect.java, compiler/codegen/TestIntFloatVect.java...

@vnkozlov
Contributor

vnkozlov commented Jan 7, 2022

Tests failed on aarch64 systems and avx2 x86.

@vamsi-parasa
Contributor Author

vamsi-parasa commented Jan 10, 2022


Could you please let me know if the failing test is test/hotspot/jtreg/compiler/vectorization/TestPopCountVectorLong.java?
I added additional checks (shown below) to make sure it runs on an x86 machine that has AVX3.

  • @requires vm.compiler2.enabled
  • @requires os.arch=="x86" | os.arch=="i386" | os.arch=="amd64" | os.arch=="x86_64"

This test exits gracefully on a Skylake machine which doesn't have AVX3.


int vlen_enc = vector_length_encoding(this, $src);
__ vpopcntq($dst$$XMMRegister, $src$$XMMRegister, vlen_enc);
__ evpmovqd($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc);
Member


Hi,
Should this cast be introduced in the middle-end instead? Popcount is a lane-wise operation, and forcing the node to perform a shape-changing operation does not seem reasonable.
Thanks.
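The reviewer's point can be pictured with a scalar model of what the two instructions in the snippet compute. vpopcntq produces a count in each 64-bit lane, and evpmovqd then narrows each 64-bit lane to 32 bits (truncation), because Long.bitCount() returns an int. The sketch below is only a model under that reading: plain Java arrays stand in for vector registers, and the class and method names are hypothetical. Keeping the narrowing as a separate cast node in the middle-end, as suggested, would correspond to keeping the second step out of the popcount node.

```java
import java.util.Arrays;

// Hypothetical scalar model; arrays stand in for vector registers.
final class PopCountLaneModel {
    // Step 1 (models vpopcntq): lane-wise popcount; each 64-bit lane keeps a 64-bit result.
    static long[] popCountLanes(long[] lanes) {
        long[] counts = new long[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            counts[i] = Long.bitCount(lanes[i]);
        }
        return counts;
    }

    // Step 2 (models evpmovqd): narrow each 64-bit lane to a 32-bit lane by truncation.
    // This is the shape change: same number of lanes, half the element width.
    static int[] narrowLanes(long[] lanes) {
        int[] narrowed = new int[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            narrowed[i] = (int) lanes[i];
        }
        return narrowed;
    }

    public static void main(String[] args) {
        long[] src = {0L, -1L, 0x0F0FL, Long.MIN_VALUE};
        int[] result = narrowLanes(popCountLanes(src));
        System.out.println(Arrays.toString(result)); // prints [0, 64, 8, 1]
    }
}
```

Since a popcount of a 64-bit value is at most 64, the truncation in step 2 never loses information; the question raised in the review is only where in the compiler that narrowing should be represented.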

@vnkozlov
Contributor


As I posted in my comment, the following tests failed on AArch64 and on x86 with AVX2 only (AMD):

compiler/codegen/TestIntFloatVect.java
compiler/codegen/TestLongDoubleVect.java

@vnkozlov
Contributor

TestPopCountVectorLong.java was not run on these systems because it has @requires vm.cpu.features ~= ".*avx512dq.*".
I did not test other tiers because tier1 had failures.

@vamsi-parasa
Contributor Author

Thank you Vladimir!
I will work on fixing compiler/codegen/{TestIntFloatVect.java, TestLongDoubleVect.java}.
This patch is not supposed to affect those tests, but I will investigate why they are failing and update you.

@vamsi-parasa
Contributor Author

Hi Vladimir (@vnkozlov),
Could you please check whether your testing included the fix for the 'opc == ' bug? The fix for that bug was already pushed last week.
Without the bug fix, I was able to reproduce the failure of compiler/codegen/TestIntFloatVect.java on an AVX2 x86 machine.
After applying the fix (which was pushed last week), both compiler/codegen/{TestIntFloatVect.java, TestLongDoubleVect.java} pass on AVX2 (x86).

Contributor

@vnkozlov vnkozlov left a comment


Latest version passed tier1-3 testing. Good.

@vamsi-parasa
Contributor Author


Thank you Vladimir!

@vamsi-parasa
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Jan 11, 2022
@openjdk

openjdk bot commented Jan 11, 2022

@vamsi-parasa
Your change (at version d2c0099) is now ready to be sponsored by a Committer.

@sviswa7

sviswa7 commented Jan 11, 2022

/integrate

@openjdk

openjdk bot commented Jan 11, 2022

@sviswa7 Only the author (@vamsi-parasa) is allowed to issue the integrate command. As this pull request is ready to be sponsored, and you are an eligible sponsor, did you mean to issue the /sponsor command?

@sviswa7

sviswa7 commented Jan 11, 2022

/sponsor

@openjdk

openjdk bot commented Jan 11, 2022

Going to push as commit c4518e2.
Since your change was applied there have been 50 commits pushed to the master branch:

  • 6714184: 8279700: Parallel: Simplify ScavengeRootsTask constructor API
  • cfee451: 8273914: Indy string concat changes order of operations
  • c3d0a94: 8279833: Loop optimization issue in String.encodeUTF8_UTF16
  • 9e02447: 8279834: Alpine Linux fails to build when --with-source-date enabled
  • 08e14c6: 8278207: G1: Tighten verification in G1ResetSkipCompactingClosure
  • c08b2ac: 8225093: Special property jdk.boot.class.path.append should not default to empty string
  • 4c52eb3: 8279669: test/jdk/com/sun/jdi/TestScaffold.java uses wrong condition
  • d46410c: 8279785: JFR: 'jfr configure' should show default values
  • 2bbeae3: 8279668: x86: AVX2 versions of vpxor should be asserted
  • 3121898: 8279703: G1: Remove unused force_not_compacted local in G1CalculatePointersClosure::do_heap_region
  • ... and 40 more: https://git.openjdk.java.net/jdk/compare/b3dbfc645283cb315016ec531ec41570ab3f75f1...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jan 11, 2022
@openjdk openjdk bot closed this Jan 11, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Jan 11, 2022
@openjdk

openjdk bot commented Jan 11, 2022

@sviswa7 @vamsi-parasa Pushed as commit c4518e2.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.


Labels

hotspot hotspot-dev@openjdk.org hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated


5 participants