8286972: Support the new loop induction variable related PopulateIndex IR node on x86 #8778

sviswa7 · 2022-05-18T17:25:38Z

This PR adds x86 backend support for the new loop induction variable related PopulateIndex IR node.
This IR node was added as part of JDK-8280510.

The performance numbers are as follows:
Before:
Benchmark (count) Mode Cnt Score Error Units
IndexVector.exprWithIndex1 65536 thrpt 3 64556.552 ± 1126.396 ops/s
IndexVector.exprWithIndex2 65536 thrpt 3 22117.050 ± 11452.098 ops/s
IndexVector.indexArrayFill 65536 thrpt 3 117776.383 ± 1120.957 ops/s

After:
Benchmark (count) Mode Cnt Score Error Units
IndexVector.exprWithIndex1 65536 thrpt 3 203180.290 ± 2147.807 ops/s
IndexVector.exprWithIndex2 65536 thrpt 3 274132.756 ± 6853.393 ops/s
IndexVector.indexArrayFill 65536 thrpt 3 374165.202 ± 46930.779 ops/s

Please review.

Best Regards,
Sandhya

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed (1 review required, with at least 1 reviewer)

Issue

JDK-8286972: Support the new loop induction variable related PopulateIndex IR node on x86

Reviewers

Vladimir Kozlov (@vnkozlov - Reviewer) ⚠️ Review applies to 727daec0
Jatin Bhateja (@jatin-bhateja - Committer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/8778/head:pull/8778
$ git checkout pull/8778

Update a local copy of the PR:
$ git checkout pull/8778
$ git pull https://git.openjdk.java.net/jdk pull/8778/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 8778

View PR using the GUI difftool:
$ git pr show -t 8778

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/8778.diff

…x IR node on x86

bridgekeeper · 2022-05-18T17:26:24Z

👋 Welcome back sviswanathan! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2022-05-18T17:27:56Z

@sviswa7 The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2022-05-18T21:53:07Z

Webrevs

vnkozlov · 2022-05-18T23:37:00Z

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp

+  bool is_bw = ((elem_bt == T_BYTE) || (elem_bt == T_SHORT));
+  bool is_bw_supported = VM_Version::supports_avx512bw();
+  if (is_bw && !is_bw_supported) {
+    assert(vlen_enc != Assembler::AVX_512bit, "required");
+    assert((dst->encoding() < 16) && (src1->encoding() < 16) && (src2->encoding() < 16),
+           "XMM register should be 0-15");
+  }


This whole block could be under #ifdef ASSERT.

vnkozlov · 2022-05-18T23:38:25Z

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp

+  bool is_bw = ((elem_bt == T_BYTE) || (elem_bt == T_SHORT));
+  bool is_bw_supported = VM_Version::supports_avx512bw();
+  if (is_bw && !is_bw_supported) {
+    assert(vlen_enc != Assembler::AVX_512bit, "required");


What are acceptable values of vlen_enc?

For KNL, PopulateIndex support is limited to 256-bit as we need avx512bw() for the 512-bit support.
For other AVX2 and AVX512 architectures, all vector widths up to and including 512-bit are supported.

vnkozlov · 2022-05-18T23:46:07Z

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp

+  } else {
+    assert(vlen_enc != Assembler::AVX_512bit, "required");
+    assert((dst->encoding() < 16),"XMM register should be 0-15");


The } else { case will be also executed on on KNL CPU. Did you tested with -XX:+UseKNLSetting?

Yes, this part will be executed on KNL CPU.
I did run the compiler tests with UseKNLSetting and didn't see any issue.

vnkozlov · 2022-05-18T23:51:07Z

src/hotspot/cpu/x86/x86.ad

+    case Op_PopulateIndex:
+      if (!is_LP64) {
+        return false;
+      }
+      // fallthrough


Please, don't do fallthrough - RoundVF is not related to PopulateIndex. Why not if (!is_LP64 || UseAVX < 2)?
Are there limitations in 32 bits or you don't want spend time on not major platform (which is also understandable)?

It is not a major platform so I didn't spend time on it.

vnkozlov · 2022-05-18T23:51:41Z

src/hotspot/cpu/x86/x86.ad

+      if (size_in_bits > 256 && !VM_Version::supports_avx512bw())
+        return false;


Use {} according to our style.

pfustc · 2022-05-19T00:44:42Z

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp

 }

+void C2_MacroAssembler::vpadd(BasicType elem_bt, XMMRegister dst, XMMRegister src1, XMMRegister src2, int vlen_enc) {
+  assert(UseAVX >= 2, "required");


Why not include this line in #ifdef ?

It does not matter since it is assert which add code only in debug VM. I like this way.

vnkozlov

Looks good. Let me test it.

vnkozlov · 2022-05-19T02:38:07Z

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp

 }

+void C2_MacroAssembler::vpadd(BasicType elem_bt, XMMRegister dst, XMMRegister src1, XMMRegister src2, int vlen_enc) {
+  assert(UseAVX >= 2, "required");


It does not matter since it is assert which add code only in debug VM. I like this way.

vnkozlov · 2022-05-19T02:38:46Z

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp

+  bool is_bw = ((elem_bt == T_BYTE) || (elem_bt == T_SHORT));
+  bool is_bw_supported = VM_Version::supports_avx512bw();
+  if (is_bw && !is_bw_supported) {
+    assert(vlen_enc != Assembler::AVX_512bit, "required");


vnkozlov · 2022-05-19T02:59:10Z

@sviswa7 Can you add IR framework test to verify generation of PopulateIndex node? And regression test.
I see that 8280510 added only microbenchmark.

sviswa7 · 2022-05-19T03:12:04Z

@vnkozlov I will look into adding the IR framework and regression test.

vnkozlov

Tier1-4 testing passed.

sviswa7 · 2022-05-19T17:01:02Z

@vnkozlov thanks a lot.

sviswa7 · 2022-05-19T23:28:04Z

@vnkozlov I have added the IR framework jtreg test. Please review.

vnkozlov

Good.

You need second review

openjdk · 2022-05-20T00:23:42Z

@sviswa7 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8286972: Support the new loop induction variable related PopulateIndex IR node on x86

Reviewed-by: kvn, jbhateja

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 46 new commits pushed to the master branch:

40e99a1: 8285308: Win: Japanese logical fonts are drawn with wrong size
890771e: 8285517: System.getenv() returns unexpected value if environment variable has non ASCII character
de74e0e: 8280035: Use Class.isInstance instead of Class.isAssignableFrom where applicable
9f562ef: 8286872: Refactor add/modify notification icon (TrayIcon)
b089229: 8271078: jdk/incubator/vector/Float128VectorTests.java failed a subtest
079312c: 8286182: [BACKOUT] x86: Handle integral division overflow during parsing
7b19226: 8267038: Update IANA Language Subtag Registry to Version 2022-03-02
e60d8b5: 8253176: Signature.getParameters should specify that it can throw UnsupportedOperationException
aa50625: 8286447: [Linux] AWT should start in Headless mode if headful AWT library not installed
655500a: 8286654: Add an optional description accessor on ToolProvider interface
... and 36 more: https://git.openjdk.java.net/jdk/compare/2ed75be659503da584cfec9ead5e27665ae900ef...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

sviswa7 · 2022-05-20T00:46:41Z

@vnkozlov Thanks a lot for the review.

pfustc · 2022-05-20T01:57:04Z

test/hotspot/jtreg/compiler/vectorization/TestPopulateIndex.java

+* @summary Test vectorization of loop induction variable usage in the loop
+* @requires vm.compiler2.enabled
+* @requires vm.cpu.features ~= ".*avx2.*"
+* @requires os.arch=="amd64" | os.arch=="x86_64"


Can we simplify os.arch=="amd64" | os.arch=="x86_64" to os.simpleArch == "x64" ?

This test runs on x86 only. It would be nice if it can run on AArch64 as well. So perhaps something like

28 * @requires (os.simpleArch == "x64" & vm.cpu.features ~= ".*avx2.*") | 29 * (os.simpleArch == "aarch64" & vm.cpu.features ~= ".*sve.*")

@pfustc Yes, changed the requires to as suggested by you.

jatin-bhateja · 2022-05-20T04:04:42Z

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp

+      case T_FLOAT: case T_INT: evpbroadcastd(dst, src, vlen_enc); return;
+      case T_DOUBLE: case T_LONG: evpbroadcastq(dst, src, vlen_enc); return;


Can't we use single and double precision broadcasts for floating point types, like you have done in else part
It may save domain switch over penalty (Section 3.5.2.2 Bypass between Execution Domains, Intel® 64 and IA-32 Architectures Optimization Reference Manual)

The floating point broadcast doesn't take the gpr as second source.

A prior move as in else part may be emitted for consistency or you want to keep floating point broadcasts only for else part.

jatin-bhateja · 2022-05-20T04:12:40Z

src/hotspot/cpu/x86/x86.ad

+     int vlen_in_bytes = Matcher::vector_length_in_bytes(this);
+     int vlen_enc = vector_length_encoding(this);
+     BasicType elem_bt = Matcher::vector_element_basic_type(this);
+     assert($src2$$constant == 1, "required");


Ideally assertion should be the first statement in a block, since they determine the pre-conditions under which code should executed.

jatin-bhateja · 2022-05-20T04:17:11Z

src/hotspot/cpu/x86/x86.ad

+  effect(TEMP dst, TEMP vtmp, TEMP scratch);
+  format %{ "vector_populate_index $dst $src1 $src2\t! using $vtmp and $scratch as TEMP" %}
+  ins_encode %{
+     int vlen_in_bytes = Matcher::vector_length_in_bytes(this);


Matcher::vector_length can be directly used instead of following computation in line 8274
vlen_in_bytes/type2aelembytes(elem_bt)

jatin-bhateja · 2022-05-20T04:18:06Z

src/hotspot/cpu/x86/x86.ad

+  effect(TEMP dst, TEMP vtmp, TEMP scratch);
+  format %{ "vector_populate_index $dst $src1 $src2\t! using $vtmp and $scratch as TEMP" %}
+  ins_encode %{
+     int vlen_in_bytes = Matcher::vector_length_in_bytes(this);


Same as above.

jatin-bhateja · 2022-05-20T04:18:21Z

src/hotspot/cpu/x86/x86.ad

+     int vlen_in_bytes = Matcher::vector_length_in_bytes(this);
+     int vlen_enc = vector_length_encoding(this);
+     BasicType elem_bt = Matcher::vector_element_basic_type(this);
+     assert($src2$$constant == 1, "required");


Same as above

jatin-bhateja · 2022-05-20T04:19:09Z

test/hotspot/jtreg/compiler/vectorization/TestPopulateIndex.java

+
+/**
+* @test
+* @summary Test vectorization of loop induction variable usage in the loop


PR id missing.

jatin-bhateja · 2022-05-20T04:22:42Z

test/hotspot/jtreg/compiler/vectorization/TestPopulateIndex.java

+import java.util.Random;
+
+public class TestPopulateIndex {
+    private static final int count = 65536;


Small array size around 10K may work, we can also tune CompileThresholdScaling.

jatin-bhateja · 2022-05-20T04:29:40Z

test/hotspot/jtreg/compiler/vectorization/TestPopulateIndex.java

+    }
+
+    public void checkResultIndexArrayFill() {
+        for (int i = 0; i < count; ++i) {


post-incrementation for consistency.

jatin-bhateja · 2022-05-20T04:36:52Z

test/hotspot/jtreg/compiler/vectorization/TestPopulateIndex.java

+    }
+
+    public void checkResultExprWithIndex2() {
+        for (int i = 0; i < count; ++i) {


post-increment induction.

sviswa7 · 2022-05-20T05:11:43Z

@jatin-bhateja Thanks a lot for the review. Your review comments are implemented.

jatin-bhateja · 2022-05-20T06:34:13Z

LGTM. Thanks

vnkozlov · 2022-05-20T14:59:42Z

I have to re-test it since code change in .ad file.

sviswa7 · 2022-05-20T18:25:58Z

@vnlozlov Thanks a lot, I will wait for your go ahead.

vnkozlov · 2022-05-21T15:24:11Z

My testing passed. You are good to push.

sviswa7 · 2022-05-23T15:27:03Z

/integrate

openjdk · 2022-05-23T15:28:43Z

Going to push as commit 5d8d6da.
Since your change was applied there have been 73 commits pushed to the master branch:

8122466: 8287113: JFR: Periodic task thread uses period for method sampling events
940e94f: 8285739: disable EscapeBarrier deopt for virtual threads
110d906: 8287103: java/lang/management/ThreadMXBean/VirtualThreadDeadlocks.java fails with Xcomp
8040aa0: 8286908: ECDSA signature should not return parameters
689f80c: 8287153: Whitespace typos in HeapRegion class
c906591: 8286391: Address possibly lossy conversions in jdk.compiler
88018c4: 8287150: Remove HeapRegion::block_start_const declaration without definition
81f128b: 8287154: java/nio/channels/FileChannel/LargeMapTest.java does not compile
7becf13: 8286825: Linker naming cleanup
a0042de: 8276549: Improve documentation about ContainerPtr encoding
... and 63 more: https://git.openjdk.java.net/jdk/compare/2ed75be659503da584cfec9ead5e27665ae900ef...master

Your commit was automatically rebased without conflicts.

openjdk · 2022-05-23T15:28:55Z

@sviswa7 Pushed as commit 5d8d6da.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

fg1417 · 2022-05-31T09:07:00Z

src/hotspot/cpu/x86/x86.ad

+  format %{ "vector_populate_index $dst $src1 $src2\t! using $vtmp and $scratch as TEMP" %}
+  ins_encode %{
+     assert($src2$$constant == 1, "required");
+     int vlen = Matcher::vector_length(this);


May I ask why use Matcher::vector_length() here, rather than Matcher::vector_length_in_bytes(), for load_iota_indices()? Thanks.

8286972: Support the new loop induction variable related PopulateInde…

8a1b7c5

…x IR node on x86

openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label May 18, 2022

fix 32 bit build

a21939e

sviswa7 marked this pull request as ready for review May 18, 2022 21:47

openjdk bot added the rfr Pull request is ready for review label May 18, 2022

vnkozlov reviewed May 18, 2022

View reviewed changes

review comment resolution

ab07fae

pfustc reviewed May 19, 2022

View reviewed changes

vnkozlov reviewed May 19, 2022

View reviewed changes

sviswa7 added 2 commits May 19, 2022 16:03

Add jtreg test

af5ede4

remove warmup

727daec

vnkozlov approved these changes May 20, 2022

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label May 20, 2022

pfustc reviewed May 20, 2022

View reviewed changes

sviswa7 added 2 commits May 19, 2022 21:19

Change requires to add sve

8c69c7f

review comment resolution

1b3d0b5

jatin-bhateja reviewed May 20, 2022

View reviewed changes

jatin-bhateja approved these changes May 20, 2022

View reviewed changes

openjdk bot added the integrated Pull request has been integrated label May 23, 2022

openjdk bot closed this May 23, 2022

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 23, 2022

vnkozlov mentioned this pull request May 25, 2022

8284960: Integration of JEP 426: Vector API (Fourth Incubator) #8425

Closed

5 tasks

fg1417 reviewed May 31, 2022

View reviewed changes

sviswa7 deleted the popindex branch June 3, 2024 21:37

		if (size_in_bits > 256 && !VM_Version::supports_avx512bw())
		return false;

		case T_FLOAT: case T_INT: evpbroadcastd(dst, src, vlen_enc); return;
		case T_DOUBLE: case T_LONG: evpbroadcastq(dst, src, vlen_enc); return;

8286972: Support the new loop induction variable related PopulateIndex IR node on x86 #8778

8286972: Support the new loop induction variable related PopulateIndex IR node on x86 #8778

Uh oh!

Conversation

sviswa7 commented May 18, 2022 • edited by openjdk bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Progress

Issue

Reviewers

Reviewing

Uh oh!

bridgekeeper bot commented May 18, 2022

Uh oh!

openjdk bot commented May 18, 2022

Uh oh!

mlbridge bot commented May 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Webrevs

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vnkozlov left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vnkozlov commented May 19, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sviswa7 commented May 19, 2022

Uh oh!

vnkozlov left a comment

Choose a reason for hiding this comment

Uh oh!

sviswa7 commented May 19, 2022

Uh oh!

sviswa7 commented May 19, 2022

Uh oh!

vnkozlov left a comment

Choose a reason for hiding this comment

Uh oh!

openjdk bot commented May 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sviswa7 commented May 20, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sviswa7 commented May 18, 2022 •

edited by openjdk bot

Loading

mlbridge bot commented May 18, 2022 •

edited

Loading

vnkozlov commented May 19, 2022 •

edited

Loading

openjdk bot commented May 20, 2022 •

edited

Loading