8277850: C2: optimize mask checks in counted loops #6697

rwestrel · 2021-12-03T10:04:58Z

This is another fix that addresses a performance issue with panama and that was brought up by Maurizio. The pattern to optimize is:
if ((base + (offset << 2)) & 3) != 0) {
}

where base is loop independent but offset depends on a loop variable. This can be transformed to:

if ((base & 3) != 0) {

That check becomes loop independent and be optimized by loop predication (or I suppose loop unswitching but that wasn't the case of the micro benchmark I worked on).

This change also optimizes the pattern:

(offset << 2) & 3

to return 0.

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed

Issue

JDK-8277850: C2: optimize mask checks in counted loops

Reviewers

Vladimir Kozlov (@vnkozlov - Reviewer)
Tobias Hartmann (@TobiHartmann - Reviewer) ⚠️ Review applies to 94b6869

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6697/head:pull/6697
$ git checkout pull/6697

Update a local copy of the PR:
$ git checkout pull/6697
$ git pull https://git.openjdk.java.net/jdk pull/6697/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6697

View PR using the GUI difftool:
$ git pr show -t 6697

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6697.diff

bridgekeeper · 2021-12-03T10:05:35Z

👋 Welcome back roland! A progress list of the required criteria for merging this PR into pr/6607 will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2021-12-03T10:07:13Z

@rwestrel The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2021-12-03T10:22:32Z

Webrevs

vnkozlov · 2021-12-04T00:05:05Z

@rwestrel there is issue with build on Windows:

d:\a\jdk\jdk\jdk\src\hotspot\share\opto\mulnode.cpp(1746): error C2220: the following warning is treated as an error
20
d:\a\jdk\jdk\jdk\src\hotspot\share\opto\mulnode.cpp(1746): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)
21

vnkozlov · 2021-12-04T00:33:04Z

src/hotspot/share/opto/mulnode.cpp

+    return TypeInt::ZERO;
+  }
+
+  return MulINode::Value(phase);


There is no MulINode::value() or MulLNode::value(), only based class MulNode have it.
You missing initial check done by MulNode::value().
I suggest to move this code into AndINode::mul_ring() and AndLNode::mul_ring().
Also in mul_ring() you can pass r1->get_con() as mask value to AndIL_shift_and_mask().

Thanks for reviewing this.
MulNode::AndIL_shift_and_mask() needs the shift node so it can test for shift->Opcode() == Op_ConvI2L. mult_ring() only gets the Type* as input. Are you suggesting to change mul_ring's signature?

You are right, I missed that.
It is unfortunate - we will have many duplicated checks.

vnkozlov · 2021-12-04T00:45:17Z

src/hotspot/share/opto/mulnode.cpp

@@ -1683,3 +1713,58 @@ const Type* RotateRightNode::Value(PhaseGVN* phase) const {
    return TypeLong::LONG;
  }
 }
+
+bool MulNode::AndIL_shift_and_mask(PhaseGVN* phase, Node* mask, Node* shift, BasicType bt) const {


Pass mask as value here.

vnkozlov · 2021-12-04T00:48:13Z

src/hotspot/share/opto/mulnode.cpp

+
+Node* MulNode::AndIL_add_shift_and_mask(PhaseGVN* phase, BasicType bt) {
+  Node* in1 = in(1);
+  Node* in2 = in(2);


Caller already determine that in2 is const int mask = t2->get_con(). You can pass it here.

vnkozlov · 2021-12-04T00:52:07Z

src/hotspot/share/opto/mulnode.cpp

@@ -1683,3 +1713,58 @@ const Type* RotateRightNode::Value(PhaseGVN* phase) const {
    return TypeLong::LONG;
  }
 }
+


Add comment here too which pattern it is looking for.

vnkozlov · 2021-12-04T00:53:51Z

src/hotspot/share/opto/mulnode.cpp

+
+bool MulNode::AndIL_shift_and_mask(PhaseGVN* phase, Node* mask, Node* shift, BasicType bt) const {
+  if (mask == NULL || shift == NULL) {
+    return false;


You need to check shift for TOP.

Code below:

const TypeInteger* shift_t = phase->type(shift)->isa_integer(bt); if (mask_t == NULL || shift_t == NULL) {

catches the case where shift is top, I think.

You are right. It is type check.

vnkozlov · 2021-12-06T17:31:29Z

src/hotspot/share/opto/mulnode.cpp

+    return TypeInt::ZERO;
+  }
+
+  return MulINode::Value(phase);


You are right, I missed that.
It is unfortunate - we will have many duplicated checks.

vnkozlov · 2021-12-06T17:35:35Z

src/hotspot/share/opto/mulnode.hpp

@@ -82,6 +82,9 @@ class MulNode : public Node {
  virtual int min_opcode() const = 0;

  static MulNode* make(Node* in1, Node* in2, BasicType bt);
+
+  bool AndIL_shift_and_mask(PhaseGVN* phase, Node* mask, Node* shift, BasicType bt) const;


This one could be static method.

TobiHartmann

Looks good to me, I submitted testing.

TobiHartmann · 2021-12-07T07:50:44Z

test/hotspot/jtreg/compiler/c2/irTests/TestShiftAndMask.java

+
+/*
+ * @test
+ * @bug JDK-8277850


Should be @bug 8277850 (not sure if that is a requirement but all other tests use that format).

TobiHartmann · 2021-12-07T07:52:21Z

test/hotspot/jtreg/compiler/c2/irTests/TestShiftAndMask.java

+ * @run driver compiler.c2.irTests.TestShiftAndMask
+ */
+
+public class TestShiftAndMask {


Nice tests!

TobiHartmann · 2021-12-07T08:31:31Z

src/hotspot/share/opto/mulnode.cpp

@@ -675,7 +699,7 @@ Node *AndLNode::Ideal(PhaseGVN *phase, bool can_reshape) {
  const jlong mask = t2->get_con();

  Node* in1 = in(1);
-  uint op = in1->Opcode();
+  int op = in1->Opcode();


I got curious and checked the code, we have similar patterns everywhere. Filed JDK-8278328 to clean up this mess.

TobiHartmann · 2021-12-07T08:35:42Z

src/hotspot/share/opto/mulnode.cpp

+  return false;
+}
+
+Node* MulNode::AndIL_add_shift_and_mask(PhaseGVN* phase, BasicType bt) {


Please add a comment describing the pattern.

TobiHartmann · 2021-12-07T09:19:02Z

I submitted testing.

Build on Windows failed:

[2021-12-07T09:08:24,657Z] t:\workspace\open\src\hotspot\share\opto\mulnode.cpp(1746): error C2220: the following warning is treated as an error
[2021-12-07T09:08:24,657Z] t:\workspace\open\src\hotspot\share\opto\mulnode.cpp(1746): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)
[2021-12-07T09:08:24,703Z] lib/CompileJvm.gmk:141: recipe for target '/cygdrive/t/workspace/build/windows-x64/hotspot/variant-server/libjvm/objs/mulnode.obj' failed

vnkozlov · 2021-12-07T17:18:42Z

@rwestrel Something went wrong with rebasing. It shows changes in 533 files.

rwestrel · 2021-12-07T17:35:59Z

@rwestrel Something went wrong with rebasing. It shows changes in 533 files.

Sorry about that. This PR depends on #6607 and that's what must have caused the problem. I rebased and force pushed. Hopefully this is fixed now.

vnkozlov

Good. Let me test it before approval.

vnkozlov · 2021-12-07T18:31:40Z

Unfortunately I can't test these changes because they depend on #6607
We have to wait when that change is pushed.

vnkozlov · 2021-12-08T00:28:59Z

I did the same as Tobias and applied #6607 changes first.
I still see Windows build failure Tobias pointed.

openjdk-notifier · 2021-12-08T07:57:33Z

The dependent pull request has now been integrated, and the target branch of this pull request has been updated. This means that changes from the dependent pull request can start to show up as belonging to this pull request, which may be confusing for reviewers. To remedy this situation, simply merge the latest changes from the new target branch into this pull request by running commands similar to these in the local repository for your personal fork:

git checkout JDK-8262341
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# if there are conflicts, follow the instructions given by git merge
git commit -m "Merge master"
git push

openjdk · 2021-12-08T07:58:41Z

@rwestrel this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8277850
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

TobiHartmann

That looks good to me. Tests are running.

openjdk · 2021-12-08T14:03:00Z

@rwestrel This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8277850: C2: optimize mask checks in counted loops

Reviewed-by: kvn, thartmann

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 5 new commits pushed to the master branch:

3e93e0b: 8276769: -Xshare:auto should tolerate problems in the CDS archive
79165b7: 8278324: Update the --generate-cds-archive jlink plugin usage message
40d726b: 8278310: Improve logging in CDS DynamicLoaderConstraintsTest.java
e4852c6: 8277998: runtime/cds/appcds/loaderConstraints/DynamicLoaderConstraintsTest.java#custom-cl-zgc failed "assert(ZAddress::is_marked(addr)) failed: Should be marked"
37921e3: 8269258: java/net/httpclient/ManyRequestsLegacy.java failed with connection timeout

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

vnkozlov

Build issue solved. Thanks.
Testing looks good too so far. It is still running but should be finished soon.

@rwestrel, please, push it today before RDP1 cut off.

rwestrel · 2021-12-08T16:38:44Z

@vnkozlov @TobiHartmann thanks for the reviews.

rwestrel · 2021-12-08T16:38:50Z

/integrate

openjdk · 2021-12-08T16:39:41Z

Going to push as commit 8af3b27.
Since your change was applied there have been 5 commits pushed to the master branch:

3e93e0b: 8276769: -Xshare:auto should tolerate problems in the CDS archive
79165b7: 8278324: Update the --generate-cds-archive jlink plugin usage message
40d726b: 8278310: Improve logging in CDS DynamicLoaderConstraintsTest.java
e4852c6: 8277998: runtime/cds/appcds/loaderConstraints/DynamicLoaderConstraintsTest.java#custom-cl-zgc failed "assert(ZAddress::is_marked(addr)) failed: Should be marked"
37921e3: 8269258: java/net/httpclient/ManyRequestsLegacy.java failed with connection timeout

Your commit was automatically rebased without conflicts.

openjdk · 2021-12-08T16:39:57Z

@rwestrel Pushed as commit 8af3b27.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

rose00 · 2021-12-08T23:35:51Z

src/hotspot/share/opto/mulnode.cpp

+// patterns similar to (v << 2) & 3 to 0
+// and
+// patterns similar to (v1 + (v2 << 2)) & 3 transformed to v1 & 3
+bool MulNode::AndIL_shift_and_mask(PhaseGVN* phase, Node* mask, Node* shift, BasicType bt) {


The operand order is distracting and perhaps confusing, since mask is in(2) and shift is in(1).

The mask is not necessarily in(2), because it doesn't have to be a constant; it can be a bounded value.

I wrote an expanded comment.

There's a redundant NULL check.

You should use a different name for the adjusted bt value.

Here are the changes I suggest to this function:

// Given an expression (AndX shift mask) or (AndX mask shift), // determine if the AndX must always produce zero, because the // the shift (x<<N) is bitwise disjoint from the mask #M. // The X in AndX must be I or L, depending on bt. // Specifically, the following cases fold to zero, // when the shift value N is large enough to zero out // all the set positions of the and-mask M. // (AndI (LShiftI _ #N) #M) => #0 // (AndL (LShiftL _ #N) #M) => #0 // (AndL (ConvI2L (LShiftI _ #N)) #M) => #0 // The M and N values must satisfy ((-1 << N) & M) == 0. // Because the optimization might work for a non-constant // mask M, we check the AndX for both operand orders. bool MulNode::AndIL_shift_and_mask(PhaseGVN* phase, Node* shift, Node* mask, BasicType bt, bool check_reverse) const { if (mask == NULL || shift == NULL) { return false; } const TypeInteger* mask_t = phase->type(mask)->isa_integer(bt); const TypeInteger* shift_t = phase->type(shift)->isa_integer(bt); if (mask_t == NULL || shift_t == NULL) { return false; } + BasicType shift_bt = bt; + if (bt == T_LONG && shift->Opcode() == Op_ConvI2L) { Node* val = shift->in(1); if (val == NULL) { return false; } + if (val->Opcode() == Op_LShiftI) { + shift_bt = T_INT; + shift = val; + } } #s/bt/shift_bt/ if (shift->Opcode() != Op_LShift(shift_bt)) { + if (check_reverse && + (mask->Opcode() == Op_LShift(bt) || + (bt == T_LONG && mask->Opcode() == Op_ConvI2L))) { + // try it the other way around + return AndIL_shift_and_mask(phase, shift, mask, bt, false); + } return false; } Node* shift2 = shift->in(2); if (shift2 == NULL) { return false; } const Type* shift2_t = phase->type(shift2); if (!shift2_t->isa_int() || !shift2_t->is_int()->is_con()) { return false; } #s/bt/shift_bt/ jint shift_con = shift2_t->is_int()->get_con() & ((shift_bt == T_INT ? BitsPerJavaInteger : BitsPerJavaLong) - 1); if ((1L << shift_con) > mask_t->hi_as_long() && mask_t->lo_as_long() >= 0) { return true; } return false; }

Suggest signature change:

rose00 · 2021-12-08T23:39:16Z

src/hotspot/share/opto/mulnode.cpp

+
+// Helper method to transform:
+// patterns similar to (v1 + (v2 << 2)) & 3 to v1 & 3
+Node* MulNode::AndIL_add_shift_and_mask(PhaseGVN* phase, BasicType bt) {


Similar comments for this function also.

// Given an expression (AndX (AddX v1 (LShiftX v2 #N)) #M) // determine if the AndX must always produce (AndX v1 #M), // because the shift (v2<<N) is bitwise disjoint from the mask #M. // The X in AndX will be I or L, depending on bt. // Specifically, the following cases fold, // when the shift value N is large enough to zero out // all the set positions of the and-mask M. // (AndI (AddI v1 (LShiftI _ #N)) #M) => v1 // (AndL (AddI v1 (LShiftL _ #N)) #M) => v1 // (AndL (AddL v1 (ConvI2L (LShiftI _ #N))) #M) => v1 // The M and N values must satisfy ((-1 << N) & M) == 0. // Because the optimization might work for a non-constant // mask M, and because the AddX operands can come in either // order, we check for every operand order. Node* MulNode::AndIL_add_shift_and_mask(PhaseGVN* phase, BasicType bt) { Node* add = in(1); Node* mask = in(2); + if (add == NULL || mask == NULL) { + return NULL; + } + int addidx = 0; if (add->Opcode() == Op_Add(bt)) { + addidx = 1; + } else if (mask->Opcode() == Op_Add(bt)) { + mask = add; + addidx = 2; + add = in(addidx); + } + if (addidx > 0) { Node* add1 = add->in(1); Node* add2 = add->in(2); if (add1 != NULL && add2 != NULL) { if (AndIL_shift_and_mask(phase, add1, mask, bt, false)) { set_req_X(addidx, add2, phase); return this; } else if (AndIL_shift_and_mask(phase, add2, mask, bt, false)) { set_req_X(addidx, add1, phase); return this; } } } return NULL; }

rose00

This is a good change. I have made some stylistic suggestions in context, and also suggested some further inspection of reordered operands.

You could do a bit more with the same analysis by checking for AndNot as well as And operations. (AndNot is not in our IR, but you can look at complemented constant masks, and look for Xors that perform complement non-constant ones.) Wherever an And would return identically zero, the corresponding AndNot would return its unmasked operand unchanged. This would allow you to simplify (x << 2) & -4 to x, etc., with only a modest extension on the present analysis. Perhaps a followup RFE could cover this, or it could be folded in to this work with a few more "mode bits" in the function signatures.

Much of this analysis would be subsumed by a more general bitwise type analysis which would track each bit position in every integral type, whether it is constant zero or constant one. Such an analysis is straightforward to develop (extending the _hi and _lo fields with _up and _dn` fields), and the rules are math that any Hacker's Delight reader can develop. Although this is way too much to consider as a point fix, it will be useful at some point in the future, especially when we begin to optimize on types larger than 64 bits. For the record, you can see the math of bitwise inference in action in this test class (which contains formulas to infer up/dn masks from arithmetic expressions, and hi/lo bounds from bitwise ones):

http://cr.openjdk.java.net/~jrose/draft/BitTypes.zip

Extends the optimization of masked sums introduced in openjdk#6697 to cover constant values, which currently break the optimization. Such constant values arise in an expression of the following form, for example from MemorySegmentImpl#isAlignedForElement: (base + (index + 1) << 8) & 255 => MulNode (base + (index << 8 + 256)) & 255 => AddNode ((base + index << 8) + 256) & 255 Currently, "256" is not being recognized as a shifted value. This PR enables: ((base + index << 8) + 256) & 255 => MulNode (base + index << 8) & 255 => MulNode (PR openjdk#6697) base & 255

openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Dec 3, 2021

rwestrel changed the title ~~fix~~ 8277850: C2: optimize mask checks in counted loops Dec 3, 2021

openjdk bot changed the title ~~8277850: C2: optimize mask checks in counted loops~~ 8277850: C2: optimize mask checks in counted loops Dec 3, 2021

openjdk bot added the rfr Pull request is ready for review label Dec 3, 2021

vnkozlov reviewed Dec 4, 2021

View reviewed changes

vnkozlov reviewed Dec 6, 2021

View reviewed changes

TobiHartmann reviewed Dec 7, 2021

View reviewed changes

rwestrel force-pushed the JDK-8277850 branch from f716564 to d5a6c50 Compare December 7, 2021 17:33

vnkozlov reviewed Dec 7, 2021

View reviewed changes

openjdk-notifier bot changed the base branch from pr/6607 to master December 8, 2021 07:57

openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Dec 8, 2021

rwestrel added 4 commits December 8, 2021 09:17

fix

4dacb05

whitespace

ab48aac

reviews

ea0464b

build fix

78b10f4

rwestrel force-pushed the JDK-8277850 branch from 314180a to 78b10f4 Compare December 8, 2021 08:17

openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Dec 8, 2021

build fix

94b6869

TobiHartmann approved these changes Dec 8, 2021

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Dec 8, 2021

Merge branch 'master' into JDK-8277850

5e756f5

vnkozlov approved these changes Dec 8, 2021

View reviewed changes

openjdk bot closed this Dec 8, 2021

openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 8, 2021

rose00 reviewed Dec 8, 2021

View reviewed changes

mernst-github mentioned this pull request Dec 21, 2024

8346664: C2: Optimize mask check with constant offset #22856

Open

3 tasks

8277850: C2: optimize mask checks in counted loops #6697

8277850: C2: optimize mask checks in counted loops #6697

Conversation

rwestrel commented Dec 3, 2021 • edited by openjdk bot Loading

Progress

Issue

Reviewers

Reviewing

bridgekeeper bot commented Dec 3, 2021

openjdk bot commented Dec 3, 2021

mlbridge bot commented Dec 3, 2021 • edited Loading

Webrevs

vnkozlov commented Dec 4, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TobiHartmann left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TobiHartmann commented Dec 7, 2021

vnkozlov commented Dec 7, 2021

rwestrel commented Dec 7, 2021

vnkozlov left a comment

Choose a reason for hiding this comment

vnkozlov commented Dec 7, 2021

vnkozlov commented Dec 8, 2021

openjdk-notifier bot commented Dec 8, 2021

openjdk bot commented Dec 8, 2021

TobiHartmann left a comment

Choose a reason for hiding this comment

openjdk bot commented Dec 8, 2021 • edited Loading

vnkozlov left a comment

Choose a reason for hiding this comment

rwestrel commented Dec 8, 2021

rwestrel commented Dec 8, 2021

openjdk bot commented Dec 8, 2021

openjdk bot commented Dec 8, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rose00 left a comment

Choose a reason for hiding this comment

rwestrel commented Dec 3, 2021 •

edited by openjdk bot

Loading

mlbridge bot commented Dec 3, 2021 •

edited

Loading

openjdk bot commented Dec 8, 2021 •

edited

Loading