Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8277850: C2: optimize mask checks in counted loops #6697

Closed
wants to merge 6 commits into from

Conversation

rwestrel
Copy link
Contributor

@rwestrel rwestrel commented Dec 3, 2021

This is another fix that addresses a performance issue with panama and that was brought up by Maurizio. The pattern to optimize is:
if ((base + (offset << 2)) & 3) != 0) {
}

where base is loop independent but offset depends on a loop variable. This can be transformed to:

if ((base & 3) != 0) {

That check becomes loop independent and be optimized by loop predication (or I suppose loop unswitching but that wasn't the case of the micro benchmark I worked on).

This change also optimizes the pattern:

(offset << 2) & 3

to return 0.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8277850: C2: optimize mask checks in counted loops

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6697/head:pull/6697
$ git checkout pull/6697

Update a local copy of the PR:
$ git checkout pull/6697
$ git pull https://git.openjdk.java.net/jdk pull/6697/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6697

View PR using the GUI difftool:
$ git pr show -t 6697

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6697.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Dec 3, 2021

👋 Welcome back roland! A progress list of the required criteria for merging this PR into pr/6607 will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Dec 3, 2021

@rwestrel The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Dec 3, 2021
@rwestrel rwestrel changed the title fix 8277850: C2: optimize mask checks in counted loops Dec 3, 2021
@openjdk openjdk bot changed the title 8277850: C2: optimize mask checks in counted loops 8277850: C2: optimize mask checks in counted loops Dec 3, 2021
@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 3, 2021
@mlbridge
Copy link

mlbridge bot commented Dec 3, 2021

Webrevs

@vnkozlov
Copy link
Contributor

vnkozlov commented Dec 4, 2021

@rwestrel there is issue with build on Windows:

d:\a\jdk\jdk\jdk\src\hotspot\share\opto\mulnode.cpp(1746): error C2220: the following warning is treated as an error
20
d:\a\jdk\jdk\jdk\src\hotspot\share\opto\mulnode.cpp(1746): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)
21

return TypeInt::ZERO;
}

return MulINode::Value(phase);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no MulINode::value() or MulLNode::value(), only based class MulNode have it.
You missing initial check done by MulNode::value().
I suggest to move this code into AndINode::mul_ring() and AndLNode::mul_ring().
Also in mul_ring() you can pass r1->get_con() as mask value to AndIL_shift_and_mask().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing this.
MulNode::AndIL_shift_and_mask() needs the shift node so it can test for shift->Opcode() == Op_ConvI2L. mult_ring() only gets the Type* as input. Are you suggesting to change mul_ring's signature?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I missed that.
It is unfortunate - we will have many duplicated checks.

@@ -1683,3 +1713,58 @@ const Type* RotateRightNode::Value(PhaseGVN* phase) const {
return TypeLong::LONG;
}
}

bool MulNode::AndIL_shift_and_mask(PhaseGVN* phase, Node* mask, Node* shift, BasicType bt) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass mask as value here.


Node* MulNode::AndIL_add_shift_and_mask(PhaseGVN* phase, BasicType bt) {
Node* in1 = in(1);
Node* in2 = in(2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caller already determine that in2 is const int mask = t2->get_con(). You can pass it here.

@@ -1683,3 +1713,58 @@ const Type* RotateRightNode::Value(PhaseGVN* phase) const {
return TypeLong::LONG;
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment here too which pattern it is looking for.


bool MulNode::AndIL_shift_and_mask(PhaseGVN* phase, Node* mask, Node* shift, BasicType bt) const {
if (mask == NULL || shift == NULL) {
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to check shift for TOP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code below:

const TypeInteger* shift_t = phase->type(shift)->isa_integer(bt);
  if (mask_t == NULL || shift_t == NULL) {

catches the case where shift is top, I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. It is type check.

return TypeInt::ZERO;
}

return MulINode::Value(phase);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I missed that.
It is unfortunate - we will have many duplicated checks.

@@ -82,6 +82,9 @@ class MulNode : public Node {
virtual int min_opcode() const = 0;

static MulNode* make(Node* in1, Node* in2, BasicType bt);

bool AndIL_shift_and_mask(PhaseGVN* phase, Node* mask, Node* shift, BasicType bt) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one could be static method.

Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, I submitted testing.


/*
* @test
* @bug JDK-8277850
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be @bug 8277850 (not sure if that is a requirement but all other tests use that format).

* @run driver compiler.c2.irTests.TestShiftAndMask
*/

public class TestShiftAndMask {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice tests!

@@ -675,7 +699,7 @@ Node *AndLNode::Ideal(PhaseGVN *phase, bool can_reshape) {
const jlong mask = t2->get_con();

Node* in1 = in(1);
uint op = in1->Opcode();
int op = in1->Opcode();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got curious and checked the code, we have similar patterns everywhere. Filed JDK-8278328 to clean up this mess.

return false;
}

Node* MulNode::AndIL_add_shift_and_mask(PhaseGVN* phase, BasicType bt) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment describing the pattern.

@TobiHartmann
Copy link
Member

I submitted testing.

Build on Windows failed:

[2021-12-07T09:08:24,657Z] t:\workspace\open\src\hotspot\share\opto\mulnode.cpp(1746): error C2220: the following warning is treated as an error
[2021-12-07T09:08:24,657Z] t:\workspace\open\src\hotspot\share\opto\mulnode.cpp(1746): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)
[2021-12-07T09:08:24,703Z] lib/CompileJvm.gmk:141: recipe for target '/cygdrive/t/workspace/build/windows-x64/hotspot/variant-server/libjvm/objs/mulnode.obj' failed

@vnkozlov
Copy link
Contributor

vnkozlov commented Dec 7, 2021

@rwestrel Something went wrong with rebasing. It shows changes in 533 files.

@rwestrel
Copy link
Contributor Author

rwestrel commented Dec 7, 2021

@rwestrel Something went wrong with rebasing. It shows changes in 533 files.

Sorry about that. This PR depends on #6607 and that's what must have caused the problem. I rebased and force pushed. Hopefully this is fixed now.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. Let me test it before approval.

@vnkozlov
Copy link
Contributor

vnkozlov commented Dec 7, 2021

Unfortunately I can't test these changes because they depend on #6607
We have to wait when that change is pushed.

@vnkozlov
Copy link
Contributor

vnkozlov commented Dec 8, 2021

I did the same as Tobias and applied #6607 changes first.
I still see Windows build failure Tobias pointed.

@openjdk-notifier openjdk-notifier bot changed the base branch from pr/6607 to master December 8, 2021 07:57
@openjdk-notifier
Copy link

The dependent pull request has now been integrated, and the target branch of this pull request has been updated. This means that changes from the dependent pull request can start to show up as belonging to this pull request, which may be confusing for reviewers. To remedy this situation, simply merge the latest changes from the new target branch into this pull request by running commands similar to these in the local repository for your personal fork:

git checkout JDK-8262341
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# if there are conflicts, follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk
Copy link

openjdk bot commented Dec 8, 2021

@rwestrel this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8277850
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Dec 8, 2021
@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Dec 8, 2021
Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks good to me. Tests are running.

@openjdk
Copy link

openjdk bot commented Dec 8, 2021

@rwestrel This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8277850: C2: optimize mask checks in counted loops

Reviewed-by: kvn, thartmann

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 5 new commits pushed to the master branch:

  • 3e93e0b: 8276769: -Xshare:auto should tolerate problems in the CDS archive
  • 79165b7: 8278324: Update the --generate-cds-archive jlink plugin usage message
  • 40d726b: 8278310: Improve logging in CDS DynamicLoaderConstraintsTest.java
  • e4852c6: 8277998: runtime/cds/appcds/loaderConstraints/DynamicLoaderConstraintsTest.java#custom-cl-zgc failed "assert(ZAddress::is_marked(addr)) failed: Should be marked"
  • 37921e3: 8269258: java/net/httpclient/ManyRequestsLegacy.java failed with connection timeout

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 8, 2021
Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build issue solved. Thanks.
Testing looks good too so far. It is still running but should be finished soon.

@rwestrel, please, push it today before RDP1 cut off.

@rwestrel
Copy link
Contributor Author

rwestrel commented Dec 8, 2021

@vnkozlov @TobiHartmann thanks for the reviews.

@rwestrel
Copy link
Contributor Author

rwestrel commented Dec 8, 2021

/integrate

@openjdk
Copy link

openjdk bot commented Dec 8, 2021

Going to push as commit 8af3b27.
Since your change was applied there have been 5 commits pushed to the master branch:

  • 3e93e0b: 8276769: -Xshare:auto should tolerate problems in the CDS archive
  • 79165b7: 8278324: Update the --generate-cds-archive jlink plugin usage message
  • 40d726b: 8278310: Improve logging in CDS DynamicLoaderConstraintsTest.java
  • e4852c6: 8277998: runtime/cds/appcds/loaderConstraints/DynamicLoaderConstraintsTest.java#custom-cl-zgc failed "assert(ZAddress::is_marked(addr)) failed: Should be marked"
  • 37921e3: 8269258: java/net/httpclient/ManyRequestsLegacy.java failed with connection timeout

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Dec 8, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 8, 2021
@openjdk
Copy link

openjdk bot commented Dec 8, 2021

@rwestrel Pushed as commit 8af3b27.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

// patterns similar to (v << 2) & 3 to 0
// and
// patterns similar to (v1 + (v2 << 2)) & 3 transformed to v1 & 3
bool MulNode::AndIL_shift_and_mask(PhaseGVN* phase, Node* mask, Node* shift, BasicType bt) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The operand order is distracting and perhaps confusing, since mask is in(2) and shift is in(1).
  2. The mask is not necessarily in(2), because it doesn't have to be a constant; it can be a bounded value.
  3. I wrote an expanded comment.
  4. There's a redundant NULL check.
  5. You should use a different name for the adjusted bt value.

Here are the changes I suggest to this function:

// Given an expression (AndX shift mask) or (AndX mask shift),
// determine if the AndX must always produce zero, because the
// the shift (x<<N) is bitwise disjoint from the mask #M.
// The X in AndX must be I or L, depending on bt.
// Specifically, the following cases fold to zero,
// when the shift value N is large enough to zero out
// all the set positions of the and-mask M.
//   (AndI (LShiftI _ #N) #M) => #0
//   (AndL (LShiftL _ #N) #M) => #0
//   (AndL (ConvI2L (LShiftI _ #N)) #M) => #0
// The M and N values must satisfy ((-1 << N) & M) == 0.
// Because the optimization might work for a non-constant
// mask M, we check the AndX for both operand orders.
bool MulNode::AndIL_shift_and_mask(PhaseGVN* phase, Node* shift, Node* mask, BasicType bt, bool check_reverse) const {
  if (mask == NULL || shift == NULL) {
    return false;
  }
  const TypeInteger* mask_t = phase->type(mask)->isa_integer(bt);
  const TypeInteger* shift_t = phase->type(shift)->isa_integer(bt);
  if (mask_t == NULL || shift_t == NULL) {
    return false;
  }
+ BasicType shift_bt = bt;
+ if (bt == T_LONG && shift->Opcode() == Op_ConvI2L) {
    Node* val = shift->in(1);
    if (val == NULL) {
      return false;
    }
+   if (val->Opcode() == Op_LShiftI) {
+     shift_bt = T_INT;
+     shift = val;
+   }
  }
#s/bt/shift_bt/
  if (shift->Opcode() != Op_LShift(shift_bt)) {
+   if (check_reverse &&
+       (mask->Opcode() == Op_LShift(bt) ||
+        (bt == T_LONG && mask->Opcode() == Op_ConvI2L))) {
+     // try it the other way around
+     return AndIL_shift_and_mask(phase, shift, mask, bt, false);
+   }
    return false;
  }
  Node* shift2 = shift->in(2);
  if (shift2 == NULL) {
    return false;
  }
  const Type* shift2_t = phase->type(shift2);
  if (!shift2_t->isa_int() || !shift2_t->is_int()->is_con()) {
    return false;
  }

#s/bt/shift_bt/
  jint shift_con = shift2_t->is_int()->get_con() & ((shift_bt == T_INT ? BitsPerJavaInteger : BitsPerJavaLong) - 1);
  if ((1L << shift_con) > mask_t->hi_as_long() && mask_t->lo_as_long() >= 0) {
    return true;
  }

  return false;
}

Suggest signature change:


// Helper method to transform:
// patterns similar to (v1 + (v2 << 2)) & 3 to v1 & 3
Node* MulNode::AndIL_add_shift_and_mask(PhaseGVN* phase, BasicType bt) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comments for this function also.

// Given an expression (AndX (AddX v1 (LShiftX v2 #N)) #M)
// determine if the AndX must always produce (AndX v1 #M),
// because the shift (v2<<N) is bitwise disjoint from the mask #M.
// The X in AndX will be I or L, depending on bt.
// Specifically, the following cases fold,
// when the shift value N is large enough to zero out
// all the set positions of the and-mask M.
//   (AndI (AddI v1 (LShiftI _ #N)) #M) => v1
//   (AndL (AddI v1 (LShiftL _ #N)) #M) => v1
//   (AndL (AddL v1 (ConvI2L (LShiftI _ #N))) #M) => v1
// The M and N values must satisfy ((-1 << N) & M) == 0.
// Because the optimization might work for a non-constant
// mask M, and because the AddX operands can come in either
// order, we check for every operand order.
Node* MulNode::AndIL_add_shift_and_mask(PhaseGVN* phase, BasicType bt) {
  Node* add  = in(1);
  Node* mask = in(2);
+ if (add == NULL || mask == NULL) {
+   return NULL;
+ }
+ int addidx = 0;
  if (add->Opcode() == Op_Add(bt)) {
+   addidx = 1;
+ } else if (mask->Opcode() == Op_Add(bt)) {
+   mask = add;
+   addidx = 2;
+   add = in(addidx);
+ }
+ if (addidx > 0) {
    Node* add1 = add->in(1);
    Node* add2 = add->in(2);
    if (add1 != NULL && add2 != NULL) {
      if (AndIL_shift_and_mask(phase, add1, mask, bt, false)) {
        set_req_X(addidx, add2, phase);
        return this;
      } else if (AndIL_shift_and_mask(phase, add2, mask, bt, false)) {
        set_req_X(addidx, add1, phase);
        return this;
      }
    }
  }
  return NULL;
}

Copy link
Contributor

@rose00 rose00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good change. I have made some stylistic suggestions in context, and also suggested some further inspection of reordered operands.

You could do a bit more with the same analysis by checking for AndNot as well as And operations. (AndNot is not in our IR, but you can look at complemented constant masks, and look for Xors that perform complement non-constant ones.) Wherever an And would return identically zero, the corresponding AndNot would return its unmasked operand unchanged. This would allow you to simplify (x << 2) & -4 to x, etc., with only a modest extension on the present analysis. Perhaps a followup RFE could cover this, or it could be folded in to this work with a few more "mode bits" in the function signatures.

Much of this analysis would be subsumed by a more general bitwise type analysis which would track each bit position in every integral type, whether it is constant zero or constant one. Such an analysis is straightforward to develop (extending the _hi and _lo fields with _up and _dn` fields), and the rules are math that any Hacker's Delight reader can develop. Although this is way too much to consider as a point fix, it will be useful at some point in the future, especially when we begin to optimize on types larger than 64 bits. For the record, you can see the math of bitwise inference in action in this test class (which contains formulas to infer up/dn masks from arithmetic expressions, and hi/lo bounds from bitwise ones):

http://cr.openjdk.java.net/~jrose/draft/BitTypes.zip

mernst-github added a commit to mernst-github/jdk that referenced this pull request Dec 21, 2024
Extends the optimization of masked sums introduced in openjdk#6697 to cover constant values, which currently break the optimization.

Such constant values arise in an expression of the following form, for example from MemorySegmentImpl#isAlignedForElement:

(base + (index + 1) << 8) & 255
=> MulNode
(base + (index << 8 + 256)) & 255
=> AddNode
((base + index << 8) + 256) & 255

Currently, "256" is not being recognized as a shifted value. This PR enables:

((base + index << 8) + 256) & 255
=> MulNode
(base + index << 8) & 255
=> MulNode (PR openjdk#6697)
base & 255
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants