@merykitty
Member

@merykitty merykitty commented Dec 13, 2025

Hi,

This patch is an alternative to #28764 but it does the analysis during IGVN instead.

The current PR:

The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes anywhere, we completely lose the ability to analyse the values of its fields, even if it only escapes at return.
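
To make the limitation concrete, here is a minimal Java sketch of an object that escapes only at return. The names (`EscapeAtReturnExample`, `opaqueCall`, `make`) are made up for illustration and are not taken from the patch or its tests, and whether C2 actually folds the load depends on inlining and other factors.

```java
public final class EscapeAtReturnExample {
    public static final class Point {
        public int x;
        public int y;
    }

    static int calls = 0;

    // Stands in for a call that C2 does not inline (an assumption of this sketch).
    static void opaqueCall() {
        calls++;
    }

    public static Point make(int v) {
        Point p = new Point();
        p.x = v;
        opaqueCall();      // p has not been published anywhere yet
        int a = p.x;       // always observes v; the call sits between store and load
        p.y = a + 1;
        return p;          // the only place where p escapes
    }

    public static void main(String[] args) {
        Point p = make(41);
        System.out.println(p.x + " " + p.y); // prints "41 42"
    }
}
```

Because `p` escapes at the return, an all-or-nothing analysis treats it as escaped everywhere, so the load of `p.x` after the call cannot be replaced by `v` on that basis alone.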

This PR tries to determine the escape status of an object at a given load. If the object is shown not to have escaped at that point, we can fold the load aggressively, walking past calls and memory barriers to find the store that the load observes. Implementation-wise, when the walk in find_previous_store encounters a call or memory barrier, we look at all nodes that make the allocation escape. If none of those nodes has a control input that is a transitive control input of the call/barrier we are at, then the allocation has provably not escaped at that call/barrier, and we can walk past it to find the corresponding store.
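
The control-input check described above can be modelled outside of C2 roughly as follows. This is a toy sketch, not the code in the patch: `Ctl`, `transitiveControlInputs` and `notEscapedAt` are hypothetical names, the control graph is assumed to be acyclic, and each escaping node is represented only by its control input.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class EscapeAtBarrierSketch {
    /** Toy control node: a name plus its control inputs (predecessors). */
    public static final class Ctl {
        final String name;
        final List<Ctl> inputs;

        public Ctl(String name, Ctl... inputs) {
            this.name = name;
            this.inputs = List.of(inputs);
        }
    }

    /** Every node reachable from barrier by following control inputs, barrier excluded. */
    static Set<Ctl> transitiveControlInputs(Ctl barrier) {
        Set<Ctl> seen = new HashSet<>();
        Deque<Ctl> work = new ArrayDeque<>(barrier.inputs);
        while (!work.isEmpty()) {
            Ctl c = work.pop();
            if (seen.add(c)) {
                work.addAll(c.inputs);
            }
        }
        return seen;
    }

    /**
     * escapeCtls holds the control input of each node that may make the
     * allocation escape. The allocation counts as not escaped at barrier
     * if none of them is a transitive control input of barrier.
     */
    public static boolean notEscapedAt(Ctl barrier, List<Ctl> escapeCtls) {
        Set<Ctl> before = transitiveControlInputs(barrier);
        for (Ctl c : escapeCtls) {
            if (before.contains(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Ctl start = new Ctl("start");
        Ctl alloc = new Ctl("alloc", start);
        Ctl call = new Ctl("call", alloc);

        // Object escapes at a return whose control input is the call.
        System.out.println(notEscapedAt(call, List.of(call)));  // prints "true"
        // Object is passed to the call itself, whose control input is alloc.
        System.out.println(notEscapedAt(call, List.of(alloc))); // prints "false"
    }
}
```

In the first query the only escaping node is controlled by the call, so it cannot have run before the call and the walk may continue past it. In the second the escaping node is controlled by something the call depends on, so the walk has to stop.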

I do not see a noticeable difference in C2 runtime with and without this patch.

Future work:

  1. Nested object:

Consider this case:

Holder h = new Holder();
Object o = new Object();
h.o = o;

Currently, o is considered escaped at h.o = o. However, o has not actually escaped, because h has not escaped. Luckily, the current approach makes this easy to handle. Notice that this loop is just "if anything escapes, consider base escaped", where "anything" currently includes base and its aliases. If we also include the base of the object into which o is stored, then we can correctly determine whether o has escaped.

// Find all nodes that may escape alloc, and decide that it is provable that they must be
// executed after ctl
EscapeStatus res = NOT_ESCAPED;
aliases.push(base);
for (uint idx = 0; idx < aliases.size(); idx++) {
  Node* n = aliases.at(idx);
  2. Fold a memory Phi.

This is pretty straightforward. We need to create a value Phi for each memory Phi so that we can handle loop Phis.

  3. Fold a pointer Phi.

The easy option is to give up if we don't encounter a store into that Phi. However, we can do better. Consider this case:

Point p1 = new Point();
Point p2 = new Point();
p1.x = v1;
p2.x = v2;
Point p = Phi(p1, p2);
int a = p.x;

Then a can be folded to Phi(v1, v2) if p1 and p2 are known not to alias.

Another interesting case:

Point p = Phi(p1, p2);
p.x = v;
p1.x = v1;
int a = p.x;

Then, theoretically, we can fold a to Phi(v1, v) if p1 and p2 are known not to alias.
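
At the Java level, the two pointer-Phi shapes above could come from code like the following. This is only an illustration: `PointerPhiExample` and its methods are hypothetical names, and the comments state what each load evaluates to, not what C2 currently does with it.

```java
public final class PointerPhiExample {
    public static final class Point {
        public int x;
    }

    // First shape: p = Phi(p1, p2); a = p.x
    public static int selectThenLoad(boolean first, int v1, int v2) {
        Point p1 = new Point();
        Point p2 = new Point();
        p1.x = v1;
        p2.x = v2;
        Point p = first ? p1 : p2;
        return p.x; // equals first ? v1 : v2, i.e. Phi(v1, v2)
    }

    // Second shape: p.x = v; p1.x = v1; a = p.x
    public static int storeThroughPhi(boolean first, int v, int v1) {
        Point p1 = new Point();
        Point p2 = new Point();
        Point p = first ? p1 : p2;
        p.x = v;
        p1.x = v1;  // overwrites v only when p is p1
        return p.x; // equals first ? v1 : v, i.e. Phi(v1, v)
    }

    public static void main(String[] args) {
        System.out.println(selectThenLoad(true, 1, 2));   // prints "1"
        System.out.println(storeThroughPhi(false, 7, 9)); // prints "7"
    }
}
```

In both methods p1 and p2 are distinct fresh allocations, which is the "known not to alias" precondition the folding relies on.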

Please take a look and leave your thoughts, thanks a lot.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8373495: C2: Aggressively fold loads from objects that have not escaped (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812
$ git checkout pull/28812

Update a local copy of the PR:
$ git checkout pull/28812
$ git pull https://git.openjdk.org/jdk.git pull/28812/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28812

View PR using the GUI difftool:
$ git pr show -t 28812

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28812.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Dec 13, 2025

👋 Welcome back qamai! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Dec 13, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Dec 13, 2025
@openjdk

openjdk bot commented Dec 13, 2025

@merykitty The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 13, 2025
@mlbridge

mlbridge bot commented Dec 13, 2025

@iwanowww
Contributor

Very nice! I definitely prefer the approach here to #28764.

I see that the unit test stays the same and there's an adjustment in some other test, so I assume this version is functionally more powerful than #28764 version.

Have you had a chance to measure how much it affects compilation speed compared to #28764?

(The code is dense and hard to reason about, so some polishing/refactoring to make it more readable would help. Also, please think about verification checks.)

@openjdk openjdk bot removed the rfr Pull request is ready for review label Dec 13, 2025
@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 13, 2025
@merykitty
Member Author

@iwanowww Thanks for your comment. I have added a lot more comments to explain in detail the steps of MemNode::find_previous_store. I have also made a small modification: instead of traversing the outputs of the control nodes from the call to the allocation, we now traverse the outputs of the nodes that may alias base. This has some benefits:

  • It is likely cheaper. This is because there are often few nodes that may alias base, while there may be numerous control nodes from the call to the allocation. The number of nodes that directly use a pointer is also less than the number of nodes that directly use a random control node.
  • It is more conservative. This is because we can limit the type of the outputs of a pointer and be conservative with everything else, while exhaustively checking if a random use of a random control node makes base escape seems hard.
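
The idea of walking the uses of the pointer rather than the uses of control nodes can be sketched with a toy graph. Everything here is a hypothetical model (`Kind`, `Node`, `escapingUses`), not HotSpot's node classes. The point is only that a small set of use kinds is recognised as harmless and everything else is conservatively treated as an escape.

```java
import java.util.ArrayList;
import java.util.List;

public final class AliasUseWalkSketch {
    public enum Kind { ALLOC, CAST, LOAD_FROM, STORE_INTO, STORE_AS_VALUE, CALL_ARG, OTHER }

    public static final class Node {
        final Kind kind;
        final List<Node> outs = new ArrayList<>();

        public Node(Kind kind) {
            this.kind = kind;
        }

        /** Registers user as an output (use) of this node. */
        public Node use(Node user) {
            outs.add(user);
            return this;
        }
    }

    /** Collects the uses of base, or of any alias of base, that may make it escape. */
    public static List<Node> escapingUses(Node base) {
        List<Node> aliases = new ArrayList<>();
        List<Node> escapes = new ArrayList<>();
        aliases.add(base);
        // The alias list grows while we iterate, so index-based iteration is used.
        for (int idx = 0; idx < aliases.size(); idx++) {
            for (Node out : aliases.get(idx).outs) {
                if (out.kind == Kind.CAST) {
                    // A cast yields another name for the same object.
                    if (!aliases.contains(out)) {
                        aliases.add(out);
                    }
                } else if (out.kind == Kind.LOAD_FROM || out.kind == Kind.STORE_INTO) {
                    // Accessing a field of the object does not publish the object.
                } else {
                    // Anything not explicitly known to be harmless is an escape.
                    escapes.add(out);
                }
            }
        }
        return escapes;
    }
}
```

Each use returned by `escapingUses` would then be checked against the call or barrier being walked past. The work done is bounded by the number of pointer uses rather than by the size of the control graph between the allocation and the call.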

I have also added some verification that if a step determines that base does not escape, then the following steps must not determine otherwise.

For the runtime cost, I don't see a noticeable difference compared to master.

For the unit test, compared to the previous PR, I have removed failOn = LoadI from the tests that involve loops; I think improving load folding on Phi can be another PR. The change in TestZGCEffectiveBarrierElision is there because I decided to add Blackhole to the list of nodes that do not escape an object, though I am not sure whether that is necessary. However, I managed to change the test so the load is not elided.

@merykitty
Member Author

I have made further changes that I believe make the patch pretty rigorous. I no longer see any flaw in the reasoning that would allow mis-analysis.

Contributor

@vnkozlov vnkozlov left a comment

Looks reasonable at first glance, but I need more time to go through it.

@dlunde
Member

dlunde commented Dec 17, 2025

Looks interesting @merykitty! I will also review this.

@merykitty
Member Author

I have added a section describing some future work based on this PR that I have come up with.

@robcasloz
Contributor

@merykitty would it be possible to guard the logic added by this patch with a new diagnostic flag, to facilitate reviewing and experimenting?

@merykitty
Member Author

@robcasloz Done, is it good for you?

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org rfr Pull request is ready for review
