8373495: C2: Aggressively fold loads from objects that have not escaped #28764
Conversation
👋 Welcome back qamai! A progress list of the required criteria for merging this PR into the target branch will be added to the body of your pull request.

❗ This change is not yet ready to be integrated.
@merykitty The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Webrevs
Interesting improvement, thanks for working in this area, Quan Anh! Please allow us some time to think thoroughly about it and how it relates to other plans to improve escape analysis and scalar replacement in C2.
@robcasloz Thanks for taking a look. I also wonder how this relates to other potential improvements to EA. I think that this can work as an independent step or as a first step toward those goals. I am also pretty excited to realize that we don't need to schedule the graph to know if a load can be folded in such a manner; I hope this can also be useful.
// In this case, even if the load x = o.value is declared after the store of o to p that allows o
// to escape, it is valid for the load to actually happen before the store. As a result, we can
I don't think this is correct. If p is external, another thread can modify its fields concurrently.
Or are you saying that if p is external we will always have a memory barrier?
I think the Java memory model allows this reordering and places the responsibility on the programmer to use a synchronization mechanism if the reordering is undesirable, no?
Yes, I think so. Maybe we should add a comment about that.
I have added comments to further stress the importance of memory barriers if the developer needs the accesses serialized.
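The reordering under discussion can be illustrated with a minimal Java sketch (the class and field names here are hypothetical, not taken from the PR): without synchronization, the memory model permits the final load to observe the value captured by the initializing store, so folding it is legal; a programmer who needs the accesses serialized must use volatile or another synchronization mechanism.

```java
// Minimal sketch of the racy publication pattern discussed above.
// All names are hypothetical illustrations.
class EscapeDemo {
    static class Holder {
        int value;
    }

    // Plain (non-volatile) field: publication is unordered.
    static Holder shared;

    static int publishThenRead() {
        Holder o = new Holder();
        o.value = 42;
        shared = o;     // o escapes here
        return o.value; // the JMM permits this load to be ordered before the
                        // escaping store, so C2 may fold it to 42 even though
                        // another thread could concurrently write shared.value
    }
}
```

Single-threaded, the result is 42 either way; the point is that folding remains valid even if another thread races on `shared.value`, because the reordering is allowed by the JMM.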
// Phi
// We can see that the object can be considered non-escape at NarrowMemProj, CallJava(null), and
// Proj2, while it is considered escape at CallJava(o), Proj1, Phi. The loads x and z will be
// from NarrowMemProj and Proj2, respectively, which means they can be considered loads from an
So this optimization is based on the JDK-8327963 changes, which introduced NarrowMemProj. But I don't see you check for it in the code.
This is only for demonstration based on the current shape of the graph. Implementation-wise, we walk the graph until we meet an InitializeNode, at that point we call InitializeNode::find_captured_store, so you can say it is not important what kind of Proj an InitializeNode has.
Okay
Interesting idea, Quan! Why can't the same be done as part of …
@iwanowww No, the walk over the memory graph only visits the memory nodes in the alias class of the load; the escape can happen in a different alias class and be made visible by a … Then the load is in the alias class …
Yes, I got it, but my understanding of the core idea of the optimization is that you can skip over membars when the base object has not escaped yet. So, if …
@iwanowww In principle, I think you are right. However, I don't know how you can prove that a freshly allocated object has not escaped. It seems to me you would need to traverse the whole memory graph to obtain that information. Furthermore, …
Ok, what does it take to determine that a freshly allocated object doesn't escape in a region bounded by the allocation and some call/membar node (dominated by it)? I believe it should be part of the problem your patch solves.
The sufficient condition to decide that a freshly allocated object does not escape in a region bounded by the allocation and a call is that there is no action in that region that makes the object escape. This means that no node that escapes the object has the call as a transitive use. As a result, my solution here is to find all nodes that escape the object, then mark all of their transitive uses as escape. I believe you want to do it in the opposite way, that is, to try to find the nodes that escape the freshly allocated object starting from a call. But that means we need to traverse all the transitive inputs of the call, which seems unrealistic for something running in …
@merykitty thank you for updating the comment. Do you have any performance numbers for some well-known benchmarks?
Some more thoughts/ideas:

So, an object can escape either through a store to memory or as an argument to a call. (Any other scenarios?)

If we leave memory graph considerations aside, then traversing the control graph from a barrier (call/membar) up to the allocation should enumerate all calls and stores in that range. (All stores have control.) (Theoretically, a store's control can end up higher in the control graph, but I don't think it happens in practice.) If a call/store has a data dependency on the allocation, then it's an escape point.

One case is left: if a store has its control in the region, it can be scheduled after the region unless the store dominates the barrier in the memory graph. But, conservatively, it can also be treated as an escape point interfering with the access being optimized.

So, either a CFG-only or a CFG+memory traversal (plus a data-input traversal on arguments) should detect whether an interfering escape point is present or not. Do you see any flaws in my reasoning?

Speaking of the associated costs, it doesn't look prohibitively expensive. The search is localized and doesn't involve a traversal of the whole graph. Alternatively, the results of previous analysis requests can be cached. The property changes monotonically: a previously non-escaping case can't turn into an escaping one later. If the cache is not invalidated, then the worst case is that an optimization opportunity is missed.
Speaking of the general approach, if the analysis part turns out to be way too … There's already some duplication and divergence between IGVN & … Another thing to consider: it's beneficial to perform such a transformation …
@iwanowww Thanks for your analysis. I think it is possible to do the transformation during IGVN and have created another PR which follows that approach; could you take a look?
auto extract_store_value = [&](StoreNode* store) {
  assert(store->Opcode() == candidate->store_Opcode(), "must match %s - %s", store->Name(), candidate->Name());
  Node* res = store->in(MemNode::ValueIn);
  if (candidate->Opcode() == Op_LoadUB) {
Is such adaptation needed? MemNode::can_see_stored_value() solves a similar task, but it doesn't perform any adaptation.
Yes, it only looks for a matching store; the one doing the normalization is Load[B|UB|S|US]Node::Ideal.
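The adaptation matters because the captured store value may carry more bits than the narrow load is allowed to observe, and the signed and unsigned byte loads must normalize it differently. A small Java sketch of the two normalizations (the class and method names are illustrative):

```java
// Illustrates why a folded LoadB/LoadUB must normalize the stored value:
// the store may capture more significant bits than the load can observe.
// All names here are illustrative.
class NarrowLoadDemo {
    static byte b;

    // LoadB semantics: the stored low 8 bits, sign-extended.
    static int loadSigned(int wide) {
        b = (byte) wide;  // the store captures only the low 8 bits
        return b;
    }

    // LoadUB semantics: the stored low 8 bits, zero-extended.
    static int loadUnsigned(int wide) {
        b = (byte) wide;
        return b & 0xFF;
    }
}
```

For a stored value of 0x1FF (511), the signed load observes -1 while the unsigned load observes 255, which is why a folded load cannot simply reuse the store's input value as-is.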
Closed in favour of #28812.
Hi,
The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we completely lose the ability to analyse the values of its fields, even if the object only escapes at the return.
This PR tries to determine the escape status of an object at a load; if it is decided that the object has not escaped there, we can try to fold the load aggressively, ignoring calls and memory barriers when looking for a corresponding store that the load observes.
Regarding the runtime cost, this phase runs very fast: around 5-7% of the runtime of EA, and about 0.5% of the total runtime of C2.
Please take a look and leave your thoughts, thanks a lot.
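A hypothetical example of the code shape the description targets: the object escapes only by being returned, so a load placed after an intervening call can still be folded to the captured store value. The names below are illustrative, not taken from the PR's tests.

```java
// Hypothetical example of the pattern this PR targets: the object escapes
// only at the return, so the load before it can be folded despite the call.
class FoldDemo {
    static class Point {
        int x;
    }

    static int sum;

    // Stands in for an arbitrary call the compiler cannot see through.
    static int opaqueCall(Object o) {
        return 0;
    }

    static Point make() {
        Point p = new Point();
        p.x = 7;
        opaqueCall(null); // a call, but p has not escaped yet
        sum += p.x;       // this load is foldable to 7 despite the call
        return p;         // p escapes only here, at the return
    }
}
```

Under the pre-existing all-or-nothing analysis, the return marks `p` as escaping and the load `p.x` cannot be folded across the call; with per-load escape status, the load sits in a region where `p` has not yet escaped.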
Progress
Issue
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28764/head:pull/28764
$ git checkout pull/28764

Update a local copy of the PR:
$ git checkout pull/28764
$ git pull https://git.openjdk.org/jdk.git pull/28764/head

Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 28764

View PR using the GUI difftool:
$ git pr show -t 28764

Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28764.diff
Using Webrev
Link to Webrev Comment