Save/load i2c2i adapters #25

Open: wants to merge 33 commits into base: premain

ashu-mehra
Collaborator

@ashu-mehra ashu-mehra commented Oct 23, 2024

This is an attempt to save and load i2c2i adapters along with the adapter handler table.
There are mainly two parts to this change:

  1. Storing of adapter code in the SCCache or AOT code cache.
  2. Storing of adapter handler table in the AOT cache.

The adapter handler table is a map from AdapterFingerPrint to AdapterHandlerEntry. To store them in the AOT cache, AdapterFingerPrint and AdapterHandlerEntry are changed to be MetaspaceObj types. Both are discovered and added to the cache while processing the Method. When storing the adapter handler table, only the entries that have already been archived are considered. This prunes any AdapterHandlerEntry that is reachable only through a Method that is not eligible to be archived.
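
The pruning step described above can be pictured with a minimal sketch. The types below (`FingerPrint`, `HandlerEntry`, `AdapterTable`) and the function `prune_for_archive` are hypothetical stand-ins, not the HotSpot classes or the code in this PR; the sketch only assumes that the set of already-archived entries is known when the table is written.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-ins; the real AdapterFingerPrint and AdapterHandlerEntry
// are HotSpot classes with far more state.
using FingerPrint = std::string;   // e.g. an encoded argument signature
struct HandlerEntry { int id; };

using AdapterTable = std::unordered_map<FingerPrint, const HandlerEntry*>;

// Copy only the entries that were already archived while walking Methods.
// Entries reachable solely through non-archivable Methods are dropped.
AdapterTable prune_for_archive(
    const AdapterTable& runtime_table,
    const std::unordered_set<const HandlerEntry*>& archived) {
  AdapterTable out;
  for (const auto& kv : runtime_table) {
    assert(kv.second != nullptr);
    if (archived.count(kv.second) != 0) {
      out.emplace(kv.first, kv.second);
    }
  }
  return out;
}
```

Filtering by "already archived" rather than re-walking reachability keeps the table write a single pass over the map.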

An AdapterHandlerEntry has a pointer to the adapter code. Because the AdapterHandlerEntry and the adapter code are stored in separate archives, this link between the AdapterHandlerEntry and the adapter code needs to be removed (see AdapterHandlerEntry::remove_unshareable_info()).
During the production run, as the methods in the AOT cache are adopted, the AdapterHandlerEntry is linked back to the adapter code (see AdapterHandlerEntry::restore_unshareable_info()).
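
A rough model of this unlink/relink cycle is sketched below. The method names mirror the ones mentioned above, but everything else is an assumption made for illustration: `Entry`, `CodeBlobStub`, the `fingerprint_id` key, the map used as a code cache, and deriving `is_linked()` from the pointer are not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical model: the entry lives in one archive, the code in another.
struct CodeBlobStub { const char* name; };

struct Entry {
  std::uint32_t fingerprint_id;        // assumed stable key that survives archiving
  const CodeBlobStub* code = nullptr;  // raw pointer, only valid in this process

  bool is_linked() const { return code != nullptr; }

  // Dump time: drop the process-local pointer before the entry is written out.
  void remove_unshareable_info() { code = nullptr; }

  // Production run: look the code up again by id and re-establish the link.
  // Returns false if the code archive has no matching blob.
  bool restore_unshareable_info(
      const std::unordered_map<std::uint32_t, const CodeBlobStub*>& code_cache) {
    assert(!is_linked());
    auto it = code_cache.find(fingerprint_id);
    if (it == code_cache.end()) {
      return false;
    }
    code = it->second;
    return true;
  }
};
```

The point of the split is that a raw code address is meaningless in another process, while a lookup key can be archived safely.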

All this code is guarded by the -XX:[+-]ArchiveAdapters option, which defaults to false but is set to true in CDSConfig during the assembly phase.

Other changes worth mentioning:

  1. Changes to the SCCache infrastructure to make it possible to store and load adapter code. (Thanks to @adinn)
  2. Updating AdapterFingerPrint hashing algorithm to avoid collisions. If there is any collision, then it will prevent finding the adapter code in the SCCache. (Again courtesy of @adinn)

Thanks to @adinn for providing many of these changes.
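
To illustrate why fingerprint collisions matter for a cache that is looked up by hash, here is a small sketch. It is not the hashing algorithm used in this PR: `Signature`, `weak_hash`, and `fnv1a_hash` are hypothetical, and FNV-1a is used only as a familiar order-sensitive hash. It reduces accidental collisions but does not eliminate them.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical fingerprint: one small type code per argument.
using Signature = std::vector<std::uint8_t>;

// Deliberately weak hash: order-insensitive, so permuted signatures collide.
std::uint32_t weak_hash(const Signature& sig) {
  std::uint32_t h = 0;
  for (std::uint8_t t : sig) {
    h += t;
  }
  return h;
}

// 32-bit FNV-1a: order-sensitive, so permutations usually hash differently.
std::uint32_t fnv1a_hash(const Signature& sig) {
  std::uint32_t h = 2166136261u;  // FNV offset basis
  for (std::uint8_t t : sig) {
    h ^= t;
    h *= 16777619u;               // FNV prime
  }
  return h;
}
```

With the weak hash, the signatures {1, 2} and {2, 1} map to the same id, so a cache keyed only by that id could not tell their adapter code apart.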

Performance:
-Xlog:init shows time taken for linking of Methods and making adapters. An example output is:

ClassLoader:
  clinit:                             150us / 4612 events
  link methods:                     28980us / 176893 events
  method adapters:                  15378us / 697 events

Saving and loading adapters seems to have improved these stats:

                   -ArchiveAdapters             +ArchiveAdapters
Quarkus
  link methods     12214us / 58913 events       2700us / 58913 events
  method adapters   7793us / 607 events         4402us / 38 events
Spring-petclinic
  link methods     28980us / 176893 events      7485us / 176893 events
  method adapters  15378us / 697 events         7050us / 13 events
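
For reference, the timings above work out to roughly 4.5x (link methods) and 1.8x (method adapters) for Quarkus, and 3.9x and 2.2x for Spring-petclinic, while the number of adapter events drops from 607 to 38 and from 697 to 13. The helper below is only a convenience sketch for reproducing those ratios; `speedup` and `print_speedups` are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>

// Ratio of the time without archived adapters to the time with them.
double speedup(double without_us, double with_us) {
  assert(with_us > 0.0);
  return without_us / with_us;
}

// Prints the four ratios using the timings quoted in the table above.
void print_speedups() {
  std::printf("Quarkus link methods:             %.2fx\n", speedup(12214, 2700));
  std::printf("Quarkus method adapters:          %.2fx\n", speedup(7793, 4402));
  std::printf("Spring-petclinic link methods:    %.2fx\n", speedup(28980, 7485));
  std::printf("Spring-petclinic method adapters: %.2fx\n", speedup(15378, 7050));
}
```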

However, when testing with the Quarkus app, I don't see any noticeable improvement in startup time.


Progress

  • Change must not contain extraneous whitespace
  • Change must be properly reviewed (1 review required, with at least 1 Committer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/leyden.git pull/25/head:pull/25
$ git checkout pull/25

Update a local copy of the PR:
$ git checkout pull/25
$ git pull https://git.openjdk.org/leyden.git pull/25/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25

View PR using the GUI difftool:
$ git pr show -t 25

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/leyden/pull/25.diff

Using Webrev

Link to Webrev Comment

Signed-off-by: Ashutosh Mehra <asmehra@redhat.com>
…ent ranges

loading from AOT cache

…_info()

@bridgekeeper

bridgekeeper bot commented Oct 23, 2024

👋 Welcome back asmehra! A progress list of the required criteria for merging this PR into premain will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Oct 23, 2024

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 24, 2024
@mlbridge

mlbridge bot commented Oct 24, 2024

Collaborator

@vnkozlov vnkozlov left a comment

Very nice changes. A few comments.

src/hotspot/share/code/SCCache.cpp (3 review threads, outdated and resolved)
Comment on lines 1269 to 1271
// TODO: how to identify code cache full situation now that the adapter() can be
// non-null if AOT cache is in use
#if 0
Collaborator

Can you check the following instead (the opposite of the check in the assert that follows)?

 if (adapter() != nullptr && !adapter()->is_linked()) {

The assumption is that we have enough CodeCache space when we are loading adapters from the AOT cache. Otherwise we should bail out (did you test such a case?).

Is is_linked() specific to adapters from the AOT cache?

Collaborator Author

is_linked() is set for every AdapterHandlerEntry when its code is either generated or loaded from the AOT cache.

Regarding the original block of code that this check pertains to:

  // If the code cache is full, we may reenter this function for the
  // leftover methods that weren't linked.
  if (adapter() != nullptr) {
    return;
  }

The comment seems to indicate that we may reenter this function for a Method* for which adapter code has already been generated. However, I am not able to trace the code path that may result in re-entering this function. Can you please explain under what conditions this is possible? @vnkozlov

Collaborator

The comment was added for JDK-7033141
https://hg.openjdk.org/jdk9/jdk9/hotspot/rev/d3b9f2be46ab

I think the comment is incorrect. It should talk about PermGen space, based on the bug's evaluation:

"If the VM runs out of permgen space while allocating the constant pool cache, it tries to reverify the bytecodes in the methods for the class. But the bytecodes have been rewritten. I'm working on a fix that un-rewrites the bytecodes so that the VM can try again to link this class. I am debugging this now - actually I'm debugging my code that forces the error condition (for testing) since this but only reproduces for a specific error condition.
It's not very unlikely for an application to run out of permgen (or code cache as in bug 6947901) so it is probably worth fixing for jdk 7. The fix is relatively low risk once it's debugged."

JDK-6947901 shows failure with -Xint too.

But I imagine that a full CodeCache may also cause a failure to create adapters, which would in turn trigger the "un-rewriting" of bytecodes.

We don't have PermGen anymore. The only issue is space in the CodeCache for adapters, which you can check before loading adapters since you know the size of the adapter code in the AOT cache.

I don't think we currently check that the CodeCache size is the same during the production run as during the AOT assembly phase. Adapters are allocated in the NonNMethod section.
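
The up-front check suggested here could look roughly like the sketch below. It is a hypothetical model, not HotSpot code: `CodeHeapStub` and `can_load_archived_adapters` are invented names, and the sketch assumes the total size of the archived adapter code and the free space in the non-nmethod heap are both available before loading starts.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical view of the non-nmethod code heap where adapters are allocated.
struct CodeHeapStub {
  std::size_t capacity;
  std::size_t used;

  std::size_t free_bytes() const {
    assert(used <= capacity);
    return capacity - used;
  }
};

// Decide up front whether all archived adapter code fits; if it does not,
// the caller would bail out and fall back to generating adapters at runtime.
bool can_load_archived_adapters(const CodeHeapStub& non_nmethod_heap,
                                std::size_t archived_adapter_bytes) {
  return archived_adapter_bytes <= non_nmethod_heap.free_bytes();
}
```

Checking once before loading avoids a partially loaded adapter set when the production run uses a smaller code cache than the assembly phase did.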

Collaborator Author

@vnkozlov I added a change to fix this by checking if adapter is shared or not. If it is not shared and is not null, we return, else we continue. This should restore the behavior of returning early if link_method() gets called again due to code cache full.
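
The decision described in this comment can be summarized in a few lines. This is a sketch of the logic only, with hypothetical types (`AdapterEntry`, `should_skip_linking`); the assertion that a non-shared adapter is already linked is an assumption based on the earlier statement that is_linked() is set whenever adapter code is generated or loaded.

```cpp
#include <cassert>

// Hypothetical stand-in for the entry type discussed in this thread.
struct AdapterEntry {
  bool shared;  // true if the entry came from the AOT cache
  bool linked;  // true once adapter code has been generated or loaded
};

// Returns true if link_method() should return early, false if it should go
// on to link (or generate) the adapter.
bool should_skip_linking(const AdapterEntry* adapter) {
  if (adapter != nullptr && !adapter->shared) {
    // Created earlier in this run, so this call is a re-entry
    // (for example after a code-cache-full retry).
    assert(adapter->linked);  // assumption: runtime-created entries are linked
    return true;
  }
  // Either no adapter yet, or a shared one from the AOT cache that still
  // needs to be linked to its code: continue.
  return false;
}
```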

Collaborator

Good!

@openjdk

openjdk bot commented Nov 12, 2024

@ashu-mehra this pull request can not be integrated into premain due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout premain-save-i2c2i-v3
git fetch https://git.openjdk.org/leyden.git premain
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge premain"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Nov 12, 2024
@openjdk openjdk bot removed the rfr Pull request is ready for review label Dec 11, 2024
@openjdk openjdk bot added rfr Pull request is ready for review and removed merge-conflict Pull request has merge conflict with target branch labels Dec 17, 2024
@ashu-mehra
Collaborator Author

@vnkozlov @iklam is this good to go now? Any more comments before I merge it?

@vnkozlov
Collaborator

vnkozlov commented Jan 6, 2025

Is this up to date with current premain branch code?

@ashu-mehra
Collaborator Author

@vnkozlov yes, I have merged premain into this PR's branch. premain is now ahead of this branch by just two commits, which were made yesterday.

@vnkozlov
Collaborator

vnkozlov commented Jan 7, 2025

@iklam can you run this patch through our internal testing?

@iklam
Member

iklam commented Jan 7, 2025

> @iklam can you run this patch through our internal testing?

OK I will do it.

@iklam
Member

iklam commented Jan 7, 2025

> @iklam can you run this patch through our internal testing?
>
> OK I will do it.

I am seeing new failures on aarch64 only. x64 seems fine:

runtime/cds/appcds/applications/JavacBench.java#leyden			macosx-aarch64-debug
runtime/cds/appcds/applications/MicronautFirstApp.java#leyden		macosx-aarch64-debug
runtime/cds/appcds/applications/MicronautFirstApp.java#leyden		linux-aarch64-open
runtime/cds/appcds/applications/MicronautFirstApp.java#leyden		linux-aarch64-debug
runtime/cds/appcds/applications/QuarkusGettingStarted.java#leyden       linux-aarch64-debug

Here's the hs_err. This happens in the final production run.

#  SIGSEGV (0xb) at pc=0x0000ffff89976650, pid=3238003, tid=3238006
#
# JRE version: Java(TM) SE Runtime Environment (24.0) (fastdebug build 24-internal-2025-01-07-1758562.ioi.lam.le4)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 24-internal-2025-01-07-1758562.ioi.lam.le4, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# v  ~AdapterBlob 0x0000ffff83d1b0d8

---------------  S U M M A R Y ------------

Command Line: -XX:MaxRAMPercentage=6.25 -Dtest.boot.jdk=/opt/mach5/mesos/work_dir/jib-master/install/jdk/23/37/bundles/linux-aarch64/jdk-23_linux-aarch64_bin.tar.gz/jdk-23 -Djava.io.tmpdir=/opt/mach5/mesos/work_dir/slaves/b733f181-520a-4536-86fc-7df55263c942-S3131/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/4ab6604d-6790-4797-a358-fcfe3a5e2cfe/runs/b53804f1-a345-47c7-9bed-24313dccd140/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_tier2_runtime/tmp -DautoQuit=true -Dmicronaut.server.port=0 -XX:+IgnoreUnrecognizedVMOptions -XX:-VerifyDependencies -XX:+UnlockDiagnosticVMOptions -XX:VerifyArchivedFields=2 -Xlog:cds:file=MicronautFirstApp.production.log::filesize=0 -XX:CacheDataStore=MicronautFirstApp.cds -Xlog:scc=error example.micronaut.Application

@ashu-mehra
Collaborator Author

@iklam I have tried reproducing these failures on a linux-aarch64 (Fedora 40) system, but the tests always pass. I have run them multiple times using fastdebug and release builds but didn't get any failures.
Are these failures reproducible every time in your testing, or are they intermittent? Which Linux distro were these tests run on? Also, can you share the backtrace for the crash?

@iklam
Member

iklam commented Jan 8, 2025

> @iklam I have tried reproducing these failures on a linux-aarch64 (Fedora 40) system, but the tests always pass. I have run them multiple times using fastdebug and release builds but didn't get any failures.
>
> Are these failures reproducible every time in your testing, or are they intermittent? Which Linux distro were these tests run on? Also, can you share the backtrace for the crash?

I got those crashes from our CI pipeline. Let me try to run the tests manually on linux-aarch hosts and see if I can reproduce the problems.

table.
Do not call delete on AdapterHandlerEntry. Instead call the destructor
explicitly through a deallocate() method.
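
The commit note above does not say why delete is avoided. One common reason, offered here purely as an assumption, is that an object's storage was not obtained from operator new, in which case the destructor has to be run explicitly and the storage released separately. The sketch below shows that general pattern with a hypothetical `EntryStub`; it is not the AdapterHandlerEntry implementation.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

struct EntryStub {
  static int live;  // number of constructed-but-not-destroyed entries
  int id;

  explicit EntryStub(int i) : id(i) { ++live; }
  ~EntryStub() { --live; }

  // Allocate raw storage and construct the object in place (placement new).
  static EntryStub* allocate(int id) {
    void* mem = std::malloc(sizeof(EntryStub));
    assert(mem != nullptr);
    return new (mem) EntryStub(id);
  }

  // Run the destructor explicitly, then release the raw storage.
  // `delete e` would be wrong here because the storage did not come from `new`.
  static void deallocate(EntryStub* e) {
    e->~EntryStub();
    std::free(e);
  }
};

int EntryStub::live = 0;
```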

and main thread generating adapters

CodeBlob

@openjdk openjdk bot removed the rfr Pull request is ready for review label Jan 10, 2025
@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 10, 2025
Labels
rfr Pull request is ready for review
3 participants