Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable per-connection JITServer AOT cache disabling #19133

Merged
merged 3 commits into from
Mar 18, 2024

Conversation

cjjdespres
Copy link
Contributor

@cjjdespres cjjdespres commented Mar 13, 2024

If a JITServer client uses -XX:+JITServerAOTCacheIgnoreLocalSCC, servers that client connects to will now

  1. Fail a compilation completely if the client requests an AOT cache store and the server cannot set up the compilation as an AOT cache store or load, and

  2. Report the state of the server's AOT cache if (1) occurs, so the client will not continue to send AOT cache requests to that server if those requests could not possibly be fulfilled.

The client tracks the state of the current server's AOT cache with the persistent info properties doNotRequestJITServerAOTCacheStore() and doNotRequestJITServerAOTCacheLoad(). Since a particular server's cache will never become able to complete AOT cache loads or stores partway through a connection (e.g., the server's cache never shrinks in size, so AOT cache stores will never succeed once it is full) these properties remain true once set for the duration of the connection with a particular server. They are reset whenever a client connects to a new server.

These changes are necessary to prevent a crash at the client from occurring if the client is running with -XX:+JITServerUseAOTCache -XX:+JITServerAOTCacheIgnoreLocalSCC and connects to a server that is not running -XX:+JITServerUseAOTCache (or, more generally, has an AOT cache in which methods or records cannot be stored).

This PR also fixes a bug that prevented the deserializer from being reset when a client connected to a new server after a previous disconnection.

Related: #18990

@cjjdespres
Copy link
Contributor Author

cjjdespres commented Mar 13, 2024

Attn @mpirvu. Unfortunately, while I do reset the two _doNotRequest... flags in the persistent info when connecting to a new server, that doesn't seem to do anything - if the client first connects to a server without an AOT cache, then connects to a server with an AOT cache, the client will still never request any AOT cache stores or loads no AOT cache loads or stores ever seem to occur at the server. My simple testing strategy was to start a JITServer instance without an AOT cache, then start an acmeair client, then immediately stop the running JITServer instance and start a new one with an AOT cache.

This behaviour does seem to be present even when using a local SCC, though (that is, with -XX:-JITServerAOTCacheIgnoreLocalSCC). Maybe my test is inadequate? Either that or the lack of necessary records in the new server's cache prevents AOT cache stores from occurring, or there is a flag at the client I'm forgetting about that needs to be cleared when it connects to a new server.

All that being said, these changes do fix the crash I mentioned in the comment above. It's just that resetting the two flags doesn't seem to do anything at the moment.

@cjjdespres
Copy link
Contributor Author

I said above that "the client never requests any AOT cache loads or stores", but I don't think I know that, actually. I'll add a log message to see whether or not the client requests any such compilations during the second connection.

@cjjdespres
Copy link
Contributor Author

A correction - with -XX:-JITServerAOTCacheIgnoreLocalSCC the client will in fact send AOT cache store/load requests to a new server. I was setting a command line option incorrectly when running with that option.

@cjjdespres
Copy link
Contributor Author

I understand now. This block:

         // This is a connection to a new server after a previous disconnection
         if ((0 != previousUID) && (previousUID != serverUID))
            {
            // Reset AOT deserializer (cached serialization records are now invalid)
            auto deserializer = compInfo->getJITServerAOTDeserializer();
            if (deserializer)
               deserializer->reset(compInfoPT);

            // Do not forbid AOT cache stores or loads (this server might be able to fulfill them)
            compInfo->getPersistentInfo()->setDoNotRequestJITServerAOTCacheLoad(false);
            compInfo->getPersistentInfo()->setDoNotRequestJITServerAOTCacheStore(false);
            }

is not being run, I think because here:

// For read failures we should not set the server availbility to false
if (connectionFailure)
{
compInfo->getPersistentInfo()->setServerUID(0);
_serverAvailable = false;
}

we set the the server UID to zero. If this setServerUID(0) call is necessary, then I think we'll have to track whether or not we've already connected to a server as a separate flag, since we can't rely on the UID for that purpose. Either that or we can just unconditionally reset the deserializer and flags on a new connection.

That also means we're not currently clearing the deserializer, because the current code assumes that the server UID being zero implies that the new connection is the first connection.

@mpirvu mpirvu self-assigned this Mar 13, 2024
@mpirvu mpirvu added the comp:jitserver Artifacts related to JIT-as-a-Service project label Mar 13, 2024
@cjjdespres
Copy link
Contributor Author

Okay, I've added a flag that tracks if the client has ever connected to a server. That seems to fix the issue above, and fixes the deserializer reset problem as well.

Copy link
Contributor

@mpirvu mpirvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small suggestion. Looks good otherwise

runtime/compiler/env/J9PersistentInfo.hpp Outdated Show resolved Hide resolved
@cjjdespres cjjdespres force-pushed the handle-cache-unavailability branch 3 times, most recently from 46b0284 to 7a7f249 Compare March 14, 2024 20:45
@mpirvu
Copy link
Contributor

mpirvu commented Mar 14, 2024

jenkins test sanity plinuxjit,xlinuxjit,zlinuxjit,alinux64jit jdk17

@mpirvu
Copy link
Contributor

mpirvu commented Mar 15, 2024

Build failures unrelated to his PR:

18:05:38  /home/jenkins/workspace/Build_JDK17_x86-64_linux_jit_Personal/openj9/runtime/gc_modron_standard/RootScannerReadBarrierVerifier.cpp: In member function 'virtual void MM_RootScannerReadBarrierVerifier::scanClass(MM_EnvironmentBase*)':
18:05:38  /home/jenkins/workspace/Build_JDK17_x86-64_linux_jit_Personal/openj9/runtime/gc_modron_standard/RootScannerReadBarrierVerifier.cpp:54:23: error: unused variable 'omrVMThread' [-Werror=unused-variable]
18:05:38     54 |         OMR_VMThread *omrVMThread = env->getOmrVMThread();
18:05:38        |                       ^~~~~~~~~~~
18:05:38  /home/jenkins/workspace/Build_JDK17_x86-64_linux_jit_Personal/openj9/runtime/gc_modron_standard/RootScannerReadBarrierVerifier.cpp: In member function 'virtual void MM_RootScannerReadBarrierVerifier::scanClass(MM_EnvironmentBase*)':
18:05:38  /home/jenkins/workspace/Build_JDK17_x86-64_linux_jit_Personal/openj9/runtime/gc_modron_standard/RootScannerReadBarrierVerifier.cpp:54:23: error: unused variable 'omrVMThread' [-Werror=unused-variable]
18:05:38     54 |         OMR_VMThread *omrVMThread = env->getOmrVMThread();
18:05:38        |                       ^~~~~~~~~~~
18:05:38  cc1plus: all warnings being treated as errors

@cjjdespres could you please rebase, maybe these errors have been fixed.
Indeed I saw Peter reverting the offending PR 12 hours ago.

If a JITServer client uses -XX:+JITServerAOTCacheIgnoreLocalSCC,
servers that client connects to will now

1. Fail a compilation completely if the client requests an AOT cache
   store and the server cannot set up the compilation as an AOT cache
   store or load, and

2. Report the state of the server's AOT cache if (1) occurs, so the
   client will not continue to send AOT cache requests to that server
   if those requests could not possibly be fulfilled.

The client tracks the state of the current server's AOT cache with
the persistent info properties doNotRequestJITServerAOTCacheStore() and
doNotRequestJITServerAOTCacheLoad(). Since a particular server's cache
will never become able to complete AOT cache loads or stores partway
through a connection (e.g., the server's cache never shrinks in size,
so AOT cache stores will never succeed once it is full) these properties
remain true once set for the duration of the connection with a
particular server. They are reset whenever a client connects to a new
server.

Signed-off-by: Christian Despres <despresc@ibm.com>
The new _hasConnectedToServer flag in the persistent info is true at a
JITServer client when it has previously connected to a server,
regardless of the current status of any server connection. Tracking
this property is necessary to ensure that the deserializer is reset
on new connections, as the previous method (checking if the serverUID is
0) is not reliable for that purpose.

This fixes a bug that prevented the deserializer from being reset when
a client connected to a new server.

Signed-off-by: Christian Despres <despresc@ibm.com>
Signed-off-by: Christian Despres <despresc@ibm.com>
@cjjdespres
Copy link
Contributor Author

Rebased.

@mpirvu
Copy link
Contributor

mpirvu commented Mar 15, 2024

jenkins test sanity plinuxjit,xlinuxjit,zlinuxjit,alinux64jit jdk17

@mpirvu mpirvu merged commit eda370c into eclipse-openj9:master Mar 18, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:jitserver Artifacts related to JIT-as-a-Service project
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants