Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements and clean up related to the JITServer AOT cache #18990

Open
6 of 13 tasks
cjjdespres opened this issue Feb 21, 2024 · 15 comments
Open
6 of 13 tasks

Improvements and clean up related to the JITServer AOT cache #18990

cjjdespres opened this issue Feb 21, 2024 · 15 comments
Labels
comp:jitserver Artifacts related to JIT-as-a-Service project

Comments

@cjjdespres
Copy link
Contributor

cjjdespres commented Feb 21, 2024

There are still various cleanup/refactoring tasks remaining after the merge of #18301. People may have some input on the solutions to these things; in any case, I'll put the list up here just to record these things for myself.

  • Using the JITServer AOT cache now sets the portable AOT flag, which Alexey says (Allow JITServer AOT cache stores and loads without a local SCC #18301 (comment)) will cause a degradation in performance. This is avoidable, as I explained in Allow JITServer AOT cache stores and loads without a local SCC #18301 (comment). Though, I sort of remember that the -XX:+PortableSharedCache flag is set in the liberty container. Something to look at.
  • The method JITServerAOTDeserializer::reset() is missing a critical section for _classLoaderMonitor.
  • It may not be wise for copyJ9UTF8 in ClassLoaderTable.cpp to throw an exception (Support persistent class loader name tracking in all cases when JITServer AOT cache is active #18839 (comment)).
  • Currently the AOT cache can generate class records for the [L class, which isn't really a class name at all - it's the romClass->className of an array class, which is incorrect. These classes can't be stored in the SCC - there happens to be a workaround for this issue in SVM, but otherwise if the old AOT cache implementation ever tried to rememberClass() a class with an array class in its chain somewhere it would simply fail to store the chain in the SCC and the AOT compilation would fail. Fixing this could be as easy as recording the correct class name in the ClassSerializationRecord (I haven't tried this) but we also speculated in one of our weekly meetings that a new ArrayClassSerializationRecord could also be created, which would record the arity of the array class and the element class name and hash. (Now that I think about it, we can't simply record the correct name in ClassSerializationRecord - we would have to incorporate the hash of the element ROM class in some way as well). A short-term fix would be to refuse to create class serialization records for array classes, which should match the not-ignoring-SCC behaviour (something I still need to verify). I should say that the current behaviour isn't broken, as these records will always fail to be deserialized - it would just be better to handle them at compile time.
  • Lambda class names (which have $$Lambda$<index>/<address> at the end) are less normalized without an SCC present. With an SCC, the <address> part is just NULL (specifically, a 0x0... address), and I think the <index> is also calculated differently (but I'm less sure about that). Performing the same transformation with an SCC not present (even if it's just at the point where a ClassSerializationRecord is created) would improve matching, as currently the "Failed to find class in class loader" deserialization error comes up more frequently in the new implementation if the original ClassSerializationRecord was created for a client that did not have a local SCC present.
  • In the TR_J9DeserializerSharedCache we do TR::Compilation *comp = TR::compInfoPT->getCompilation(); a few times. We might be able to cache a pointer to the compInfoPT in that class like we now do in TR_J9JITServerSharedCache.
  • We can disable the writable SCC layer creation feature that I added when we're running with the new AOT cache implementation, as it's completely redundant to have that layer created when we're ignoring the local SCC.
  • I think there is an assert in ClientSessionData::getClassRecord(ClientSessionData::ClassInfo &classInfo, bool &missingLoaderInfo) that's firing in the new implementation. It indicates that there is a class chain offset being cached without a corresponding loader name. It's not fatal, really. Just odd that it's happening. I still have to track down the cause.
  • Class loader and class re-caching can still be implemented in the new deserializer, if we save the name of the class loader and the name and hash of the class when we cache the records initially.
  • Now that there is a AOTDeserializerReset exception, the explicit wasReset parameter in the deserializer methods can be replaced by throwing and catching that exception. Though, the entire deserializer reset mechanism can also be replaced by a readers-writer lock instead.
  • In JITServerCompilationThread.cpp there is a line _aotCacheStore = aotCacheStore && aotCache && JITServerAOTCacheMap::cacheHasSpace(); that I think is not exactly what we want when ignoring the local SCC. Before, we could downgrade an AOT cache store at the server at any time, turning it into a normal AOT compilation. When we are ignoring the local SCC we can't let this happen, as we don't have access to a local SCC as a source of offsets. Instead of having that test, we might have to abort the compilation and inform the client that the server doesn't support AOT cache stores.
  • It would be good to improve AOT cache testing. Currently we only have the one basic testServerAOTCache component in testJITServer. There is the already-existing issue JITServer AOT test #16428 that would be nice to have in - it could be run under various options as well. Likely there are other tests of various AOT cache features that could be added.
  • If the server performs an AOT cache store compilation for a method that is already in the AOT cache, the server will send the cached compiled method and not the freshly-compiled method (see Allow JITServer AOT cache stores and loads without a local SCC #18301 (comment)). This may be an issue if the client requested a fresh AOT store compilation because of a previous failure to load that particular cached method.
@cjjdespres
Copy link
Contributor Author

Attn @mpirvu, @AlexeyKhrabrov.

@AlexeyKhrabrov
Copy link
Contributor

AlexeyKhrabrov commented Feb 21, 2024

Currently the AOT cache can generate class records for the [L class, which isn't really a class name at all - it's the romClass->className of an array class, which is incorrect. These classes can't be stored in the SCC - there happens to be a workaround for this issue in SVM, but otherwise if the old AOT cache implementation ever tried to rememberClass() a class with an array class in its chain somewhere it would simply fail to store the chain in the SCC and the AOT compilation would fail. Fixing this could be as easy as recording the correct class name in the ClassSerializationRecord (I haven't tried this) but we also speculated in one of our weekly meetings that a new ArrayClassSerializationRecord could also be created, which would record the arity of the array class and the element class name.

I already have a working implementation of handling object array classes. I needed it for caching profiling data at JITServer, but I don't remember this being an issue for AOT cache. Storing the full array class name in the ClassSerializationRecord and extracting the element class name and the arity from the full name during deserialization works fine, there is no need for a separate ArrayClassSerializationRecord. But the ROMClass hash stored in the class record must be a combination of the hashes for the L[ and array element ROMClasses, and the arity. I'll contribute the implementation eventually.

@AlexeyKhrabrov
Copy link
Contributor

Lambda class names (which have $$Lambda$<index>/<address> at the end) are less normalized without an SCC present. With an SCC, the <address> part is just NULL, and I think the <index> is also calculated differently (but I'm less sure about that). Performing the same transformation with an SCC not present (even if it's just at the point where a ClassSerializationRecord is created) would improve matching, as currently the "Failed to find class in class loader" deserialization error comes up more frequently in the new implementation if the original ClassSerializationRecord was created for a client that did not have a local SCC present.

This is more complicated than that, and doesn't have much to do with the local SCC (except for the <address> part being NULL or not). As far as I understand, lambda classes currently cannot be looked up by name at all using any existing APIs, they are anonymous by design. Any AOT method (for both local SCC and AOT cache) that refers to any lambda classes in relocation/validation records cannot be loaded in the current implementation.

The <index> part can vary from run to run because it depends on the order in which lambdas classes are loaded, which is generally non-deterministic. So we have to keep track of all lambda classes defined by the same host class, and disambiguate them based on the ROMClass hash instead of the index, and exclude non-deterministic parts of class name strings when computing those hashes. The same applies to other types of classes defined at runtime such as proxy classes and generated reflection accessors. Anyway, I also have a working implementation that handles all that, I'll contribute it at some point.

By the way, the same issue applies to hidden classes because they are ignored in class lookup by name. But this is a much easier fix, we simply need to add a flag to jitGetClassInClassloaderFromUTF8() and propagate it down to wherever the check for a hidden class happens.

@AlexeyKhrabrov
Copy link
Contributor

I think there is an assert in ClientSessionData::getClassRecord(ClientSessionData::ClassInfo &classInfo, bool &missingLoaderInfo) that's firing in the new implementation. It indicates that there is a class chain offset being cached without a corresponding loader name. It's not fatal, really. Just odd that it's happening. I still have to track down the cause.

Which assertion?

@cjjdespres
Copy link
Contributor Author

This assert:

         TR_ASSERT(TR_SharedCache::INVALID_CLASS_CHAIN_OFFSET != classInfo._classChainOffsetIdentifyingLoader,
                   "Valid class chain offset but missing class name identifying loader");

@mpirvu mpirvu added the comp:jitserver Artifacts related to JIT-as-a-Service project label Feb 23, 2024
@AlexeyKhrabrov
Copy link
Contributor

Class loader and class re-caching can still be implemented in the new deserializer, if we save the name of the class loader and the name and hash of the class when we cache the records initially.

This should be easier to do with the infrastructure I have for AOT prefetching where I store not-yet-valid records in the deserializer cache until corresponding classes get loaded. When a class gets unloaded, we can simply recreate and store its serialization record.

Another approach would be to invalidate known IDs for unloaded classes and loaders on the server side, which would be preferable in terms of client memory usage. But we would have to figure out how to synchronize things properly, maybe using the existing mechanism with critical compilation requests.

@cjjdespres
Copy link
Contributor Author

@AlexeyKhrabrov We were thinking of trying to get the array class fix (for the [L issue I mentioned above) in before a certain milestone release later this month. Do you happen to have that particular change readily available? It's fine if not - the solution is straightforward, so it shouldn't be much duplication of effort to add it myself.

@cjjdespres
Copy link
Contributor Author

Out of curiosity - did you end up modifying the JITServerSharedROMClassCache so it could handle array ROM classes?

@AlexeyKhrabrov
Copy link
Contributor

@AlexeyKhrabrov We were thinking of trying to get the array class fix (for the [L issue I mentioned above) in before a certain milestone release later this month. Do you happen to have that particular change readily available? It's fine if not - the solution is straightforward, so it shouldn't be much duplication of effort to add it myself.

I can put together a draft PR some time later this week, but might not have time to work on ironing out the details afterwards. When exactly is the milestone "later this month"?

Out of curiosity - did you end up modifying the JITServerSharedROMClassCache so it could handle array ROM classes?

No, I didn't need to.

@cjjdespres
Copy link
Contributor Author

cjjdespres commented Mar 12, 2024

It's milestone 2 for the 0.44 release. I asked again and apparently I was misremembering; it's on March 18, and it would only be later if some delay were to occur. (Edit: Looking back, I see that I did just say "later this month" and not "end of the month").

@AlexeyKhrabrov
Copy link
Contributor

I've looked at my code, unfortunately I didn't add handling object arrays as a separate commit, so it'll take me a couple hours to extract the changes (which are scattered across a few components: hash class, AOT cache, client session, deserializer). I can submit an untested draft PR tomorrow. But in general I wouldn't call the changes completely straightforward, e.g. one of the complications is requesting uncached base component classes from the client. So finalizing this before the 18th might be optimistic (unless my implementation is overcomplicated and this can be done easier).

@cjjdespres
Copy link
Contributor Author

I think that's all right. Another option that @mpirvu and I discussed was simply forbidding class record generation for array ROM classes, which would at least have the same behaviour (effectively) as what happens with -XX:-JITServerAOTCacheIgnoreLocalSCC at the moment. I should be able to do that quickly, and that guard could be removed when your changes are incorporated.

@cjjdespres
Copy link
Contributor Author

So I can just do that, and support for array classes can wait until your larger changes are incorporated.

@cjjdespres
Copy link
Contributor Author

Regarding this comment, I don't believe that scenario can happen, since we should be setting the flag entry->_doNotLoadFromJITServerAOTCache if any AOT cache store or load compilation fails. For example, here after a relocation failure:

entry->_doNotLoadFromJITServerAOTCache = true;

Then, in these places:

&& !entry->_doNotLoadFromJITServerAOTCache

if (persistentInfo->getJITServerAOTCacheIgnoreLocalSCC())
{
aotCacheStore = entry->_useAOTCacheCompilation &&
compInfo->methodCanBeJITServerAOTCacheStored(compiler->signature(), compilee->convertToMethod()->methodType());
aotCacheLoad = persistentInfo->getJITServerUseAOTCache() &&
!persistentInfo->doNotRequestJITServerAOTCacheLoad() &&
!TR::CompilationInfo::isCompiled(method) &&
!entry->_doNotLoadFromJITServerAOTCache &&
compInfo->methodCanBeJITServerAOTCacheLoaded(compiler->signature(), compilee->convertToMethod()->methodType());
}
else
{
aotCacheStore = useAotCompilation && persistentInfo->getJITServerUseAOTCache() &&
compInfo->methodCanBeJITServerAOTCacheStored(compiler->signature(), compilee->convertToMethod()->methodType());
aotCacheLoad = aotCacheStore && !entry->_doNotLoadFromJITServerAOTCache &&
compInfo->methodCanBeJITServerAOTCacheLoaded(compiler->signature(), compilee->convertToMethod()->methodType());
}

that flag will forbid AOT store or load requests respectively.

The field is perhaps poorly named - its analogue in local AOT is _doNotUseAotCodeFromSharedCache I believe.


Now that I think about it, a method being eligible for an AOT cache store does not imply it is eligible for an AOT cache load in general, only because the client might have explicitly forbidden that method from being an AOT cache load with a command line option, but not forbidden it from being an AOT cache store. (I think the ability to do that was introduced for testing purposes). Maybe for that reason alone the fresh code should be sent every time.

@AlexeyKhrabrov
Copy link
Contributor

I see, the part that I missed was the fact that _doNotLoadFromJITServerAOTCache effectively forbids both loads and stores in such cases. Yeah some of these flags could use better naming.

Ignoring the edge cases with specific methods forbidden from stores and/or loads, turning a store into a load in JITServerAOTCache::storeMethod() should only affect a scenario that is essentially a race between multiple threads compiling the same method after not finding it in the AOT cache at the start of the compilation, in which case it doesn't really matter which version is sent to the client.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:jitserver Artifacts related to JIT-as-a-Service project
Projects
Status: To do
Development

No branches or pull requests

3 participants