Add persistence to the JITServer AOT cache #15848
Attn @mpirvu
In a cluster there could be several JITServer instances, each with its own in-memory AOT cache. Ideally, there would be a single AOT cache shared by all the server instances. Unfortunately, synchronization would be a very expensive proposition, especially considering that the servers could be running on different nodes. You would need something like a distributed shared memory mechanism, which is difficult to build and maintain, not to mention that the overhead could be prohibitive. The next best option is to implement AOT cache persistence with a snapshot-restore mechanism. The snapshot operation serializes the AOT cache and writes it to a file. The restore operation re-instantiates an AOT cache from a snapshot file.
Do we allow merging/updates for a given AOT cache snapshot? Who initiates the AOT cache snapshot?
I think we should start with the simple case of immutable snapshot files. Later we can consider whether it's worth it to implement a merge operation. It's easy to merge (append) new data into an existing snapshot that hasn't been modified since the current JITServer instance loaded it. But this is equivalent to simply overwriting the snapshot with the new one, and the merge operation is only an optimization, which may or may not make much of a difference to overall performance. Merging data created concurrently by separate JITServer instances is more challenging since it requires re-assigning record IDs and choosing which compiled methods to keep if there are multiple versions. I think all these things are doable, but will probably need quite a bit of effort.

It should be possible to atomically overwrite an existing snapshot file in place by writing the new snapshot into a separate uniquely named file and then renaming it. Rename operations are supposed to be atomic on local file systems on Linux; hopefully that applies to container volumes as well.

I think we need to support taking a snapshot while the JITServer is running (periodically or on external request), without suspending compilation threads. This will require some careful synchronization, but hopefully nothing too complicated.
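A minimal sketch of the write-to-temp-then-rename idea, assuming POSIX semantics. The function and file-naming scheme below are hypothetical illustrations, not the actual JITServer implementation:

```cpp
#include <cstdio>
#include <fstream>
#include <string>
#include <unistd.h> // getpid(), unlink()

// Hypothetical helper: atomically replace a snapshot file by writing the
// serialized cache into a uniquely named temporary file in the same directory
// (rename() is only atomic within one file system) and then renaming it over
// the old snapshot. Readers see either the old file or the new one, never a
// partially written one.
bool writeSnapshotAtomically(const std::string &dir, const std::string &cacheName,
                             const char *data, size_t size)
   {
   std::string finalPath = dir + "/" + cacheName + ".snapshot";
   // Unique temporary name (here based on the PID) so concurrent writers don't collide
   std::string tmpPath = finalPath + ".tmp." + std::to_string(getpid());

   std::ofstream out(tmpPath, std::ios::binary);
   if (!out.write(data, size))
      {
      unlink(tmpPath.c_str());
      return false;
      }
   out.close();

   if (std::rename(tmpPath.c_str(), finalPath.c_str()) != 0)
      {
      unlink(tmpPath.c_str());
      return false;
      }
   return true;
   }
```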
Here is an outline of how storing and loading a snapshot can be implemented (I've already sketched out the design in the past but never implemented it). Store:
Load:
One of the remaining questions is when to load the snapshots. Some of the options are:
At the moment individual fresh AOT cache instances are created and held separately for each client. How would that interact with the persistence mechanism?
Depends on when/how we load the cache snapshots. If eagerly at JITServer start, then each pre-loaded AOT cache instance will already be created and loaded by the time the first client requests it, so there is no interaction. If we load them on demand (which I think makes more sense), then when an AOT cache with a given name is first requested by a client, the JITServer will check if a snapshot with this name exists and will initiate the load. To avoid delaying compilation requests while the snapshot is being loaded, we need to serve compilation requests without the cache until the load is complete.

One more thing to consider is whether we want to support replacing an already active AOT cache instance at runtime with a "better" (e.g. bigger) snapshot, in response to an operator request. That can be done relatively easily for future clients that connect after the snapshot is loaded. This is mostly equivalent to directing new clients to a new JITServer instance that uses the new snapshot, but more efficient. Replacing an AOT cache instance for live clients would be more challenging since that requires clearing some caches on both the server and the client.
A JITServer can have several AOT caches, each with its own name. It's not a good idea to load all those caches on start-up, because the server may never use many of them. It's much better to load a cache only when a client asks for it. As Alexey mentioned, replacing the in-memory cache of a server with a better one introduces some complexity, which is best avoided in the first version of this feature.
When new records are added to a map, all their dependencies should already exist in the cache, right? So we could just keep a list of records around, append new records to it as they are created, and that list will remain sorted by that partial order.
That would work, but I think it might be better to keep a separate list for each record type. That way the snapshot file will be more structured (divided into sections containing records of one type), enabling more integrity checks at load time. It might also make the loads a bit faster because of better locality (populating one map at a time instead of all at once).
This also made me think of a simpler way to synchronize taking a snapshot with concurrent additions of new records, without marking the newly added records. We can simply remember how many records of each type exist at the start of the snapshot and, as we iterate each list, stop after that number of records (any records past that point were added after the start of the snapshot).
Is the plan to add a linked list for each hashtable we have today, and to insert a pointer to each record both into the hashtable and into the linked list?
Yes. New records will be added to the tail of the list, and the snapshot writer will traverse it from head to tail.
I believe so.
I am ok with it, but we need to quantify the memory increase this change brings.
8 bytes per record, which amounts to ~220 KB for AcmeAir (with a total cache memory usage of ~50 MB), or less than 0.5%.
We need to add the overhead of the list infrastructure (two extra pointers per node), so ~660 KB. It's manageable though.
One
OK. I was thinking of using std::list to store pointers to records, rather than modifying the records themselves.
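A rough sketch of how the per-type record lists and the count-based snapshot bound discussed above could fit together. All names here are made up for illustration; the real OpenJ9 data structures differ:

```cpp
#include <cstdio>
#include <list>
#include <mutex>

struct AOTCacheRecord { /* serialized record payload, elided */ };

// Hypothetical stand-in for the real record serializer.
static void serializeRecord(const AOTCacheRecord *r, FILE *out)
   {
   fwrite(&r, sizeof(r), 1, out); // placeholder only
   }

// Hypothetical per-record-type bookkeeping. The hashtable (not shown) owns the
// records; this list only stores pointers, in creation order. Because a record's
// dependencies are created before the record itself, head-to-tail order is
// consistent with the dependency partial order.
class RecordTypeSection
   {
   std::list<const AOTCacheRecord *> _list; // append-only
   std::mutex _lock;

public:
   void addRecord(const AOTCacheRecord *r)
      {
      std::lock_guard<std::mutex> g(_lock);
      _list.push_back(r); // the record is also inserted into the key-record map elsewhere
      }

   // Serialize at most the records that existed when the snapshot started.
   // Remembering the count up front means records added concurrently (which
   // all land at the tail) are simply never reached by the writer.
   void writeSection(FILE *out)
      {
      std::lock_guard<std::mutex> g(_lock); // sketch only; the real code could avoid
      size_t count = _list.size();          // holding the lock across file I/O, since
      auto it = _list.begin();              // concurrent appends only touch the tail
      for (size_t i = 0; i < count; ++i, ++it)
         serializeRecord(*it, out);
      }
   };
```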
For the snapshot header I propose we include:
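The proposed field list didn't survive in this thread. For illustration only, a header along the following lines would support the integrity checks discussed earlier and the "read the method count before deciding to overwrite" policy discussed below; every field here is an assumption, not the actual proposal:

```cpp
#include <cstdint>

// Purely illustrative snapshot header layout; field set and names are assumed.
struct AOTSnapshotHeader
   {
   char     _eyeCatcher[8];    // e.g. "AOTSNAP\0", to reject unrelated files
   uint32_t _formatVersion;    // snapshot format version
   uint32_t _serverVersion;    // producing JITServer version, for compatibility checks
   uint64_t _creationTime;     // when the snapshot was written
   uint32_t _numRecordTypes;   // number of per-type sections that follow
   uint32_t _numAOTMethods;    // lets a server decide whether to overwrite an
                               // existing snapshot without reading the whole file
   uint64_t _totalSize;        // expected file size, for a cheap integrity check
   };
```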
The records can depend on other records of different types. Since the record types will be kept in their own sections, can the record types be ordered so that the resulting snapshot file will still be sorted?
Yes, see #15848 (comment). |
Sorry, missed that. Thanks. |
Some thoughts about JITServer instances periodically overwriting an AOT snapshot. Saving a snapshot should be attempted when enough AOT compilations have been added to the in-memory cache since the last snapshot attempt. What constitutes "enough" is debatable; it could be anywhere between 100-500 methods. I am also thinking of imposing a time restriction: we shouldn't attempt to write snapshots every second even if enough AOT methods have been added.

Saving the snapshot: search the snapshot directory for a file named AOTSnapshot..bin (or something like that). If the file exists, open it and read the header. From the header we can read the number of AOT methods and decide whether we want to overwrite (we should do it only if our snapshot has "significantly" more methods).

How do we avoid two JITServer instances writing their own snapshots at about the same time? The danger is that the last one to save may write a snapshot with fewer AOT methods. Maybe the servers should remember the timestamp of the existing snapshot, write their own snapshot into a temporary file, and just before doing the rename, check the timestamp again. If it changed, we need to read the header again, and if we have fewer entries, abort the snapshot operation by deleting the temporary file.
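A hypothetical sketch of that last-moment re-check, assuming POSIX stat() and the illustrative header above. Note this only narrows the race window rather than closing it; a real guarantee would need something like flock() on a sidecar lock file:

```cpp
#include <string>
#include <sys/stat.h>

// Hypothetical: reads the method count from the snapshot header (see the
// illustrative AOTSnapshotHeader above); validation elided.
static size_t readNumMethodsFromHeader(const std::string &path)
   {
   (void)path;
   return 0; // stub for the sketch
   }

// Decide, just before rename(), whether our snapshot should still replace the
// existing one: `observedMTime` is the modification time we saw when we first
// read the existing snapshot's header.
bool shouldStillOverwrite(const std::string &snapshotPath,
                          time_t observedMTime, size_t ourNumMethods)
   {
   struct stat st;
   if (stat(snapshotPath.c_str(), &st) != 0)
      return true; // snapshot disappeared; ours is better than nothing
   if (st.st_mtime == observedMTime)
      return true; // unchanged since we read its header
   // Someone replaced it in the meantime; re-read its header and compare.
   return ourNumMethods > readNumMethodsFromHeader(snapshotPath);
   }
```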
Once we have the snapshot mechanism implemented, we'll be able to measure the overhead (in particular, the wall-clock and CPU time to write one), which will help guide some of the policy decisions (e.g. how often to take snapshots).
It's even a bit more complicated than that. Currently an AOT cache instance can have multiple AOT headers and separate sets of compiled methods for each AOT header. The motivation for that was to avoid duplicating all the metadata (class records etc.) when multiple clients run the same application on diverse hardware or with diverse JVM settings (e.g. heap size ranges). Overwriting a snapshot potentially means losing the cache for multiple AOT header versions. This might actually be a worthwhile use case for merging snapshots. We need to think about this more.
As far as I understand, the main use case for sharing AOT cache snapshots is efficiently auto-scaling a JITServer deployment: launching new instances with a warm AOT cache and supporting scaling down to zero without losing the AOT cache. I think we can reasonably expect all JITServer instances in a single deployment to run the same version (in most cases); maybe this can even be enforced at the JITServer operator level. Or yes, we can keep multiple versions around with different file names and eventually purge stale ones.

Also, for long-running JITServer deployments we might want to think about purging stale methods (not reused in a long time), along with their metadata dependencies, from the cache to avoid growing the snapshot sizes. One simple way is to skip such entries when writing a snapshot.
After giving it a bit more thought, merging snapshots doesn't seem too complicated, at least algorithmically. Here is a high-level sketch of the algorithm to merge two snapshots; it can be easily generalized to N snapshots if that ever becomes useful.

For simplicity, let's assume that both caches C1 and C2 are in memory, but not concurrently used or modified, and that we're merging a "smaller" new cache C2 into a "larger" base cache C1. Iterate through all records in C2 in dependency order: all class loader records, then all class records, etc. For each record R, perform a lookup in the corresponding key-record map in C1. If the lookup is successful, then R is a duplicate of an existing record in C1; otherwise it's a new unique record. In case of a duplicate, replace the ID of the record with the C1 version. In case of a new record, assign it a new ID (using a counter starting at the number of records of this type in C1). Then for each sub-record that R refers to, read its ID (which is guaranteed to already be updated) and replace the corresponding ID field in R with it.

In practice though, only one of the caches is in memory, and the other one is in an existing snapshot file. The in-memory cache also needs to stay active, i.e. support concurrent lookup requests and modifications. As a result, we actually have to merge the existing snapshot file into the in-memory cache (regardless of which one is larger), and write the result into a new file. During a "merging" snapshot, after writing out the records of a given type stored in the in-memory cache (as in a regular snapshot operation), we then read through the records of the same type in the existing snapshot file, look them up in the key-record map of the in-memory cache, skip duplicates, and write the new unique records (with reassigned IDs) into the output file. To be able to update the sub-record IDs, we need to maintain maps from the "old" IDs (used in the existing snapshot file) to the "new" IDs for each record type. These maps are temporary and only needed during the merge operation.

Since the in-memory cache can gain new records concurrently, we need to remember the number of records (i.e. the maximum ID) of each type at the start of the snapshot/merge, and treat record key matches that map to in-memory records created after the start of the snapshot (ones with IDs higher than the remembered maximum) as new unique records rather than duplicates. Also, each lookup needs to be done while holding the corresponding lock that protects the map.

All of the above only applies to metadata records; the main remaining question is what to do with the actual serialized methods in case of conflicts (duplicate keys). One simple policy could be to keep the more recent version: either always the in-memory one (which is more likely to be fresh), or based on the compilation timestamp that we can store in each serialized method.
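A condensed sketch of the simple two-in-memory-caches version of the merge (C1 is the base, C2 is merged in), shown for a single record type. All type and function names are illustrative, not the real OpenJ9 ones:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative record: an ID, a lookup key, and the IDs of its sub-records.
struct Record
   {
   uint64_t _id;
   std::string _key;
   std::vector<uint64_t> _subIds;
   };

// Illustrative cache: one key-record map and one record list (in dependency
// order) per record type; a single type is shown for brevity.
struct Cache
   {
   std::unordered_map<std::string, Record *> _map;
   std::vector<Record *> _records;
   };

// Merge C2's records of one type into C1. Assumes the types that these
// records depend on were merged first, so `oldToNew` already maps every
// sub-record ID. (In the real multi-type version there would be one
// oldToNew map per record type.)
static void mergeRecordType(Cache &c1, const Cache &c2,
                            std::unordered_map<uint64_t, uint64_t> &oldToNew)
   {
   uint64_t nextId = c1._records.size(); // new IDs start after C1's existing ones
   for (const Record *r : c2._records)
      {
      auto it = c1._map.find(r->_key);
      if (it != c1._map.end())
         {
         // Duplicate: map the C2 ID to the existing C1 record's ID.
         oldToNew[r->_id] = it->second->_id;
         }
      else
         {
         // New unique record: assign the next ID, then fix up sub-record IDs,
         // which are guaranteed to be in oldToNew already (dependency order).
         oldToNew[r->_id] = nextId;
         Record *copy = new Record(*r);
         copy->_id = nextId++;
         for (uint64_t &subId : copy->_subIds)
            subId = oldToNew.at(subId);
         c1._map.emplace(copy->_key, copy);
         c1._records.push_back(copy);
         }
      }
   }
```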
Thanks for the idea @AlexeyKhrabrov. It's something we should definitely pursue after we get the simpler implementation ready.
Regarding loading the snapshot in the background while other compilations that might want to use it are in progress, we have two options:
I am leaning more towards the second approach. I looked at how this can be implemented and sketched a high-level design where the compilation thread that calls
Agreed. Since loading a snapshot can be I/O-bound (if the snapshot file is not in the OS page cache), we don't want the single loader thread to become a bottleneck if the JITServer needs to load multiple caches at the same time.
Just to clarify, do you mean multiple requests for the same cache name from multiple compilation threads? To handle that, we can store the requests in a
Yes, that's why I said that we need to keep some state around.
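The exact container mentioned above didn't survive here, but for illustration, the state could be as simple as a mutex-protected map keyed by cache name. Everything below is an assumed sketch, matching the policy discussed earlier (non-loading threads compile without the cache rather than blocking):

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

// Illustrative registry ensuring a snapshot for a given cache name is loaded
// at most once, even if many compilation threads request the same name at once.
class SnapshotLoadRegistry
   {
   enum State { kLoading, kDone };
   std::unordered_map<std::string, State> _state; // keyed by AOT cache name
   std::mutex _lock;

public:
   // Returns true if the calling compilation thread should perform the load;
   // false means a load is already in progress (or finished), so the caller
   // should compile without the cache for now.
   bool tryBecomeLoader(const std::string &name)
      {
      std::lock_guard<std::mutex> g(_lock);
      return _state.emplace(name, kLoading).second;
      }

   void markLoaded(const std::string &name)
      {
      std::lock_guard<std::mutex> g(_lock);
      _state[name] = kDone;
      }

   bool isLoaded(const std::string &name)
      {
      std::lock_guard<std::mutex> g(_lock);
      auto it = _state.find(name);
      return it != _state.end() && it->second == kDone;
      }
   };
```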
Currently, if the client does not ask for an AOT cache by name, a nameless AOT cache will be created at the server. |
Agreed, but I think simply

Since we include cache names in their snapshot file names, we should reject names that contain any characters that are invalid in file names.
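The specific characters to reject were not captured above; for illustration, a conservative check could whitelist a safe subset instead of enumerating invalid characters (POSIX file names at minimum can't contain '/' or NUL, and '.' is excluded here to avoid clashing with any file-name scheme that uses dots as separators). The function name and length cap are assumptions:

```cpp
#include <cctype>
#include <string>

// Illustrative validation for client-supplied AOT cache names that will
// become part of a snapshot file name; the real policy might differ.
static bool isValidAOTCacheName(const std::string &name)
   {
   if (name.empty() || name.size() > 100) // length cap is an arbitrary assumption
      return false;
   for (char c : name)
      {
      if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_' && c != '-')
         return false; // rejects '/', '\0', '.', whitespace, etc. by construction
      }
   return true;
   }
```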
Unlike the main AOT cache, the current JITServer AOT cache lacks any persistence; only an in-memory cache is supported. An option should be added to allow for JITServer AOT cache persistence.