Make searchable snapshots cache persistent #65725
Conversation
Pinging @elastic/es-distributed (Team:Distributed)
@original-brownbear @henningandersen this is ready for a first round of reviews. I suspect that a situation is not well handled, as one of our main tests keeps failing; I'm investigating, but we can start gathering feedback on the main change.
I did an initial read of the production code, looks good.
final NodeEnvironment.NodePath nodePath = writer.nodePath();
logger.debug("loading persistent cache on data path [{}]", nodePath);

for (String indexUUID : nodeEnvironment.availableIndexFoldersForPath(writer.nodePath())) {
nit:
- for (String indexUUID : nodeEnvironment.availableIndexFoldersForPath(writer.nodePath())) {
+ for (String indexUUID : nodeEnvironment.availableIndexFoldersForPath(nodePath)) {
Thanks, I pushed 9bef60d
final Map<String, Document> documents = new HashMap<>();
try (IndexReader indexReader = DirectoryReader.open(directory)) {
    for (LeafReaderContext leafReaderContext : indexReader.leaves()) {
        final LeafReader leafReader = leafReaderContext.reader();
        final Bits liveDocs = leafReader.getLiveDocs();
        for (int i = 0; i < leafReader.maxDoc(); i++) {
            if (liveDocs == null || liveDocs.get(i)) {
                final Document document = leafReader.document(i);
                documents.put(getValue(document, CACHE_ID_FIELD), document);
            }
        }
    }
} catch (IndexNotFoundException e) {
    logger.debug("persistent cache index does not exist yet", e);
}
Would it be possible to extract this to a separate method and pass the map directly to the CacheFileVisitor? It feels a bit backwards to pass it through the CacheIndexWriter, but there might be a valid reason for it?
No valid reason, I agree that seems a bit backward. I pushed 5dcd53a
document.add(new StringField(INDEX_NAME_FIELD, indexId.getName(), Field.Store.YES));
document.add(new StringField(INDEX_ID_FIELD, indexId.getId(), Field.Store.YES));
Are these different from the shard index name/id below?
Yes, those refer to the index in the snapshot, while the shard's index name/id refer to the index assigned to the data node.
One more question here: should we simplify CacheKey before we move to persisting the format, like we're going to do here? There is no need to store the indexId IMO. The following information is enough to uniquely identify a shard in a given snapshot:
- snapshot_uuid
- index name
- shard id (int)
All the other information, like the index uuid and IndexId, is just noise. Having a redundant index_uuid in here also technically allows the same file to be in the cache twice for different mounts of an index (not that it makes sense to have those, but it complicates the logic for figuring out what is already in the cache for a given index in the allocation work).
We can indeed simplify the CacheKey and remove the snapshot's name and the snapshot's index uuid. We need to keep the shard's IndexId because, as of today, cache files are located within a specific shard data path, so they belong to a specific shard within the cluster and need to follow this shard's lifecycle.
because as of today cache files are located within a specified shard data path, so they belong to a specific shard within the cluster and need to follow this shard lifecycle
Makes sense, thanks :) Shall we drop that here, or is BwC trivial if we choose to simplify this later? (It looks trivial actually, and we can just ignore the extra fields in the documents in a follow-up ... but maybe I'm missing something.)
You're welcome :) I think the change will be trivial, as you said. Actually I think I can do it as a follow-up: because this is spread across many classes, doing it now would introduce a lot of noise (and potential merge conflicts).
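As a rough illustration of the simplification discussed above, a trimmed-down cache key could carry only the three identifying fields. This is a hypothetical sketch, not the actual Elasticsearch CacheKey class:

```java
import java.util.Objects;

// Hypothetical sketch of a simplified cache key: the snapshot UUID, the
// snapshot's index name and the shard id are enough to uniquely identify
// a shard of a given snapshot; the index uuid / IndexId are redundant.
final class SimpleCacheKey {
    private final String snapshotUUID;
    private final String indexName;
    private final int shardId;

    SimpleCacheKey(String snapshotUUID, String indexName, int shardId) {
        this.snapshotUUID = Objects.requireNonNull(snapshotUUID);
        this.indexName = Objects.requireNonNull(indexName);
        this.shardId = shardId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        SimpleCacheKey other = (SimpleCacheKey) o;
        return shardId == other.shardId
            && snapshotUUID.equals(other.snapshotUUID)
            && indexName.equals(other.indexName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(snapshotUUID, indexName, shardId);
    }
}
```

With only these fields in equals/hashCode, two mounts of the same snapshot shard would map to the same key, which is the deduplication property the comment above is after.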
@henningandersen @original-brownbear I merged the latest changes from master into this pull request, most notably:
I also removed the previously muted test, which should now succeed thanks to all of these changes. I plan to add a few more integration tests, but I can do that in follow-ups because this pull request is already large. This is ready for reviews.
A few initial comments (the one on the format of the Lucene documents/keys is why I'm snap-commenting here :) since I had that question while working on the allocation as well).
...apshots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/cache/PersistentCache.java
if (Files.isDirectory(shardCachePath)) {
    logger.trace("found snapshot cache dir at [{}], loading cache files from disk and index", shardCachePath);
    Files.walkFileTree(shardCachePath, new CacheFileVisitor(cacheService, writer, documents));
NIT: This visitor may be easier to read if it were just inlined, so one doesn't have to jump to the other class to figure out what's going on (it also saves a few lines for the constructor)?
logger.trace("found snapshot cache dir at [{}], loading cache files from disk and index", shardCachePath);
Files.walkFileTree(shardCachePath, new SimpleFileVisitor<>() {
@Override
public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
try {
final String id = buildId(file);
final Document cacheDocument = documents.get(id);
if (cacheDocument != null) {
logger.trace("indexing cache file with id [{}] in persistent cache index", id);
writer.updateCacheFile(id, cacheDocument);
final CacheKey cacheKey = buildCacheKey(cacheDocument);
logger.trace("adding cache file with [id={}, cache key={}]", id, cacheKey);
final long fileLength = getFileLength(cacheDocument);
cacheService.put(cacheKey, fileLength, file.getParent(), id, buildCacheFileRanges(cacheDocument));
} else {
logger.trace("deleting cache file [{}] (does not exist in persistent cache index)", file);
Files.delete(file);
}
} catch (Exception e) {
throw ExceptionsHelper.convertToRuntime(e);
}
return FileVisitResult.CONTINUE;
}
});
}
Sure, I pushed ac6050f
ensureOpen();
try {
    for (CacheIndexWriter writer : writers) {
        writer.prepareCommit();
Do we actually need two loops, one for preparing and then one for committing? We do that for the CS persistence so we can throw IOError and stop everything if a commit fails, but for the cache service it seems like we could just call commit in a single loop and have it deal with prepareCommit? That would also make exceptions less noisy: if you prepare a commit on one writer and then fail committing another writer, and thus call close, the first writer will throw on closing because it has a pending commit as well.
That's a good point, thanks for catching this. Committing in a single loop makes it less error prone, as you pointed out. I pushed ac6050f
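The single-loop approach discussed above can be sketched as follows. CacheWriter, commitPrepared and PersistentCacheSketch are hypothetical stand-ins for the actual CacheIndexWriter plumbing, simplified to show just the control flow:

```java
import java.util.List;

// Hypothetical sketch: each writer's commit() performs its own prepare
// step, so committing all writers needs only a single loop, and a failure
// in one writer never leaves another writer holding a prepared-but-
// uncommitted commit that would make close() throw as well.
interface CacheWriter {
    void prepareCommit();
    void commitPrepared();

    // Bundles prepare + commit for this writer only.
    default void commit() {
        prepareCommit();
        commitPrepared();
    }
}

final class PersistentCacheSketch {
    static void commitAll(List<CacheWriter> writers) {
        for (CacheWriter writer : writers) {
            writer.commit(); // single loop: prepare and commit per writer
        }
    }
}
```

If writer N fails inside commit(), writers 1..N-1 are already fully committed and writer N+1.. were never prepared, which is exactly why the exception on close becomes less noisy.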
LGTM (outside of the points raised by Henning). I've read through it fully now and couldn't find anything to add, so all good from my end, thanks!
Thanks @henningandersen and @original-brownbear ! I've applied some changes based on your feedback, let me know what you think.
LGTM.
Thanks a lot Henning and Armin!
The searchable snapshots cache implemented in 7.10 is not persisted across node restarts, forcing data nodes to download files from the snapshot repository again once the node is restarted. This commit introduces a new Lucene index that is used to store information about cache files. The information about cache files is periodically updated and committed in this index as part of the cache synchronization task added in #64696. When the data node starts, the Lucene index is used to load the cache file information into memory; this information is then used to repopulate the searchable snapshots cache with the cache files that exist on disk. Since data nodes can have one or more data paths, this change introduces one Lucene index per data path. Information about cache files is updated in the Lucene index located on the same data path as the cache files. Backport of #65725 for 7.11
This commit simplifies the CacheKey object as suggested in #65725.
The searchable snapshots cache implemented in 7.10 is not persisted across node restarts, forcing data nodes to download files from the snapshot repository again once the node is restarted.
This pull request introduces a new Lucene index that is used to store information about cache files. The information about cache files is periodically updated and committed in this index as part of the cache synchronization task added in #64696. When the data node starts, the Lucene index is used to load the cache file information into memory; this information is then used to repopulate the searchable snapshots cache with the cache files that exist on disk.
Since data nodes can have one or more data paths, this change introduces one Lucene index per data path. Information about cache files are updated in the Lucene index located on the same data path as the cache files.
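The "one Lucene index per data path" routing can be pictured with a small sketch. WriterRouting and dataPathFor are hypothetical names, and the real implementation works on NodeEnvironment.NodePath objects rather than bare paths, but the idea is the same: a cache file's metadata is recorded by the writer whose data path contains the file, so metadata and cached data stay on the same disk.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Simplified sketch of per-data-path routing: pick the data path that is
// an ancestor of the cache file's path; the persistent cache index living
// on that data path is the one that records this cache file.
final class WriterRouting {
    static Optional<Path> dataPathFor(Path cacheFile, List<Path> dataPaths) {
        return dataPaths.stream()
            .filter(cacheFile.normalize()::startsWith)
            .findFirst();
    }
}
```

With this layout, losing or unmounting one disk loses only the cache files on that disk together with their metadata, never a dangling index entry on another disk.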
This pull request is lacking some tests and maybe documentation. There is also a test failure that I'm still tracking down, but I'd like to start reviews so that we can move forward.