Make searchable snapshots cache persistent #65725

Merged: 27 commits into elastic:master on Dec 14, 2020

Conversation

@tlrx (Member) commented on Dec 2, 2020

The searchable snapshots cache implemented in 7.10 is not persisted across node restarts, forcing data nodes to download files from the snapshot repository again once the node is restarted.

This pull request introduces a new Lucene index that is used to store information about cache files. The information about cache files is periodically updated and committed to this index as part of the cache synchronization task added in #64696. When the data node starts, the Lucene index is used to load the cache file information into memory; this information is then used to repopulate the searchable snapshots cache with the cache files that exist on disk.

Since data nodes can have one or more data paths, this change introduces a Lucene index per data path. Information about cache files is updated in the Lucene index located on the same data path as the cache files.
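
(For illustration only: a minimal sketch of a per-data-path writer along the lines described above. The directory name and field names are hypothetical; the actual schema and lifecycle are defined by this pull request's production code.)

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Path;

public class PersistentCacheSketch {

    // Hypothetical field and directory names, for illustration only.
    static final String CACHE_ID_FIELD = "cache_id";
    static final String FILE_LENGTH_FIELD = "file_length";

    // One Lucene index per data path: the index lives next to the cache files it describes.
    static IndexWriter openWriter(Path dataPath) throws IOException {
        return new IndexWriter(FSDirectory.open(dataPath.resolve("snapshot_cache_index")), new IndexWriterConfig());
    }

    // Upsert one cache file entry; the cache synchronization task would call this periodically
    // and commit, so the on-disk index reflects the cache files present on this data path.
    static void recordCacheFile(IndexWriter writer, String cacheFileId, long fileLength) throws IOException {
        final Document document = new Document();
        document.add(new StringField(CACHE_ID_FIELD, cacheFileId, Field.Store.YES));
        document.add(new StringField(FILE_LENGTH_FIELD, Long.toString(fileLength), Field.Store.YES));
        writer.updateDocument(new Term(CACHE_ID_FIELD, cacheFileId), document);
        writer.commit();
    }
}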

This pull request is still lacking some tests and maybe documentation. There is also a test failure that I'm still tracking down, but I'd like to start reviews so that we can move forward.

@tlrx added the :Distributed/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs), >enhancement, v7.11.0, and v8.0.0 labels on Dec 2, 2020
@elasticmachine added the Team:Distributed (Meta label for distributed team) label on Dec 2, 2020

@elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed (Team:Distributed)

@tlrx (Member, Author) commented on Dec 2, 2020

@original-brownbear @henningandersen this is ready for a first round of reviews. I suspect that one situation is not handled correctly, as one of our main tests keeps failing; I'm investigating, but we can start gathering feedback on the main change.

@henningandersen (Contributor) left a comment

I did an initial read of the production code, looks good.

final NodeEnvironment.NodePath nodePath = writer.nodePath();
logger.debug("loading persistent cache on data path [{}]", nodePath);

for (String indexUUID : nodeEnvironment.availableIndexFoldersForPath(writer.nodePath())) {

Contributor:

nit:

Suggested change
for (String indexUUID : nodeEnvironment.availableIndexFoldersForPath(writer.nodePath())) {
for (String indexUUID : nodeEnvironment.availableIndexFoldersForPath(nodePath)) {

Member Author:

Thanks, I pushed 9bef60d

Comment on lines 273 to 287
final Map<String, Document> documents = new HashMap<>();
try (IndexReader indexReader = DirectoryReader.open(directory)) {
    for (LeafReaderContext leafReaderContext : indexReader.leaves()) {
        final LeafReader leafReader = leafReaderContext.reader();
        final Bits liveDocs = leafReader.getLiveDocs();
        for (int i = 0; i < leafReader.maxDoc(); i++) {
            if (liveDocs == null || liveDocs.get(i)) {
                final Document document = leafReader.document(i);
                documents.put(getValue(document, CACHE_ID_FIELD), document);
            }
        }
    }
} catch (IndexNotFoundException e) {
    logger.debug("persistent cache index does not exist yet", e);
}

Contributor:

Would it be possible to extract this to a separate method and pass the map directly to the CacheFileVisitor? It feels a bit backwards to pass it through the CacheIndexWriter, but there might be a valid reason for it?

Member Author:

No valid reason, I agree that seems a bit backward. I pushed 5dcd53a
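
(For illustration, the extraction being discussed might look roughly like this, reusing the loading loop quoted above with the same fields and helpers; the method name is hypothetical, and the actual change is the one pushed in 5dcd53a.)

private static Map<String, Document> loadDocuments(Directory directory) throws IOException {
    final Map<String, Document> documents = new HashMap<>();
    try (IndexReader indexReader = DirectoryReader.open(directory)) {
        for (LeafReaderContext leafReaderContext : indexReader.leaves()) {
            final LeafReader leafReader = leafReaderContext.reader();
            final Bits liveDocs = leafReader.getLiveDocs();
            for (int i = 0; i < leafReader.maxDoc(); i++) {
                if (liveDocs == null || liveDocs.get(i)) {
                    final Document document = leafReader.document(i);
                    documents.put(getValue(document, CACHE_ID_FIELD), document);
                }
            }
        }
    } catch (IndexNotFoundException e) {
        logger.debug("persistent cache index does not exist yet", e);
    }
    // The caller can then pass this map straight to the file visitor instead of
    // routing it through the CacheIndexWriter.
    return documents;
}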

Comment on lines +542 to +543
document.add(new StringField(INDEX_NAME_FIELD, indexId.getName(), Field.Store.YES));
document.add(new StringField(INDEX_ID_FIELD, indexId.getId(), Field.Store.YES));

Contributor:

Are these different from the shard index name/id below?

Member Author:

Yes, these refer to the index in the snapshot, while the shard's index name/id refer to the index assigned to the data node.
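
(For illustration of the distinction: the shard-side field names below are hypothetical, since only INDEX_NAME_FIELD and INDEX_ID_FIELD appear in the quoted code above.)

// Snapshot-side identity: the index as it exists inside the snapshot repository.
document.add(new StringField(INDEX_NAME_FIELD, indexId.getName(), Field.Store.YES));
document.add(new StringField(INDEX_ID_FIELD, indexId.getId(), Field.Store.YES));
// Cluster-side identity (hypothetical field names): the index assigned to the data node.
document.add(new StringField("shard_index_name", shardId.getIndexName(), Field.Store.YES));
document.add(new StringField("shard_index_uuid", shardId.getIndex().getUUID(), Field.Store.YES));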

Member:

Maybe one more question here:

Should we maybe simplify CacheKey before we move to persisting the format, like we're going to do here? There is no need to store the indexId IMO. The following information is enough to uniquely identify a shard in a given snapshot:

  • snapshot_uuid
  • index name
  • shard id (int)

All the other information, like the index uuid and the IndexId, is just noise. Having a redundant index_uuid in here also technically allows having the same file in the cache twice for different mounts of an index (not that it makes sense to have those, but it complicates the logic for figuring out what is already in the cache for a given index in the allocation work).
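
(For illustration, the minimal key described above could look like this hypothetical class; the name and fields are illustrative, not the PR's actual CacheKey.)

// Hypothetical simplified cache key: enough to uniquely identify a shard in a given snapshot.
final class SimplifiedCacheKey {
    final String snapshotUUID; // identifies the snapshot
    final String indexName;    // name of the index inside the snapshot
    final int shardId;         // shard number within that index

    SimplifiedCacheKey(String snapshotUUID, String indexName, int shardId) {
        this.snapshotUUID = snapshotUUID;
        this.indexName = indexName;
        this.shardId = shardId;
    }
    // equals()/hashCode() over these three fields would make the key usable in the cache map.
}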

Member Author:

We can indeed simplify the CacheKey and remove the snapshot's name and the snapshot's index uuid. We need to keep the shard's IndexId because, as of today, cache files are located within a specific shard data path, so they belong to a specific shard within the cluster and need to follow that shard's lifecycle.

Member:

because, as of today, cache files are located within a specific shard data path, so they belong to a specific shard within the cluster and need to follow that shard's lifecycle

Makes sense, thanks :) Shall we drop that here, or is BwC trivial if we choose to simplify this later? (It looks trivial actually, and we could just ignore the extra fields in the documents in a follow-up... but maybe I'm missing something.)

Member Author:

You're welcome :) I think the change will be trivial, as you said. Actually, I think I'd rather do it as a follow-up: because this is spread across many classes, doing it now would introduce a lot of noise (and potential merge conflicts).

@tlrx (Member, Author) commented on Dec 11, 2020

@henningandersen @original-brownbear I merged the latest changes from master into this pull request, most notably:

I also removed the previously muted test, which should succeed now thanks to all of these changes. I plan to add a few more ITs, but I can do that as a follow-up because this pull request is already large.

This is ready for reviews.

@original-brownbear (Member) left a comment

A few initial comments (the one on the format of the Lucene documents/keys is why I'm snap-commenting here :) since I had that question while working on the allocation as well).


if (Files.isDirectory(shardCachePath)) {
    logger.trace("found snapshot cache dir at [{}], loading cache files from disk and index", shardCachePath);
    Files.walkFileTree(shardCachePath, new CacheFileVisitor(cacheService, writer, documents));

Member:

NIT: This visitor may be easier to read if it were just inlined, so one doesn't have to jump to the other class to figure out what's going on (it would also save a few lines for the constructor)?

                            logger.trace("found snapshot cache dir at [{}], loading cache files from disk and index", shardCachePath);
                            Files.walkFileTree(shardCachePath, new SimpleFileVisitor<>() {
                                @Override
                                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                                    try {
                                        final String id = buildId(file);
                                        final Document cacheDocument = documents.get(id);
                                        if (cacheDocument != null) {
                                            logger.trace("indexing cache file with id [{}] in persistent cache index", id);
                                            writer.updateCacheFile(id, cacheDocument);

                                            final CacheKey cacheKey = buildCacheKey(cacheDocument);
                                            logger.trace("adding cache file with [id={}, cache key={}]", id, cacheKey);
                                            final long fileLength = getFileLength(cacheDocument);
                                            cacheService.put(cacheKey, fileLength, file.getParent(), id, buildCacheFileRanges(cacheDocument));
                                        } else {
                                            logger.trace("deleting cache file [{}] (does not exist in persistent cache index)", file);
                                            Files.delete(file);
                                        }
                                    } catch (Exception e) {
                                        throw ExceptionsHelper.convertToRuntime(e);
                                    }
                                    return FileVisitResult.CONTINUE;
                                }
                            });
                        }

Member Author:

Sure, I pushed ac6050f

ensureOpen();
try {
    for (CacheIndexWriter writer : writers) {
        writer.prepareCommit();

Member:

Do we actually need two loops for preparing and then committing? We do that for the cluster state persistence so we can throw IOError and stop everything if a commit fails, but for the cache service it seems like we could just commit in a single loop and have commit deal with prepareCommit? That would also make exceptions less noisy, because if you prepare a commit on one writer, then fail committing another writer and thus call close, the first writer will throw on closing because it still has a pending commit.
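
(For illustration, a minimal sketch of the single-loop approach being suggested, assuming a hypothetical writer whose commit() prepares and commits in one step; names are illustrative, not the PR's actual classes.)

import java.io.IOException;
import java.util.List;

class SingleLoopCommitSketch {

    interface CacheIndexWriter {
        void commit() throws IOException; // assumed to call prepareCommit() internally
        void close() throws IOException;
    }

    private final List<CacheIndexWriter> writers;

    SingleLoopCommitSketch(List<CacheIndexWriter> writers) {
        this.writers = writers;
    }

    void commit() throws IOException {
        try {
            // A single loop: each writer prepares and commits before the next one starts,
            // so a failure never leaves an earlier writer stuck with a pending commit.
            for (CacheIndexWriter writer : writers) {
                writer.commit();
            }
        } catch (IOException e) {
            for (CacheIndexWriter writer : writers) {
                try {
                    writer.close();
                } catch (Exception suppressed) {
                    e.addSuppressed(suppressed);
                }
            }
            throw e;
        }
    }
}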

Member Author:

That's a good point, thanks for catching this. Committing in a single loop makes it less error-prone, as you pointed out. I pushed ac6050f


@original-brownbear (Member) left a comment

LGTM (outside of the points raised by Henning). I read through it fully now and couldn't find anything to add, so all good from my end, thanks!

@tlrx (Member, Author) commented on Dec 14, 2020

Thanks @henningandersen and @original-brownbear! I've applied some changes based on your feedback; let me know what you think.

@henningandersen (Contributor) left a comment

LGTM.

@tlrx merged commit 672972c into elastic:master on Dec 14, 2020
@tlrx deleted the add-persistent-cache branch on December 14, 2020 at 14:25

@tlrx (Member, Author) commented on Dec 14, 2020

Thanks a lot Henning and Armin!

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Dec 14, 2020
The searchable snapshots cache implemented in 7.10 is
not persisted across node restarts, forcing data nodes to
download files from the snapshot repository again once
the node is restarted.

This commit introduces a new Lucene index that is used
to store information about cache files. The information
about cache files is periodically updated and committed
in this index as part of the cache synchronization task
added in elastic#64696. When the data node starts, the
Lucene index is used to load the cache file information
into memory; this information is then used to repopulate
the searchable snapshots cache with the cache files that
exist on disk.

Since data nodes can have one or more data paths, this
change introduces a Lucene index per data path. Information
about cache files is updated in the Lucene index located
on the same data path as the cache files.
tlrx added a commit that referenced this pull request Dec 14, 2020
The searchable snapshots cache implemented in 7.10 is
not persisted across node restarts, forcing data nodes to
download files from the snapshot repository again once
the node is restarted.

This commit introduces a new Lucene index that is used
to store information about cache files. The information
about cache files is periodically updated and committed
in this index as part of the cache synchronization task
added in #64696. When the data node starts, the Lucene
index is used to load the cache file information into
memory; this information is then used to repopulate the
searchable snapshots cache with the cache files that
exist on disk.

Since data nodes can have one or more data paths, this
change introduces a Lucene index per data path. Information
about cache files is updated in the Lucene index located
on the same data path as the cache files.

Backport of #65725 for 7.11
tlrx added a commit that referenced this pull request Dec 14, 2020
This commit simplifies the CacheKey object 
as suggested in #65725.
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Dec 14, 2020
This commit simplifies the CacheKey object 
as suggested in elastic#65725.
tlrx added a commit that referenced this pull request Dec 14, 2020
This commit simplifies the CacheKey object 
as suggested in #65725.

Backport of #66263 for 7.11
Labels
:Distributed/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs), >enhancement, release highlight, Team:Distributed (Meta label for distributed team), v7.11.0, v8.0.0-alpha1
5 participants