
SearchableSnapshotDirectory should not evict cache files when closed #66173

Conversation

@tlrx tlrx commented Dec 10, 2020

This pull request changes the SearchableSnapshotDirectory so that it does not evict all its cache files at closing time, but instead delegates this work to the CacheService.

This change is motivated by:

  • the fact that Lucene directories are closed as a consequence of applying a new cluster state, so the closing is executed on the cluster state applier thread, and we want to minimize disk I/O operations on that thread (such as deleting many evicted cache files);
  • the future of the searchable snapshot cache, which should become persistent.

This change is built on top of the existing SearchableSnapshotIndexEventListener and a new SearchableSnapshotIndexFoldersDeletionListener (see #65926), which are used to detect when a searchable snapshot index (or searchable snapshot shard) is removed from a data node.

When this happens, the listeners notify the CacheService, which maintains an internal list of removed shards. This list is used to evict the cache files associated with these shards as soon as possible (but not on the cluster state applier thread), or right before the same searchable snapshot shard is built again on the same node.

In other situations, like opening or closing a searchable snapshot shard, the cache files are no longer evicted and should be reused.
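
As a rough illustration of this bookkeeping, here is a minimal sketch; it is not the actual CacheService code, and the ShardEviction type, constructor wiring, and thread-pool usage below are assumptions (only the markShardAsEvictedInCache name comes from this PR):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;

// Sketch only: shards are merely *marked* as evicted on the cluster state applier
// thread; the disk I/O of evicting their cache files happens later on another thread,
// or right before the same shard is built again on the node.
final class ShardEvictionBookkeeping {

    record ShardEviction(String snapshotUUID, String indexUUID, int shardId) {}

    private final Set<ShardEviction> evictedShards = ConcurrentHashMap.newKeySet();
    private final ExecutorService genericThreadPool;

    ShardEvictionBookkeeping(ExecutorService genericThreadPool) {
        this.genericThreadPool = genericThreadPool;
    }

    // Called by the index/shard deletion listeners: cheap, no disk I/O here.
    void markShardAsEvictedInCache(ShardEviction shard) {
        if (evictedShards.add(shard)) {
            genericThreadPool.execute(() -> processShardEviction(shard));
        }
    }

    // Runs asynchronously, or is invoked before the same shard is restored again.
    void processShardEviction(ShardEviction shard) {
        if (evictedShards.remove(shard)) {
            // evict all cache files associated with this shard (disk I/O happens here)
        }
    }
}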

@tlrx tlrx added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement v7.11.0 v8.0.0 labels Dec 10, 2020
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label Dec 10, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@henningandersen henningandersen left a comment

I did an initial read and thought I would post my initial comments early.

 */
public void forEach(BiConsumer<K, V> consumer) {
    for (CacheSegment<K, V> segment : segments) {
        try (ReleasableLock ignored = segment.writeLock.acquire()) {
Contributor

Would holding the readLock not be enough?

Member Author

It is enough; I'm not sure why I used and documented the write lock here. I pushed 082e0c8 to use the read lock instead (it should prevent any mutation anyway).
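
For reference, a minimal sketch of the locking point under discussion; this is not the real Cache/CacheSegment code, and ReentrantReadWriteLock stands in for ReleasableLock. The idea is that readers can iterate under the segment's read lock because every mutation must hold the write lock.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.BiConsumer;

// Sketch of one cache segment: mutation takes the write lock, so iterating under
// the read lock is enough to see a stable view of the entries.
final class SegmentSketch<K, V> {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<K, V> entries = new HashMap<>();

    void forEach(BiConsumer<K, V> consumer) {
        lock.readLock().lock();
        try {
            entries.forEach(consumer);   // read-only traversal, no write lock needed
        } finally {
            lock.readLock().unlock();
        }
    }

    void put(K key, V value) {
        lock.writeLock().lock();
        try {
            entries.put(key, value);     // writers are excluded while readers iterate
        } finally {
            lock.writeLock().unlock();
        }
    }
}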

    success = true;
} finally {
    if (success == false) {
        final boolean added = evictedShards.add(shardEviction);
Contributor

I am not sure this is necessary. When cache.invalidate fails, it has already removed the object from the map, so if we retry an eviction, it will not hit the shard anyway.

So if we can remove this, I think we can also drop the shardsEvictionLock, which makes markShardAsEvictedInCache safer with respect to being called from the cluster applier thread.

Member Author

I think you are right. I pushed 393deaf to remove the usage of the lock here (but such locks are still necessary to avoid cache file eviction while starting the directory). Let me know what you think.

Contributor

I think the locks could be removed with some restructuring, but we can tackle that in a follow-up; it is not really important.

However, if we add to evictedShards here while running the job on the thread, I think we risk it leaking, since nothing will remove it later. I think we should assert success here. AFAICS, there is no way this should fail unless there is a bug somewhere.

Member Author

I would be happy to hear more about how you think it could be restructured. I'm noting this point and will come back to you in the near future to tackle it.

I agree there's a risk of leaking a ShardEviction, but I think it is very unlikely to happen. One idea would be to hook into the cache sync task to clean up any leftover ShardEviction there. If that's OK, we can address this too as a follow-up along with the previous point. For now I added the success assertion.

Member Author

I've opened #67160, which should improve the situation.

    }
});
if (cacheFilesToEvict.isEmpty() == false) {
    cacheFilesToEvict.forEach(cache::invalidate);
Contributor

Perhaps we need to catch exceptions here, assert that they are IO-related, log a warning, and then continue with the next file? That ensures we remove all cache files from the cache in one go, but may leave some files lingering if there are IO issues.

Member Author

Yes, your suggestion makes the eviction safer, just in case. I pushed 58b7bd9.

@tlrx
Member Author

tlrx commented Dec 10, 2020

Thanks a lot @henningandersen. I've updated the PR according to your feedback. This is ready for another review.

@henningandersen henningandersen left a comment

LGTM. I think we need to add a test, at least for the new methods on CacheService.

        cacheFilesToEvict.put(cacheKey, cacheFile);
    }
});
if (cacheFilesToEvict.isEmpty() == false) {
Contributor

nit: this check seems superfluous?

Member Author

Yes, I removed it

for (Map.Entry<CacheKey, CacheFile> cacheFile : cacheFilesToEvict.entrySet()) {
    try {
        cache.invalidate(cacheFile.getKey(), cacheFile.getValue());
    } catch (Exception e) {
Contributor

Can we just catch RuntimeException instead, since invalidate declares not to throw checked exceptions?

Member Author

Sure.

try {
    cache.invalidate(cacheFile.getKey(), cacheFile.getValue());
} catch (Exception e) {
    assert e instanceof IOException : e;
Contributor

Looking closer at this, I suppose this should never happen since we catch IO exceptions in onCacheFileRemoval, so we could just assert false : e here instead?

Member Author

Agreed.
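
For clarity, a sketch of how the eviction loop could look with both suggestions applied (catch RuntimeException, assert false); the Cache interface, logging, and type parameters below are stand-ins, not the merged code:

import java.util.Map;

final class EvictionLoopSketch {

    // Stand-in cache type: only the control flow mirrors the discussion above.
    interface Cache<K, V> { void invalidate(K key, V value); }

    static <K, V> void evictAll(Cache<K, V> cache, Map<K, V> cacheFilesToEvict) {
        for (Map.Entry<K, V> entry : cacheFilesToEvict.entrySet()) {
            try {
                cache.invalidate(entry.getKey(), entry.getValue());
            } catch (RuntimeException e) {
                // IO problems are handled elsewhere (onCacheFileRemoval in this PR), so
                // getting here indicates a bug; trip assertions in tests, log, and keep
                // evicting the remaining files.
                assert false : e;
                System.err.println("failed to evict cache file " + entry.getKey() + ": " + e);
            }
        }
    }
}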


@Override
public void onFailure(Exception e) {
    logger.warn(
Contributor

I think we could also assert false : e here?

Member Author

+1


@tlrx tlrx merged commit 7e6b52a into elastic:master Dec 11, 2020
@tlrx
Member Author

tlrx commented Dec 11, 2020

Thanks a lot Henning! I merged this with an additional test, which isn't great but can be improved in a follow-up along with your suggestion about improving locking.

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Dec 14, 2020
…lastic#66173)

tlrx added a commit that referenced this pull request Dec 14, 2020
…66264)


Backport of #66173 for 7.11