Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve shards evictions in searchable snapshot cache service #67519

Merged
merged 1 commit into from
Jan 14, 2021

Conversation

tlrx
Copy link
Member

@tlrx tlrx commented Jan 14, 2021

The searchable snapshot's cache service is notified when cache files
of a specific shard must be evicted. The notifications are usually done
in a cluster state applier thread that calls the CacheService#
markShardAsEvictedInCache method.

The markShardAsEvictedInCache adds the shard to an internal set
of ShardEviction and submits the eviction of the shard to the generic
thread pool. Because there's nothing preventing the cache service
(and persistent cache service) to be closed before all shared evictions
are processed, it is possible that invalidating a cache file fails and trips
an assertion (as it happened in many tests failures recently #66958, #66730).

This commit changes the CacheService so that it now waits for the evictions
of shards to complete before closing the cache and persistent cache services.

Backport of #67160 for 7.11.1

…c#67160)

The searchable snapshot's cache service is notified when cache files
of a specific shard must be evicted. The notifications are usually done
in a cluster state applier thread that calls the CacheService#
markShardAsEvictedInCache method.

The markShardAsEvictedInCache adds the shard to an internal set
of ShardEviction and submits the eviction of the shard to the generic
 thread pool. Because there's nothing preventing the cache service
(and persistent cache service) to be closed before all shared evictions
are processed, it is possible that invalidating a cache file fails and trips
an assertion (as it happened in many tests failures recently elastic#66958, elastic#66730).

This commit changes the CacheService so that it now waits for the evictions
of shards to complete before closing the cache and persistent cache services.
@tlrx tlrx added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs backport v7.11.1 labels Jan 14, 2021
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label Jan 14, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@tlrx tlrx merged commit 1ee5d91 into elastic:7.11 Jan 14, 2021
@tlrx tlrx deleted the improve-shards-evictions-7.11 branch January 14, 2021 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Meta label for distributed team v7.11.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants