Close TSMReaders from FileStore.Close after releasing FileStore mutex #9866
More work to properly fix #9786
Confirmed that DROP SHARD X works properly when the service is loaded with 10 concurrent select sum(f) style queries. Files are deleted, and file descriptors are closed. So #9792 is WAI.

Looking at RP more closely:
SHOW SHARDS
Discovered that an RP event and exactly one query were deadlocked. RP was blocking here:
Hung query discovered here (duration reached as high as 1h, never returned):
The deadlock happens because
This change closes FileStore (clears the object members so that it looks closed) under the write lock, then releases the write lock before closing the underlying files.
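A minimal sketch of the locking pattern described above, not the actual patch. The `reader` and `fileStore` types, the `Len` method, and the file name are hypothetical stand-ins for the real TSM reader and FileStore code. It also assumes one plausible deadlock shape: a reader whose Close waits for in-flight references, while the query holding a reference still needs the store's read lock.

```go
package main

import (
	"fmt"
	"sync"
)

// reader is a hypothetical stand-in for a TSM file reader.
// Close blocks until every in-flight reference has been released.
type reader struct {
	name   string
	refs   sync.WaitGroup
	closed bool
}

func (r *reader) Close() error {
	r.refs.Wait() // wait for queries still using this file
	r.closed = true
	return nil
}

// fileStore is a hypothetical, simplified store that guards its
// file list with a RWMutex.
type fileStore struct {
	mu    sync.RWMutex
	files []*reader
}

// Close detaches the readers under the write lock so the store looks
// closed, then closes them only after the lock has been released.
func (f *fileStore) Close() error {
	f.mu.Lock()
	files := f.files
	f.files = nil // new callers now see an empty (closed) store
	f.mu.Unlock()

	// No store lock is held here, so a query that needs f.mu before
	// releasing its reference cannot be blocked by this loop.
	var firstErr error
	for _, r := range files {
		if err := r.Close(); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}

// Len takes the read lock, as a query path might.
func (f *fileStore) Len() int {
	f.mu.RLock()
	defer f.mu.RUnlock()
	return len(f.files)
}

func main() {
	r := &reader{name: "000001.tsm"}
	fs := &fileStore{files: []*reader{r}}

	r.refs.Add(1) // a query takes a reference to the file
	done := make(chan struct{}, 1)
	go func() {
		fs.Len()      // the query needs the store's read lock...
		r.refs.Done() // ...before it releases its reference
		done <- struct{}{}
	}()

	err := fs.Close()
	<-done
	fmt.Println("close returned, err:", err, "reader closed:", r.closed)
}
```

If Close instead held `f.mu` while calling `r.Close()`, the query goroutine in this sketch would block on `RLock` while Close blocked on `refs.Wait()`, and neither would return.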
To test this change:
select sum(field) from m
in 5 concurrent infinite loops

Before this change, the test fails within 1-3 RP cycles, even deadlocking 3 of 5 queries in one case. After the change, the test has not failed after 200 RP cycles (about 2.5 hours).