Fix race with eviction when reading from FileCache #6592
Conversation
Gradle Check (Jenkins) Run Completed with:
Codecov Report

|            | main   | #6592  | +/-    |
|------------|--------|--------|--------|
| Coverage   | 70.77% | 70.80% | +0.03% |
| Complexity | 59156  | 59136  | -20    |
| Files      | 4804   | 4803   | -1     |
| Lines      | 283102 | 283118 | +16    |
| Branches   | 40813  | 40811  | -2     |
| Hits       | 200361 | 200463 | +102   |
| Misses     | 66305  | 66186  | -119   |
| Partials   | 16436  | 16469  | +33    |

... and 460 files with indirect coverage changes
Force-pushed from c958df2 to b57cbe7
// refcount = 0 at the beginning
FileCachedIndexInput newOrigin = new FileCachedIndexInput(fileCache, blobFetchRequest.getFilePath(), downloaded);
if (origin == null) {
I am wondering why we don't use computeIfAbsent?

IndexInput origin = fileCache.computeIfAbsent(key, (key) -> downloadBlockLocally(...));

The fileCache.computeIfPresent would not be needed in this case since origin should not be null (or should be re-downloaded again).
We do not keep the IndexInput open at all times. In the case of cache restore, we insert a ClosedIndexInput instance which just has the length of the index input, to keep track of capacity. We surely do need an isClosed mechanism to track entries whose files haven't been completely eliminated from local disk.
> We surely do need a mechanism to keep track of isClosed mechanism if the file hasn't been completely eliminated from local disk.

Correct, those should be removed, right? And computeIfAbsent would work as expected.
I think all of this can be replaced with a single V compute(K key, BiFunction<K, V, V> remappingFunction) call. It can handle all three cases:
- exists and is open: return as-is
- exists and is closed: remap it to an open IndexInput
- does not exist: download and create a new one
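The three cases above can be sketched with one atomic compute call. This is a hypothetical illustration using a plain ConcurrentHashMap; the names CachedInput, downloadAndOpen, and openInput are illustrative stand-ins, not the real FileCache/TransferManager API.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: stand-in types, not the actual OpenSearch classes.
class ComputeSketch {
    static class CachedInput {
        final boolean closed;
        CachedInput(boolean closed) { this.closed = closed; }
        boolean isClosed() { return closed; }
    }

    static final ConcurrentHashMap<String, CachedInput> fileCache = new ConcurrentHashMap<>();

    static CachedInput fetch(String key) {
        // One atomic remapping call covers all three cases; the remapping
        // function runs while the map holds the bin lock for this key.
        return fileCache.compute(key, (path, cached) -> {
            if (cached == null) {
                return downloadAndOpen(path);   // does not exist: download and create
            }
            if (cached.isClosed()) {
                return openInput(path);         // exists but closed: reopen in place
            }
            return cached;                      // exists and open: return as-is
        });
    }

    // Stand-ins for the real download/open logic.
    static CachedInput downloadAndOpen(String path) { return new CachedInput(false); }
    static CachedInput openInput(String path) { return new CachedInput(false); }
}
```

Because the remapping function executes atomically for the key, no other thread can observe (or evict) the entry between the existence check and the replacement.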
Is your suggestion along the lines of:

// Do download/network fetches with this block
IndexInput origin = fileCache.computeIfAbsent(key, (key) -> downloadBlockLocally(...));
// Once we have the block, either downloaded or on disk, do the following?
if (cachedIndexInput.isClosed()) {
}
> We surely do need a mechanism to keep track of isClosed mechanism if the file hasn't been completely eliminated from local disk.

> Correct, those should be removed, right? And computeIfAbsent would work as expected

@reta We still need to handle the closed IndexInput case. On startup, we populate the cache with placeholder IndexInput entries and defer creating the real IndexInput entry, which actually opens a handle to the file on disk, until it is actually needed.
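The placeholder scheme described above can be sketched as follows. This is a minimal, hypothetical illustration (the names and shape of the real CachedIndexInput/ClosedIndexInput types in OpenSearch differ): the restore-time entry records only the file length for capacity accounting and reports itself closed until a real input is opened on first use.

```java
// Sketch only: nested stand-in types, not the actual OpenSearch classes.
class RestoreSketch {
    interface CachedIndexInput {
        long length();       // used for cache capacity accounting
        boolean isClosed();  // true until a real file handle is opened
    }

    // Inserted during startup cache restore: holds no file handle, just the
    // length, so capacity is tracked without opening every file on disk.
    static class ClosedIndexInput implements CachedIndexInput {
        private final long length;
        ClosedIndexInput(long length) { this.length = length; }
        public long length() { return length; }
        public boolean isClosed() { return true; }
    }

    // Created lazily on first use; represents an entry backed by an open handle.
    static class OpenIndexInput implements CachedIndexInput {
        private final long length;
        OpenIndexInput(long length) { this.length = length; }
        public long length() { return length; }
        public boolean isClosed() { return false; }
    }
}
```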
// Another thread is downloading the same resource. Wait for it
// to complete then make a recursive call to fetch it from the
// cache.
existingLatch.await();
I am wondering if we need to add a termination mechanism here. It's an edge case, but if the downloads never complete, we will be blocking quite a few threads as requests come in, given that we no longer use a dedicated threadpool.
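One possible termination mechanism, sketched hypothetically (the method name, timeout value, and error handling are assumptions, not part of the PR): replace the unbounded `existingLatch.await()` with the timed `CountDownLatch.await(long, TimeUnit)` overload so a stuck download cannot pin request threads indefinitely.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch only: a bounded wait on another thread's in-flight download.
class BoundedWait {
    static void awaitDownload(CountDownLatch existingLatch, long timeoutMillis) throws IOException {
        try {
            // Timed await returns false instead of blocking forever.
            if (!existingLatch.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IOException("timed out waiting for concurrent download to complete");
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and surface the failure to the caller.
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for concurrent download", e);
        }
    }
}
```

The caller can then fail the read (or retry the fetch itself) instead of waiting forever on another thread's download.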
server/src/main/java/org/opensearch/index/store/remote/utils/cache/LRUCache.java
Force-pushed from b57cbe7 to 1df86ac
Force-pushed from 1df86ac to 3d2c2e8
IndexInput origin = fileCache.computeIfPresent(blobFetchRequest.getFilePath(), (path, cachedIndexInput) -> {
    if (cachedIndexInput.isClosed()) {
        // if it's already in the file cache, but closed, open it and replace the original one
final IndexInput origin = fileCache.compute(key, (path, cachedIndexInput) -> {
Look much cleaner to me, thanks @andrross !
Thank you! This approach seems kind of obvious in hindsight. This is why code reviews are good :)
Force-pushed from d7245ab to 1ad82a7
Another failure of #6531
More SegmentReplicationRelocationIT failures...
Force-pushed from 1ad82a7 to cc1e291
The previous implementation had an inherent race condition where a zero-reference-count IndexInput read from the cache could be evicted before the IndexInput was cloned (and therefore had its reference count incremented). Since the IndexInputs are stateful, this is very bad. The least-recently-used semantics meant that in a properly-configured system this would be unlikely, since accessing a zero-reference-count item would move it to be most-recently used and therefore least likely to be evicted. However, there was still a latent bug that was possible to encounter (see issue opensearch-project#6295).

The only way to fix this, as far as I can see, is to change the cache behavior so that fetching an item from the cache atomically increments its reference count. This also led to a change to TransferManager to ensure that all requests for an item ultimately read through the cache to eliminate any possibility of a race. I have implemented some concurrent unit tests that put the cache into a worst-case thrashing scenario to ensure that concurrent access never closes an IndexInput while it is still being used.

Signed-off-by: Andrew Ross <andrross@amazon.com>
Force-pushed from cc1e291 to a177172
Whitesource issue unrelated to the PR changes. Raised a PR for the fix - #6629
The previous implementation had an inherent race condition where a zero-reference count IndexInput read from the cache could be evicted before the IndexInput was cloned (and therefore had its reference count incremented). Since the IndexInputs are stateful this is very bad. The least-recently-used semantics meant that in a properly-configured system this would be unlikely since accessing a zero-reference count item would move it to be most-recently used and therefore least likely to be evicted. However, there was still a latent bug that was possible to encounter (see issue #6295).
The only way to fix this, as far as I can see, is to change the cache behavior so that fetching an item from the cache atomically increments its reference count. This also led to a change to TransferManager to ensure that all requests for an item ultimately read through the cache to eliminate any possibility of a race. I have implemented some concurrent unit tests that put the cache into a worst-case thrashing scenario to ensure that concurrent access never closes an IndexInput while it is still being used.
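The core idea of the fix described above can be sketched as a cache whose lookup atomically pins the entry. This is a minimal, hypothetical illustration, not the actual LRUCache implementation: the refcount increment happens under the same lock that eviction takes, so an entry handed to a reader can never be evicted before the reader releases it.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: the real FileCache/LRUCache tracks usage order, capacity, and
// closes evicted IndexInputs; this shows just the pin-on-get invariant.
class RefCountedCache<K, V> {
    private static class Entry<V> {
        final V value;
        int refCount;
        Entry(V value) { this.value = value; }
    }

    private final Map<K, Entry<V>> map = new HashMap<>();

    // get() pins the entry: the refcount is incremented atomically with the
    // lookup, and the caller must later call decRef() to release it.
    synchronized V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        e.refCount++;
        return e.value;
    }

    synchronized void put(K key, V value) {
        map.put(key, new Entry<>(value));
    }

    synchronized void decRef(K key) {
        Entry<V> e = map.get(key);
        if (e != null && e.refCount > 0) e.refCount--;
    }

    // Eviction refuses pinned entries, so it can never close a value that a
    // reader obtained via get() but has not yet released.
    synchronized boolean evict(K key) {
        Entry<V> e = map.get(key);
        if (e == null || e.refCount > 0) return false;
        map.remove(key);
        return true;
    }
}
```

Contrast this with the racy scheme: if get() returned the value without touching the refcount, eviction could run (and close the IndexInput) in the window before the reader cloned it.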
Issues Resolved
Closes #6536
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.