
Fix race with eviction when reading from FileCache #6592

Merged
merged 1 commit into opensearch-project:main from concurrent-cache-bug on Mar 11, 2023

Conversation

@andrross (Member) commented Mar 9, 2023

The previous implementation had an inherent race condition where a zero-reference-count IndexInput read from the cache could be evicted before the IndexInput was cloned (and therefore had its reference count incremented). Since the IndexInputs are stateful, this is very bad. The least-recently-used semantics meant that in a properly configured system this was unlikely, since accessing a zero-reference-count item moves it to most-recently used and therefore makes it the least likely to be evicted. However, there was still a latent bug that was possible to encounter (see issue #6295).

The only way to fix this, as far as I can see, is to change the cache behavior so that fetching an item from the cache atomically increments its reference count. This also led to a change to TransferManager to ensure that all requests for an item ultimately read through the cache, eliminating any possibility of a race. I have implemented some concurrent unit tests that put the cache into a worst-case thrashing scenario to ensure that concurrent access never closes an IndexInput while it is still being used.
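The fix described above can be sketched as a minimal reference-counted map. This is a hypothetical illustration, not the actual FileCache implementation (class and method names are invented); the key point is that fetch-and-pin happens inside a single atomic compute step, so an evictor that only removes zero-reference entries can never race with a reader that has just fetched one.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a cache whose get operation atomically pins the entry.
class RefCountedCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final AtomicInteger refCount = new AtomicInteger(0);
        Entry(V value) { this.value = value; }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();

    void put(K key, V value) {
        map.put(key, new Entry<>(value));
    }

    // Fetching an entry increments its reference count inside the map's atomic
    // compute, so no window exists between reading the entry and pinning it.
    V acquire(K key) {
        Entry<V> e = map.computeIfPresent(key, (k, entry) -> {
            entry.refCount.incrementAndGet();
            return entry;
        });
        return e == null ? null : e.value;
    }

    void release(K key) {
        Entry<V> e = map.get(key);
        if (e != null) {
            e.refCount.decrementAndGet();
        }
    }

    // Eviction is also routed through compute: the mapping is removed
    // (by returning null) only while no reader holds a reference.
    boolean tryEvict(K key) {
        final boolean[] evicted = { false };
        map.computeIfPresent(key, (k, entry) -> {
            if (entry.refCount.get() == 0) {
                evicted[0] = true;
                return null; // remove mapping atomically
            }
            return entry; // keep: still referenced
        });
        return evicted[0];
    }
}
```

Because both `acquire` and `tryEvict` run inside the map's per-key atomic compute, the broken interleaving from the bug report (read entry, evict entry, clone entry) cannot occur.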

Issues Resolved

Closes #6536

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@codecov-commenter commented Mar 9, 2023

Codecov Report

Merging #6592 (a177172) into main (bdb4f7a) will increase coverage by 0.03%.
The diff coverage is 69.38%.


@@             Coverage Diff              @@
##               main    #6592      +/-   ##
============================================
+ Coverage     70.77%   70.80%   +0.03%     
+ Complexity    59156    59136      -20     
============================================
  Files          4804     4803       -1     
  Lines        283102   283118      +16     
  Branches      40813    40811       -2     
============================================
+ Hits         200361   200463     +102     
+ Misses        66305    66186     -119     
- Partials      16436    16469      +33     
Impacted Files Coverage Δ
...pensearch/common/logging/OpenSearchJsonLayout.java 0.00% <0.00%> (ø)
...in/java/org/opensearch/index/shard/IndexShard.java 69.73% <0.00%> (-0.52%) ⬇️
...s/replication/SegmentReplicationTargetService.java 49.03% <0.00%> (-0.97%) ⬇️
...x/store/remote/filecache/FileCachedIndexInput.java 54.16% <33.33%> (-1.39%) ⬇️
...g/opensearch/common/settings/WriteableSetting.java 69.58% <50.00%> (-2.00%) ⬇️
...arch/index/store/remote/utils/TransferManager.java 78.12% <61.11%> (+3.12%) ⬆️
...search/index/store/remote/filecache/FileCache.java 82.85% <76.47%> (-8.45%) ⬇️
...n/java/org/opensearch/common/settings/Setting.java 86.32% <100.00%> (+0.19%) ⬆️
.../main/java/org/opensearch/env/NodeEnvironment.java 76.64% <100.00%> (+0.25%) ⬆️
...e/remote/file/OnDemandBlockSnapshotIndexInput.java 77.27% <100.00%> (+6.43%) ⬆️
... and 2 more

... and 460 files with indirect coverage changes



// refcount = 0 at the beginning
FileCachedIndexInput newOrigin = new FileCachedIndexInput(fileCache, blobFetchRequest.getFilePath(), downloaded);
if (origin == null) {
Collaborator commented:
I am wondering why we don't use computeIfAbsent?

IndexInput origin = fileCache.computeIfAbsent(key, (key) -> downloadBlockLocally(...));

The fileCache.computeIfPresent would not be needed in this case since origin should not be null (or should be re-downloaded again).

Member replied:
We do not keep the IndexInput open at all times. In the cache-restore case, we insert a ClosedIndexInput instance that just holds the length of the index input to keep track of capacity. We do need a way to track the isClosed state when the file hasn't been completely removed from local disk.

Collaborator replied:
We do need a way to track the isClosed state when the file hasn't been completely removed from local disk.

Correct, those should be removed, right? And computeIfAbsent would work as expected

Member (Author) replied:
I think all of this can be replaced with a single V compute(K key, BiFunction<K, V, V> remappingFunction) call. It can handle all three cases:

  • exists and is open: return it as-is
  • exists and is closed: remap it to an open IndexInput
  • does not exist: download and create a new one
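A minimal sketch of that three-way compute() call (types and names below are hypothetical stand-ins, not the actual TransferManager code):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class ThreeWayCompute {
    // Stand-in for a cached IndexInput; "closed" models the placeholder
    // entries restored on startup. The "origin" field is only for illustration.
    record CachedInput(boolean closed, String origin) {}

    static final ConcurrentHashMap<String, CachedInput> cache = new ConcurrentHashMap<>();

    // All three cases are decided inside one atomic compute invocation.
    static CachedInput fetch(String key, Supplier<CachedInput> download) {
        return cache.compute(key, (k, cached) -> {
            if (cached == null) {
                return download.get();                      // absent: download and create
            } else if (cached.closed()) {
                return new CachedInput(false, "reopened");  // closed placeholder: reopen from disk
            } else {
                return cached;                              // open: return as-is
            }
        });
    }
}
```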

Member replied:
Is your suggestion along the lines of:

// Do download/network fetches with this block
IndexInput origin = fileCache.computeIfAbsent(key, (key) -> downloadBlockLocally(...));
// Once we have the block, either downloaded or on disk, do the following?
if (cachedIndexInput.isClosed()) {
}

Member (Author) replied:
We do need a way to track the isClosed state when the file hasn't been completely removed from local disk.

Correct, those should be removed, right? And computeIfAbsent would work as expected

@reta We still need to handle the closed-IndexInput case. On startup, we populate the cache with placeholder IndexInput entries and defer creating the real IndexInput, which actually opens a handle to the file on disk, until it is actually needed.

// Another thread is downloading the same resource. Wait for it
// to complete then make a recursive call to fetch it from the
// cache.
existingLatch.await();
Member commented:
I am wondering if we need to add a termination mechanism here. It's an edge case, but if the downloads never complete we will block quite a few threads as requests keep coming in, given that we no longer use a dedicated threadpool.
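One possible termination mechanism is a bounded wait. The sketch below is a hypothetical illustration, not code from this PR; the method name and the idea of surfacing a TimeoutException are invented for the example:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedWait {
    // Wait for another thread's in-flight download, but give up after a
    // deadline instead of blocking the calling thread indefinitely.
    static void awaitDownload(CountDownLatch existingLatch, long timeout, TimeUnit unit)
            throws InterruptedException, TimeoutException {
        if (!existingLatch.await(timeout, unit)) {
            throw new TimeoutException("timed out waiting for concurrent download");
        }
    }
}
```

The caller could then fail the read (or retry the download itself) rather than remain parked forever on a stalled transfer.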

IndexInput origin = fileCache.computeIfPresent(blobFetchRequest.getFilePath(), (path, cachedIndexInput) -> {
if (cachedIndexInput.isClosed()) {
// if it's already in the file cache, but closed, open it and replace the original one
final IndexInput origin = fileCache.compute(key, (path, cachedIndexInput) -> {
Collaborator commented:
Looks much cleaner to me, thanks @andrross!

Member (Author) replied:
Thank you! This approach seems kind of obvious in hindsight. This is why code reviews are good :)

@andrross andrross force-pushed the concurrent-cache-bug branch 2 times, most recently from d7245ab to 1ad82a7 Compare March 10, 2023 21:47
@andrross (Member, Author) commented:
Another failure of #6531

@andrross (Member, Author) commented:
More SegmentReplicationRelocationIT failures...

Signed-off-by: Andrew Ross <andrross@amazon.com>
@kotwanikunal (Member) commented:
Whitesource issue is unrelated to the PR changes. Raised a PR for the fix: #6629

@kotwanikunal kotwanikunal merged commit d139ebc into opensearch-project:main Mar 11, 2023
@kotwanikunal kotwanikunal added the backport 2.x Backport to 2.x branch label Mar 11, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 11, 2023
Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit d139ebc)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
kotwanikunal pushed a commit that referenced this pull request Mar 11, 2023

(cherry picked from commit d139ebc)

Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
mingshl pushed a commit to mingshl/OpenSearch-Mingshl that referenced this pull request Mar 24, 2023
Fix race with eviction when reading from FileCache (opensearch-project#6592)

Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: Mingshi Liu <mingshl@amazon.com>
@andrross andrross deleted the concurrent-cache-bug branch April 6, 2023 16:38
Labels
backport 2.x Backport to 2.x branch skip-changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] [Searchable Snapshot] Index input clone races with close on cache eviction
4 participants