[BUG] Searchable Snapshot: Search hangs when parallel searches to same remote index #6295
Comments
Update: Seems to happen consistently when multiple searches are performed in parallel against the same index or segments. ES 7 also had an issue: elastic/elasticsearch#85239
Thanks @dgilling! Does this mean you can reproduce this without the nested aggregation?
@andrross, yes, it reproduces regardless of the nested aggregation. It's possible the nested aggregation just helped trigger the scenario by lengthening the search time. It usually happens when at least 12 or 15 searches hit the same remote index. Artificially staggering the searches with enough time (a few seconds of delay between each search) helps reduce the occurrence. I also see it regardless of whether Azure Blob Storage or AWS S3 is the backing remote store. I did look into the ES 7 issue and applied a hotfix; it doesn't look to be hitting that exception condition, and I don't see any other exceptions with TRACE logs.
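For concreteness, here is a minimal sketch of the parallel-search pattern described above; the host, index name, and query body are assumptions rather than details from the report:

```python
# Hedged reproduction sketch: fire many concurrent searches at the same
# remote_snapshot-backed index. HOST, INDEX, and the query body are
# assumptions, not details taken from the original report.
from concurrent.futures import ThreadPoolExecutor

import requests

HOST = "http://localhost:9200"      # assumed cluster endpoint
INDEX = "my-remote-snapshot-index"  # hypothetical restored index name

def run_search(i: int) -> int:
    # The comments suggest 12-15 parallel searches against the same
    # remote index are enough to trigger the hang.
    resp = requests.post(
        f"{HOST}/{INDEX}/_search",
        json={"query": {"match_all": {}}},
        timeout=300,  # client-side timeout so this script itself never hangs
    )
    return resp.status_code

with ThreadPoolExecutor(max_workers=16) as pool:
    print(list(pool.map(run_search, range(16))))
```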
Updated the title to reflect the issue.
@andrross I should also note that attempts to cancel the tasks or set a timeout don't seem to clean them up. Only a restart of the nodes with
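For reference, the cancellation attempt described above would typically go through the standard task management API; here is a hedged sketch (the host is an assumption, and per the comment the stuck tasks ignore the cancel request):

```python
# Hedged sketch of the cancellation attempt: list in-flight search tasks
# and ask each to cancel via the task management API. HOST is an
# assumption; per the comment above, the stuck tasks ignore the cancel.
import requests

HOST = "http://localhost:9200"  # assumed cluster endpoint

# List in-flight search tasks across the cluster.
tasks = requests.get(
    f"{HOST}/_tasks", params={"actions": "*search*", "detailed": "true"}
).json()

for node in tasks.get("nodes", {}).values():
    for task_id in node.get("tasks", {}):
        requests.post(f"{HOST}/_tasks/{task_id}/_cancel")  # request cancellation
```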
I was able to reproduce this by creating the
I took a thread dump from the resulting deadlock and it does indeed look like the issue detailed in #6437. I applied the fix from PR #6467 and was not able to reproduce the failure.
@kartg Let's keep this open until we get full verification of the fix
Thanks @andrross for the quick fix. The good news:
The bad news:
Example 1:
Example 2:
@andrross @kartg For some more context: I did some more testing today with the official 2.6.0 image released yesterday.
@andrross @kartg Did some more testing given the discussion about potentially too many file descriptors coming from the remote cache. On a second pass at the crash report, we did see over 200k file descriptors open to remote-store-related files.
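As a rough illustration of how such a descriptor count can be taken on Linux, here is a hedged sketch; the process ID and the path fragment used to identify cache files are assumptions, not values from the report:

```python
# Hedged sketch for counting descriptors open to file-cache files on Linux.
# PID and CACHE_MARKER are assumptions; the actual cache path depends on
# the node's data directory layout.
import os

PID = 12345                       # hypothetical OpenSearch process id
CACHE_MARKER = "remote_snapshot"  # assumed fragment of the cache file path

count = 0
for fd in os.listdir(f"/proc/{PID}/fd"):
    try:
        target = os.readlink(f"/proc/{PID}/fd/{fd}")
    except OSError:
        continue  # descriptor closed while we were iterating
    if CACHE_MARKER in target:
        count += 1
print(f"{count} descriptors open to cache files")
```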
Since I'm not too familiar with the remote cache code, let me know if those are not the right things to look at. While this seemed to help prevent the crash, it uncovered some new exceptions:
Thanks @dgilling! We're clearly missing some limits for cases like yours. You did make the correct changes (if you're able to test with a 2.7 snapshot build, the configuration setting like … applies there as well). We had a similar issue with the clone bug you just pointed out, but it looks like it is not completely fixed. I'll dig into that now.
@dgilling It would also be interesting to see what kind of pressure the file cache is under during your testing, if at all possible. You can get snapshots of the file cache stats by doing:
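The exact command is truncated above; one plausible form, assuming the node stats API exposes a `file_cache` section (the endpoint name here is a guess, not confirmed by the thread), would be:

```python
# Hedged polling sketch: sample node stats while the workload runs. The
# endpoint is an assumption (that node stats expose a file_cache section);
# HOST is also an assumption.
import time

import requests

HOST = "http://localhost:9200"  # assumed cluster endpoint

for _ in range(10):
    stats = requests.get(f"{HOST}/_nodes/stats/file_cache").json()
    for node_id, node in stats.get("nodes", {}).items():
        print(node_id, node.get("file_cache"))
    time.sleep(5)  # sample every few seconds
```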
@andrross sure, let me know when you're able to push a fix for
The previous implementation had an inherent race condition where a zero-reference-count IndexInput read from the cache could be evicted before the IndexInput was cloned (and therefore had its reference count incremented). Since the IndexInputs are stateful, this is very bad. The least-recently-used semantics meant that in a properly configured system this would be unlikely, since accessing a zero-reference-count item would move it to be most recently used and therefore least likely to be evicted. However, there was still a latent bug that was possible to encounter (see issue #6295). The only way to fix this, as far as I can see, is to change the cache behavior so that fetching an item from the cache atomically increments its reference count. This also led to a change to TransferManager to ensure that all requests for an item ultimately read through the cache, to eliminate any possibility of a race. I have implemented some concurrent unit tests that put the cache into a worst-case thrashing scenario to ensure that concurrent access never closes an IndexInput while it is still being used. Signed-off-by: Andrew Ross <andrross@amazon.com>
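To illustrate the core idea of the fix, here is a minimal sketch (not the actual OpenSearch code, which is Java) of a cache whose fetch operation pins the entry under the same lock that eviction takes, so eviction can never race with a clone:

```python
# Minimal sketch of the fix's key idea, not the actual OpenSearch code:
# get() increments the entry's reference count under the same lock that
# eviction takes, so an entry can never be evicted between being fetched
# and being pinned by the caller.
import threading
from collections import OrderedDict

class RefCountedCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lock = threading.Lock()
        self.entries = OrderedDict()  # key -> [value, refcount]

    def get(self, key):
        with self.lock:
            entry = self.entries.get(key)
            if entry is None:
                return None
            entry[1] += 1                  # pin atomically with the lookup
            self.entries.move_to_end(key)  # LRU touch
            return entry[0]

    def release(self, key):
        with self.lock:
            self.entries[key][1] -= 1      # unpin when the caller is done

    def put(self, key, value):
        with self.lock:
            self.entries[key] = [value, 0]
            self.entries.move_to_end(key)
            self._evict()

    def _evict(self):
        # Evict least-recently-used entries, skipping any that are pinned.
        while len(self.entries) > self.capacity:
            for k, (value, refs) in self.entries.items():
                if refs == 0:
                    del self.entries[k]
                    break
            else:
                return  # everything is pinned; nothing can be evicted
```

A caller would pair every get() with a release() once it is done with the item, mirroring how a cloned IndexInput holds a reference until it is closed.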
@kotwanikunal @andrross, looks like that did the trick. No more file descriptor explosion. Much appreciated. Will monitor and report on stats as we get more data.
Describe the bug
When performing an aggregation on a nested field on a searchable snapshot (where `storage_type` is `remote_snapshot`), the search task hangs for days if no timeout is defined (the default behavior). This can block the node from handling future searches once the search thread queue fills up.
To Reproduce
Steps to reproduce the behavior:
1. Restore a snapshot as a searchable snapshot (`storage_type: remote_snapshot`).
2. Run searches (e.g., an aggregation on a nested field) against the restored index, several in parallel.
Expected behavior
The search request should complete, or there should be a reasonable default timeout so that future searches on the node are not deadlocked.
The same query on a local index (not `remote_snapshot`) takes <100ms. We expect the remote queries to take longer, but not to run for two days and keep retrying.
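For illustration, here is a hedged example of attaching an explicit timeout to a search request (the host and index name are assumptions); note that the comments above report even explicit timeouts did not clean up the stuck tasks:

```python
# Hedged example of setting a server-side search timeout in the request
# body. HOST and INDEX are assumptions, not values from the report.
import requests

HOST = "http://localhost:9200"      # assumed cluster endpoint
INDEX = "my-remote-snapshot-index"  # hypothetical restored index name

resp = requests.post(
    f"{HOST}/{INDEX}/_search",
    json={
        "timeout": "30s",  # server-side search timeout (none by default)
        "query": {"match_all": {}},
    },
)
body = resp.json()
print(body.get("timed_out"), body.get("took"))
```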
Plugins
Screenshots
Dump of stuck tasks
Host/Environment:
Additional context
Using Azure Blob Storage as the snapshot repo.