Failed to update expiration time for async-search #63213
Pinging @elastic/es-search (:Search/Search) |
This error means that the search was cancelled or deleted, so the document no longer exists. Are you seeing any error in Kibana when this is logged in ES? In any case, this logging is misleading, since we shouldn't log anything (at least not at the error level) if the document is "just" missing. I'll leave the bug label in place to track fixing the logging level. |
Thanks for looking into this @jimczi ! |
I have had the same problem since 7.9.1. It's really annoying to see an error every time I click search in Kibana. I see the error message in the Kibana frontend and in the Elasticsearch logs. Elastic logs: |
Can you share the error that you see in Kibana and the stack trace that is logged in ES? Does it happen on every search? It could be a bug in Kibana too, so we'll need to understand how this error is triggered. |
JVM version:
The error is happening on any index - filebeat, winlogbeat etc.
Elasticsearch logs:
|
Ya, we hit this one a lot across instances (7.9.2 / ECK). It seems to make queries hang while it resolves. I spent more time with this. It would be nice to control in some way which nodes this index goes on. My issue seems to be directly related to a primary being on a heavily utilized hot node. Deleting the index and forcing the system to recreate it on another set of nodes (I had to do this a few times) showed immediate and lasting benefit. Guessing it's somewhat related to #37867 |
Here is Kibana stack trace:
|
The _async_search APIs can throw version conflict exception when the internal response is updated concurrently. That can happen if the final response is written while the user extends the expiration time. That scenario should be rare but it happened in Kibana for several users so this change ensures that updates are retried at least 5 times. That should resolve the transient errors for Kibana. This change also preserves the version conflict exception in case the retry didn't work instead of returning a confusing 404. This commit also ensures that we don't delete the response if the search was cancelled internally and not deleted explicitly by the user. Closes elastic#63213
I opened #63652 after reading the additional stack traces that you provided. The one in the description is about a deleted search, so it's not really a bug, but the subsequent ones are due to a race condition. The update of the expiration time competes with the update of the response when the search finishes, so a version conflict can happen. Retrying the updates, as done in #63652, should be enough to fix this bug. |
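To make the race concrete, here is a minimal Python sketch of retrying a versioned update on conflict. It is not the Elasticsearch implementation from #63652: `InMemoryStore`, `put_if_version`, `update_with_retry` and the `on_attempt` hook are hypothetical names invented for this illustration, and the in-memory store only stands in for the `.async_search` index. The hook simulates the final response being written between the read and the write of the expiration-time update.

```python
from dataclasses import dataclass, field


class VersionConflict(Exception):
    """Raised when a write targets a stale document version."""


@dataclass
class Doc:
    version: int
    source: dict


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the .async_search index (illustration only)."""
    docs: dict = field(default_factory=dict)

    def get(self, doc_id):
        doc = self.docs[doc_id]
        return Doc(doc.version, dict(doc.source))  # return a copy, like a fresh read

    def put_if_version(self, doc_id, expected_version, source):
        current = self.docs[doc_id]
        if current.version != expected_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current.version}"
            )
        self.docs[doc_id] = Doc(expected_version + 1, dict(source))


def update_with_retry(store, doc_id, mutate, max_retries=5, on_attempt=None):
    """Re-read and re-apply `mutate` on conflict; returns the number of retries used."""
    attempt = 0
    while True:
        doc = store.get(doc_id)
        new_source = mutate(dict(doc.source))
        if on_attempt is not None:
            on_attempt(attempt)  # test hook: lets a "concurrent" writer sneak in
        try:
            store.put_if_version(doc_id, doc.version, new_source)
            return attempt
        except VersionConflict:
            if attempt >= max_retries:
                raise  # surface the conflict instead of masking it as "not found"
            attempt += 1


store = InMemoryStore(
    {"search-1": Doc(1, {"expiration_time": 100, "is_running": True})}
)


def finish_search_once(attempt):
    # Simulates the final response being stored between our read and our write.
    if attempt == 0:
        current = store.get("search-1")
        store.put_if_version(
            "search-1", current.version, {**current.source, "is_running": False}
        )


retries = update_with_retry(
    store,
    "search-1",
    lambda source: {**source, "expiration_time": 500},
    on_attempt=finish_search_once,
)
```

In this sketch the first attempt loses the race, and the retry re-reads the document, so the extended expiration time is applied on top of the finished response rather than overwriting it. When the retries are exhausted the conflict is re-raised as-is, mirroring the stated intent of preserving the version conflict exception rather than returning a 404.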
Thank you @jimczi 👍 |
* Async search should retry updates on version conflict The _async_search APIs can throw version conflict exception when the internal response is updated concurrently. That can happen if the final response is written while the user extends the expiration time. That scenario should be rare but it happened in Kibana for several users so this change ensures that updates are retried at least 5 times. That should resolve the transient errors for Kibana. This change also preserves the version conflict exception in case the retry didn't work instead of returning a confusing 404. This commit also ensures that we don't delete the response if the search was cancelled internally and not deleted explicitly by the user. Closes #63213
Elasticsearch version (bin/elasticsearch --version): 7.9.1 (Elastic Cloud)
Plugins installed: []
N/A
JVM version (java -version):
OS version (uname -a if on a Unix-like system):
Description of the problem including expected versus actual behavior:
In the ES Cloud console logs, I see the following ERROR log one or two times a day:
It's hard to know where that would come from (maybe from interactions in Kibana), but I wouldn't expect to see such errors. It seems to be due to a version conflict when updating a document in .async_search.