Delete by query causing fielddata cache spike leading to 429 #2550
Comments
It might be worth adding an intercepting proxy to the setup to capture the exact request that the delete-by-query call sends out. Spring Data Elasticsearch 4.2 is outdated and has been out of maintenance for over a year now. The last of the 4.x releases (4.4.x) reached EOL last week. When looking at the code in the 5.0 branch that uses the then already deprecated Can you reproduce this in a setup using one of the maintained versions (5.1 or 5.0)? Both still allow the old client to be used. Or better, can you switch to a supported version and use the current Elasticsearch client?
yeah ... if I would try the maintained versions, but if the I will ping back if I find something more. Thxs!
This ticket is the result of two weeks of experiments.
I'll try to include all the information, because there might be something wrong with how RestHighLevelClient performs deleteByQuery.
For two weeks I have been betting that the problem was on my side or on the Elastic side (performance, configuration), but after several experiments I need to present this to you because I have no explanation.
First of all, I have prior knowledge of Elastic and I am aware that updates and deletes are expensive operations; this is not about that.
CONTEXT
We are using RestHighLevelClient configured like this:
We do many ingest and query operations, for example:
We also do updateByQuery operations, like this:
The update script looks like this:
Finally, we do deleteByQuery operations with the same query as the update operations.
Of course, there is no script in that case.
ISSUE
All operations run like a charm except deleteByQuery. As soon as deleteByQuery is enabled (even though deletes are just a fraction of the traffic, and even when there are many more update operations), the cluster starts to run into problems. ALL delete operations time out, although the records are removed from the cluster. The fielddata cache starts to grow significantly, eventually causing GC usage and duration to spike, then the CPU to spike, and finally tripping the [parent] circuit breaker, at which point the cluster starts responding 429 TOO MANY REQUESTS to our operations.
This happens regardless of the size of the delete query's result: delete queries matching just 1 or 2 documents cause the same effect.
Please remember that the number of delete queries is small.
This only happens on deletes. If I replace deletes with updates (using the same query and a script that updates four fields), the cluster is stable. This alone is very weird to me, since updates are expected to be more expensive than deletes.
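For anyone trying to reproduce the fielddata growth and the [parent] breaker trips described above, the following is a minimal sketch of how the relevant node statistics could be pulled while deletes are running. It relies on the standard Elasticsearch endpoints `_cat/fielddata` and `_nodes/stats/breaker`. The base URL and the class name are assumptions for illustration, not taken from the setup in this issue. It needs Java 11+ for `java.net.http`.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

// Hypothetical helper, not part of Spring Data Elasticsearch.
final class FielddataDiagnostics {

    // Builds the URIs of two standard Elasticsearch stats endpoints:
    //   _cat/fielddata        per-node, per-field fielddata memory usage
    //   _nodes/stats/breaker  circuit breaker usage, including "parent"
    static List<URI> diagnosticUris(String baseUrl) {
        return List.of(
                URI.create(baseUrl + "/_cat/fielddata?v"),
                URI.create(baseUrl + "/_nodes/stats/breaker"));
    }

    public static void main(String[] args) throws Exception {
        // Assumed local cluster address; adjust to the real one.
        String baseUrl = "http://localhost:9200";
        boolean fetch = args.length > 0 && args[0].equals("--fetch");

        HttpClient client = HttpClient.newHttpClient();
        for (URI uri : diagnosticUris(baseUrl)) {
            System.out.println("GET " + uri);
            if (fetch) {
                // Only contacts the cluster when --fetch is passed.
                HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                System.out.println(response.body());
            }
        }
    }
}
```

Comparing the output with deletes enabled and disabled should show which fields are being loaded into fielddata, which may help narrow down what differs between the two request paths.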
NOTE If I bypass spring-data-elasticsearch and use a Feign client to send POST HTTP requests directly for the delete operations, without RestHighLevelClient, then the cluster is stable. This leads me to think that there might be something wrong with the deletes that RestHighLevelClient is sending. It feels like something is not closed (connection timeout).
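The request sent through the Feign client is not shown in this issue. As a rough sketch of the same idea, here is a delete-by-query sent as a plain POST to the `_delete_by_query` endpoint, using only the JDK's `java.net.http` client (Java 11+) in place of Feign. The host, the index name `my-index`, the field `userId`, the timeout and the class name are all hypothetical placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical sketch; uses the JDK HTTP client instead of Feign.
final class DirectDeleteByQuery {

    // Builds POST {baseUrl}/{index}/_delete_by_query with a JSON query body.
    static HttpRequest buildRequest(String baseUrl, String index, String queryJson) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_delete_by_query"))
                .timeout(Duration.ofSeconds(30)) // assumed value, tune as needed
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(queryJson))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder query; the real one should match the update query.
        String body = "{\"query\":{\"term\":{\"userId\":\"42\"}}}";
        HttpRequest request = buildRequest("http://localhost:9200", "my-index", body);
        System.out.println(request.method() + " " + request.uri());

        // Only contacts the cluster when --send is passed.
        if (args.length > 0 && args[0].equals("--send")) {
            HttpClient client = HttpClient.newHttpClient();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
}
```

Capturing this request and the one produced by RestHighLevelClient side by side (for example through an intercepting proxy) would show whether the two differ in body, query parameters or headers.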
Here are some screenshots:
Timeout exception on ALL delete operations
Metrics when deletes are enabled
(we disable updates at the same time so 100% of the spikes are related to deletes)