[RFC] Shard Indexing backpressure mechanism should also protect from any CPU contention on nodes #7638
Comments
Tagging relevant folks to get their thoughts on the above issue and proposed solution.
For indexing, the payload bytes serve as a good mechanism to estimate the amount of work. What you are referring to is a mechanism for admission control. Backpressure is a way to tell the higher layers not to accept more work if the last leg is struggling to catch up. To improve backpressure we need a mechanism to estimate the CPU cost of an indexing request and apply backpressure based on it. Note that backpressure isn't a point-in-time rejection decision.
Yes, that's exactly the intent here: if a downstream node (e.g. the replica is downstream of the primary node in the bulk request lifecycle) is lagging due to CPU contention, then the upstream node can reject additional work until the downstream node is no longer under duress.
Is it justified to take backpressure decisions based solely on the CPU cost of an indexing request? Other tasks such as search, or background tasks such as merging, may also be consuming a lot of CPU time, and in those cases too we should backpressure the upstream node so it does not take on more indexing load, allowing downstream nodes to recover. Even if the downstream node accepts the request, it will take a long time to complete indexing due to CPU contention, and the overall bulk requests will lead to 504s from the client's perspective. I believe the CPU cost of an indexing request can be helpful as a secondary parameter for shard-level rejections: once we are sure the downstream node is under duress due to overall CPU utilization (the primary parameter), we can look at the aggregated CPU cost per shard to reject indexing requests for the heaviest shards. For calculating overall CPU time we may choose to devise an approach similar to the resource tracking framework introduced as part of search backpressure, although that isn't straightforward here since multiple tasks are involved in the indexing request lifecycle.
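The "aggregated CPU cost per shard as a secondary signal" idea above could be sketched roughly as follows. This is a hypothetical illustration, not the proposed implementation; the class and method names (`ShardCpuCostTracker`, `addCost`, `costOf`) are invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: accumulate CPU cost per shard so that, once the
// node-level CPU signal (the primary parameter) is breached, rejections
// can target the shards contributing the most indexing CPU time.
public class ShardCpuCostTracker {
    private final Map<String, LongAdder> cpuNanosByShard = new ConcurrentHashMap<>();

    // Record CPU nanos consumed by one indexing task for a shard.
    public void addCost(String shardId, long cpuNanos) {
        cpuNanosByShard.computeIfAbsent(shardId, k -> new LongAdder()).add(cpuNanos);
    }

    // Aggregated CPU cost for a shard; the secondary rejection signal.
    public long costOf(String shardId) {
        LongAdder adder = cpuNanosByShard.get(shardId);
        return adder == null ? 0L : adder.sum();
    }
}
```

The open question raised above (multiple tasks per indexing request) would surface here as deciding which task's CPU time gets attributed to `addCost`.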
Backpressure is still a point-in-time decision, made by maintaining a view of the last few time units. To me the distinction between backpressure and admission control is less about time and more about where, how and on which information they act. Backpressure on a particular node works by keeping an aggregated view of all downstream nodes and rejecting requests based on downstream node state, whereas admission control has only local knowledge on the node, and its rejections are based on self metrics, i.e. self rejections (similar to thread pool and circuit breaker rejections). In addition, admission control doesn't act on the transport layer, while backpressure protects both HTTP and transport.
Do we have any GH issue/RFC for this?
Is your feature request related to a problem? Please describe.
Currently the shard indexing pressure rejection strategy is determined by node-level and shard-level limits on memory occupancy, along with performance parameters like shard-level throughput limits. Thus, current indexing backpressure only accounts for memory bottlenecks and may not protect a system under duress due to CPU contention from a high indexing workload. This can lead to node drops and cluster unavailability over time, since there may not be enough CPU to fulfil essential cluster tasks like responding to health checks. This calls for CPU utilization to be one of the trigger points of indexing backpressure.
Describe the solution you'd like
Currently, Shard indexing pressure has one hard limit parameter
Node Heap Memory Occupancy Hard Limit - Whenever the current memory occupancy at the node level reaches 100% of the assigned node memory, this parameter is considered breached.
The node heap memory hard limit protects against memory exhaustion on the node; similarly, we should add a hard limit for CPU utilization to protect the node from sustained high CPU usage.
Node CPU Utilization Hard Limit - Whenever the currently tracked CPU utilization at the node level exceeds 95%, this parameter is considered breached.
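The two node-level hard-limit checks described above could look roughly like this. A minimal sketch, assuming the 95% CPU threshold from this proposal; the class and method names (`NodeHardLimits`, `isHeapHardLimitBreached`, `isCpuHardLimitBreached`) are illustrative only.

```java
// Hypothetical sketch of the node-level hard-limit checks: the existing
// heap occupancy limit (breached at 100% of assigned memory) and the
// proposed CPU utilization limit (breached above 95%).
public class NodeHardLimits {
    // Proposed CPU threshold in percent, per this RFC (an assumption, pending benchmarking).
    static final double CPU_UTILIZATION_HARD_LIMIT = 95.0;

    // Existing rule: breached when occupancy reaches 100% of assigned node memory.
    public static boolean isHeapHardLimitBreached(long currentBytes, long assignedBytes) {
        return currentBytes >= assignedBytes;
    }

    // Proposed rule: breached when tracked node CPU utilization exceeds 95%.
    public static boolean isCpuHardLimitBreached(double avgCpuPercent) {
        return avgCpuPercent > CPU_UTILIZATION_HARD_LIMIT;
    }
}
```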
Currently, indexing pressure has two primary parameters, both based on memory occupancy, at the node and shard level respectively.
To account for CPU contention on the node, we will add a third primary parameter.
The final Rejection Rules will become
For CPU usage tracking we can have a component, CPUResourceWatcher, which will be a scheduled task responsible for taking snapshots at regular intervals (5 secs). It will update the average CPU % utilization over a configurable sliding window. The length of the sliding window determines how quickly indexing pressure reacts to changes in CPU utilization. We can provide a dynamic setting, UtilizationResponsiveness, which can be set to Low, Medium or High. Each responsiveness level will have a default sliding window length, e.g. 10, 15 and 30s for High, Medium and Low respectively. Exact values for the sliding window length will be decided after performance benchmarking. Lower responsiveness, i.e. a longer sliding window, responds more slowly to changes in utilization: only a sustained increase over a sufficiently long period will trigger indexing pressure. This assumes that frequent spikes in utilization are benign and do not affect the overall health of the node. If instead we desire a fast response to any sudden increase in utilization, High responsiveness will be better.
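The sliding-window averaging that CPUResourceWatcher would maintain can be sketched as below. This is an illustrative sketch only, not the proposed implementation; the window lengths come from the example values above (10/15/30s at a 5s snapshot interval), and the class name `CpuUtilizationWindow` is invented for the example.

```java
import java.util.ArrayDeque;

// Hypothetical sketch: a fixed-size sliding window of CPU snapshots.
// A scheduled watcher would call record() every snapshot interval (e.g. 5s);
// average() is the value compared against the CPU hard limit.
public class CpuUtilizationWindow {
    private final int maxSamples;
    private final ArrayDeque<Double> samples = new ArrayDeque<>();
    private double sum = 0.0;

    // windowSeconds maps from the UtilizationResponsiveness setting,
    // e.g. 10s (High), 15s (Medium), 30s (Low) per the proposal's examples.
    public CpuUtilizationWindow(int windowSeconds, int snapshotIntervalSeconds) {
        this.maxSamples = Math.max(1, windowSeconds / snapshotIntervalSeconds);
    }

    // Add the latest CPU % snapshot, evicting the oldest once the window is full.
    public synchronized void record(double cpuPercent) {
        samples.addLast(cpuPercent);
        sum += cpuPercent;
        if (samples.size() > maxSamples) {
            sum -= samples.removeFirst();
        }
    }

    // Average utilization over the window; a longer window smooths out
    // transient spikes, so only sustained high CPU breaches the limit.
    public synchronized double average() {
        return samples.isEmpty() ? 0.0 : sum / samples.size();
    }
}
```

A longer window (Low responsiveness) simply means more samples must stay high before the average crosses the threshold, which matches the "frequent spikes are benign" assumption above.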
I am looking for feedback from the community to evolve this feature from an idea to concrete proposal.