Update Shard Indexing Pressure Settings Documentation #261

getsaurabh02 · 2021-11-11T04:28:59Z

Meta Issues on the Problem and Approach : opensearch-project/OpenSearch#478

High Level Control Levers: These are feature level settings

shard_indexing_pressure.enable : Turns the shard indexing backpressure feature on or off. The value can be changed on the fly and does not affect the traffic workload. All granular metric tracking that is part of shard indexing pressure turns on or off with this flag. The default value is false; it will be turned on as part of the incremental rollout in future releases.
shard_indexing_pressure.enforced : Runs the shard indexing backpressure feature in Shadow Mode or Enforced Mode once the feature is enabled. The value can be changed on the fly. In Shadow Mode (value set to "false"), all granular metric tracking that is part of the feature continues to work, but no actual rejections happen. In Enforced Mode (value set to "true"), requests that breach the key performance thresholds are also rejected.
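
The interaction of the two flags can be sketched as a small truth table. This is a minimal illustration only, not the OpenSearch implementation; `Decision` and `evaluate_request` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    tracked: bool   # whether shard-level metrics are recorded for the request
    rejected: bool  # whether the request is actually rejected


def evaluate_request(enabled: bool, enforced: bool, breaches_threshold: bool) -> Decision:
    """Hypothetical helper combining the `enable` and `enforced` flags."""
    if not enabled:
        # Feature off: no tracking and no rejections.
        return Decision(tracked=False, rejected=False)
    # Feature on: always track; reject only in Enforced Mode on a breach.
    return Decision(tracked=True, rejected=enforced and breaches_threshold)


shadow = evaluate_request(enabled=True, enforced=False, breaches_threshold=True)
enforced = evaluate_request(enabled=True, enforced=True, breaches_threshold=True)
```

Here `shadow` tracks the breach without rejecting, while `enforced` both tracks and rejects.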

Node Level quota/limits: These settings control memory utilization on a node

shard_indexing_pressure.primary_parameter.node.soft_limit : Indexing Pressure currently permits a maximum of 10% of JVM memory for indexing traffic on a node (through the indexing_pressure.memory.limit setting). This new setting defines the percentage of that node-level limit that acts as a soft indicator of duress on the node. By default, 70% of the node-level limit is used as the soft threshold for Shard Indexing Pressure. Only once this threshold is crossed are the additional granular tracking metrics taken into consideration to find any actual degradation in the write path.
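
As a rough worked example of the arithmetic, assuming a hypothetical heap of 1,000,000,000 bytes and the defaults quoted above (10% node limit, 70% soft limit), the soft threshold works out as follows. The function name and the integer-percent parameters are assumptions made for this sketch.

```python
def node_soft_limit_bytes(jvm_heap_bytes: int,
                          node_limit_percent: int = 10,
                          soft_limit_percent: int = 70) -> int:
    """Hypothetical helper: bytes of in-flight indexing data at which the
    node is treated as being under duress."""
    # Node-level indexing budget (10% of heap by default).
    node_limit = jvm_heap_bytes * node_limit_percent // 100
    # Soft threshold as a percentage of that budget (70% by default).
    return node_limit * soft_limit_percent // 100


soft_limit = node_soft_limit_bytes(1_000_000_000)  # 70_000_000 bytes
```

With these assumed numbers the node limit is 100 MB and granular shard-level evaluation would start at 70 MB of in-flight indexing data.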

Shard Level quota/limits : These settings control memory utilization on a shard

shard_indexing_pressure.primary_parameter.shard.min_limit : Minimum quota assigned to a new shard in any role (coordinator, primary, or replica) when a write arrives for that shard. This allocated quota is then increased or decreased based on the inflow of traffic targeted at the shard.
shard_indexing_pressure.operating_factor.lower : Lower occupancy limit within the allocated quota for the shard, below which the allocated quota is considered for a decrease. This indicates a decrease in traffic for the shard. The default value is 75%, which means that if the total utilization of the shard falls below 75% of its allocated quota, the shard is considered for a decrease in its current assigned quota.
shard_indexing_pressure.operating_factor.optimal : Desired occupancy of the shard-level quota at any given point in time. A shard operating in its optimal range is not considered for an increase or decrease in its quota. The current default value is 85%.
shard_indexing_pressure.operating_factor.upper : Upper occupancy limit within the allocated quota for the shard, beyond which the allocated quota is considered for an increase, because traffic for the shard has increased. The default value is 95%, which means that if the total utilization of the shard rises above 95% of its allocated quota, the shard is considered for an increase in its current assigned quota.
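
The three operating factors can be read as a simple resize rule. The sketch below is one plausible interpretation, not the actual algorithm: it assumes that a resized quota is chosen so that current usage sits at the optimal occupancy, that the quota never drops below the minimum limit, and it ignores the node-level soft limit. `adjust_quota` and its parameters are hypothetical.

```python
def adjust_quota(current_bytes: int,
                 quota_bytes: int,
                 min_quota_bytes: int,
                 lower: float = 0.75,
                 optimal: float = 0.85,
                 upper: float = 0.95) -> int:
    """Hypothetical helper: return the shard quota after one evaluation."""
    occupancy = current_bytes / quota_bytes
    if lower <= occupancy <= upper:
        # Inside the operating range: leave the quota unchanged.
        return quota_bytes
    # Outside the range: assumed rule that re-centres usage on `optimal`.
    resized = int(current_bytes / optimal)
    # Never shrink below the minimum shard quota.
    return max(resized, min_quota_bytes)


unchanged = adjust_quota(800, 1000, 100)  # 80% occupancy, within 75%..95%
grown = adjust_quota(990, 1000, 100)      # 99% occupancy, above upper
shrunk = adjust_quota(500, 1000, 100)     # 50% occupancy, below lower
```

In this sketch a shard at 99% occupancy gets a larger quota, one at 50% gets a smaller one, and one at 80% is left alone.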

Performance Degradation Factors/Levers: These settings control the dynamic performance thresholds for a shard

shard_indexing_pressure.secondary_parameter.throughput.request_size_window : The sampling window size for requests on a shard. The performance of the recent requests in this window is evaluated and measured against the current in-flight requests to determine any degradation. The default value is 2000 requests, chosen to keep bootstrap time small while still providing enough data samples to take the right action.
shard_indexing_pressure.secondary_parameter.throughput.degradation_factor : The degradation factor per unit byte of a request, used to determine the threshold for latency spikes. The default value is 5x, which means that if latency rises to 5 times the historical view, the shard is considered degraded.
shard_indexing_pressure.secondary_parameter.successful_request.elapsed_timeout : Used to identify black hole or stuck request scenarios, where new requests are continuously accepted but not enough of them complete. The default value is 300000 ms.
shard_indexing_pressure.secondary_parameter.successful_request.max_outstanding_requests : The total number of requests that must be stuck (per the elapsed_timeout setting above) before action is taken. The combination of outstanding requests and timeout allows the system to flag a threshold breach. The default value is 100 outstanding requests.
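
The two secondary checks described above can be sketched as predicates. This is a simplified illustration under stated assumptions: the function names are hypothetical, the comparisons are assumed to be strict, and the real feature derives the historical latency from the sampling window rather than taking it as an argument.

```python
def throughput_degraded(current_latency_per_byte: float,
                        historical_latency_per_byte: float,
                        degradation_factor: float = 5.0) -> bool:
    """Hypothetical check: per-byte latency has spiked past the factor."""
    return current_latency_per_byte > degradation_factor * historical_latency_per_byte


def requests_stuck(ms_since_last_success: int,
                   outstanding_requests: int,
                   elapsed_timeout_ms: int = 300_000,
                   max_outstanding_requests: int = 100) -> bool:
    """Hypothetical check: no successful completion within the timeout
    while more than the allowed number of requests are outstanding."""
    return (ms_since_last_success > elapsed_timeout_ms
            and outstanding_requests > max_outstanding_requests)


spike = throughput_degraded(current_latency_per_byte=6.0, historical_latency_per_byte=1.0)
stuck = requests_stuck(ms_since_last_success=400_000, outstanding_requests=150)
```

Both conditions of `requests_stuck` must hold together, which mirrors the description that the timeout and the outstanding-request count are combined to flag a breach.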


ashwinkumar12345 · 2021-12-10T18:50:11Z

Added - #327.

getsaurabh02 added the enhancement New feature or request label Nov 11, 2021

ashwinkumar12345 self-assigned this Nov 19, 2021

ashwinkumar12345 closed this as completed Dec 20, 2021
