Dynamic OpenSearch Settings for the Shard Indexing Backpressure feature
High Level Control Levers: These are feature level settings
shard_indexing_pressure.enable : Turns the shard indexing backpressure feature on or off. Flipping this value on the fly is seamless and does not affect the traffic workload. All granular metric tracking that is part of shard indexing pressure turns on and off with this flag. The default value is false; the feature will be turned on as part of the incremental rollout in future releases.
shard_indexing_pressure.enforced : Runs the shard indexing backpressure feature in Shadow Mode or Enforced Mode once the feature is enabled. Flipping this value on the fly is seamless. In Shadow Mode (value set to "false"), all granular metric tracking that is part of the feature continues to work, but no actual rejections happen. In Enforced Mode (value set to "true"), requests breaching the key performance thresholds are also rejected.
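As a minimal sketch of how the two control levers combine, the snippet below builds a settings body and names the resulting mode. The setting keys are copied from this issue. The "persistent" wrapper assumes the update goes through the cluster settings API, and the helper names are hypothetical.

```python
import json


def build_settings_body(enabled: bool, enforced: bool) -> str:
    """Build a JSON body for a dynamic settings update.

    Assumption: the body is sent to the cluster settings API, which
    expects a "persistent" (or "transient") wrapper. Keys are the
    setting names as written in this issue.
    """
    body = {
        "persistent": {
            "shard_indexing_pressure.enable": enabled,
            "shard_indexing_pressure.enforced": enforced,
        }
    }
    return json.dumps(body, indent=2)


def effective_mode(enabled: bool, enforced: bool) -> str:
    """Name the mode implied by the two flags (hypothetical helper)."""
    if not enabled:
        return "disabled"  # no tracking, no rejections
    return "enforced" if enforced else "shadow"


if __name__ == "__main__":
    # Shadow Mode: metrics are tracked, nothing is rejected.
    print(effective_mode(True, False))
    print(build_settings_body(True, False))
```

Starting in Shadow Mode lets the tracking metrics be observed before any request is rejected.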
Node Level quota/limits: These settings control memory utilization on a node
shard_indexing_pressure.primary_parameter.node.soft_limit : Today Indexing Pressure permits a maximum of 10% of JVM memory for indexing traffic on a node (through the indexing_pressure.memory.limit setting). This new setting defines the percentage of that node-level limit that acts as a soft indicator of duress on the node. By default, 70% of the node-level limit is used as the soft threshold for Shard Indexing Pressure; only once it is crossed are the additional granular tracking metrics considered to find any actual degradation in the write path.
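The arithmetic behind the soft limit can be sketched as follows. The function name and the example heap size are illustrative assumptions; the 10% and 70% defaults are the ones stated above.

```python
def node_soft_limit_bytes(heap_bytes: int,
                          node_limit_percent: int = 10,
                          soft_limit_percent: int = 70) -> int:
    """Return the soft-limit byte count for a node (hypothetical helper).

    node_limit_percent mirrors indexing_pressure.memory.limit (10% of JVM
    heap by default); soft_limit_percent mirrors the soft_limit setting
    (70% of that node-level limit by default).
    """
    node_limit = heap_bytes * node_limit_percent // 100
    return node_limit * soft_limit_percent // 100


if __name__ == "__main__":
    heap = 1_000_000_000  # example heap size, chosen for round numbers
    print(node_soft_limit_bytes(heap))  # 70000000, i.e. 7% of the heap
```

With the defaults, the granular shard-level checks only come into play once in-flight indexing bytes exceed about 7% of the JVM heap.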
Shard Level quota/limits : These settings control memory utilization on a shard
shard_indexing_pressure.primary_parameter.shard.min_limit : Minimum quota assigned to a new shard in any role (coordinator, primary, or replica) when a write arrives for that shard. This allocated quota is then increased or decreased based on the inflow of traffic targeted at the shard.
shard_indexing_pressure.operating_factor.lower : Lower occupancy limit within the allocated quota for the shard, below which the allocated quota is considered for a decrease. This indicates a decrease in traffic for the shard. The default value is 75%, which implies that if the total utilisation of the shard goes below 75% of its allocated quota, the shard will be considered for a decrease in its current assigned quota.
shard_indexing_pressure.operating_factor.optimal : Desired occupancy of the shard-level quota at any given point in time. A shard operating in its optimal range is not considered for an increase or decrease in its quota. The current default value is 85%.
shard_indexing_pressure.operating_factor.upper : Upper occupancy limit within the allocated quota for the shard, beyond which the allocated quota is considered for an increase, as traffic for the shard has increased. The default value is 95%, which implies that if the total utilisation of the shard goes above 95% of its allocated quota, the shard will be considered for an increase in its current assigned quota.
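The interplay of the three operating factors can be sketched as below. The function names are hypothetical. The resize rule (bring occupancy back to the optimal factor, never going below the minimum quota) is an assumption made for illustration and is not stated in this issue.

```python
def quota_action(used_bytes: int, quota_bytes: int,
                 lower_percent: int = 75, upper_percent: int = 95) -> str:
    """Classify a shard's occupancy against the lower and upper factors."""
    occupancy_percent = used_bytes * 100 / quota_bytes
    if occupancy_percent < lower_percent:
        return "decrease"  # traffic dropped, quota can shrink
    if occupancy_percent > upper_percent:
        return "increase"  # traffic grew, quota should expand
    return "keep"          # within the operating range


def resized_quota(used_bytes: int, min_quota_bytes: int,
                  optimal_percent: int = 85) -> int:
    """Assumed resize rule: pick a quota that puts current usage at the
    optimal occupancy, clamped to the shard's minimum quota."""
    target = used_bytes * 100 // optimal_percent
    return max(min_quota_bytes, target)


if __name__ == "__main__":
    print(quota_action(70, 100))    # decrease
    print(quota_action(85, 100))    # keep
    print(quota_action(96, 100))    # increase
    print(resized_quota(850, 100))  # 1000
```

Keeping a band between the lower and upper factors avoids resizing the quota on every small fluctuation in traffic.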
Performance Degradation Factors/Levers: These settings control the dynamic performance thresholds for a shard
shard_indexing_pressure.secondary_parameter.throughput.request_size_window : The sampling window size for requests on a shard, over which the performance of recent requests is evaluated and compared against the current in-flight requests to determine any degradation. The default value is 2000 requests, chosen to keep the bootstrap time small while still providing enough data samples to take the right action.
shard_indexing_pressure.secondary_parameter.throughput.degradation_factor : The degradation factor for a request, measured per unit byte of the request. It determines the threshold for latency spikes. The default value is 5x, which implies that if latency rises to 5 times the historical view, it is treated as degradation.
shard_indexing_pressure.secondary_parameter.successful_request.elapsed_timeout : Used to identify black hole or stuck request scenarios, where new requests are continuously accepted but not enough of them complete. The default value is 300000 ms.
shard_indexing_pressure.secondary_parameter.successful_request.max_outstanding_requests : The number of outstanding requests that must be stuck (per the elapsed_timeout setting above) before the situation is taken into account. The combination of outstanding requests and timeout allows the system to flag a threshold breach. The default value is 100 outstanding requests.
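The two degradation checks described above can be sketched as simple predicates. This is an illustration under assumptions, not the feature's implementation: the function and parameter names are hypothetical, and whether the comparisons are strict or inclusive is not stated in this issue.

```python
def is_throughput_degraded(recent_latency_per_byte: float,
                           historical_latency_per_byte: float,
                           degradation_factor: float = 5.0) -> bool:
    """True when recent per-byte latency exceeds the historical view
    (taken over the request_size_window) by the degradation factor."""
    return recent_latency_per_byte > degradation_factor * historical_latency_per_byte


def looks_stuck(ms_since_last_success: int,
                outstanding_requests: int,
                elapsed_timeout_ms: int = 300_000,
                max_outstanding_requests: int = 100) -> bool:
    """True when no request has completed within the timeout AND enough
    requests are still outstanding; both conditions must hold."""
    return (ms_since_last_success > elapsed_timeout_ms
            and outstanding_requests > max_outstanding_requests)


if __name__ == "__main__":
    print(is_throughput_degraded(6.0, 1.0))  # True: 6x the historical latency
    print(looks_stuck(400_000, 150))         # True: timed out with 150 outstanding
    print(looks_stuck(400_000, 50))          # False: too few outstanding requests
```

Requiring both the timeout and the outstanding-request count keeps a single slow request from being flagged as a stuck shard.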
Meta Issues on the Problem and Approach : opensearch-project/OpenSearch#478