Update Shard Indexing Pressure Settings Documentation #261

Closed
getsaurabh02 opened this issue Nov 11, 2021 · 1 comment

@getsaurabh02
Member

Meta issue on the problem and approach: opensearch-project/OpenSearch#478

Dynamic OpenSearch settings for the Shard Indexing Backpressure feature

High Level Control Levers: These are feature level settings

  1. shard_indexing_pressure.enable : Turns the shard indexing backpressure feature on or off. The value can be flipped on the fly without affecting the traffic workload. All granular metric tracking that is part of shard indexing pressure turns on and off with this flag. The default value is false; it will be turned on as part of the incremental rollout in future releases.

  2. shard_indexing_pressure.enforced : Runs the shard indexing backpressure feature in Shadow Mode or Enforced Mode once the feature is enabled. The value can be flipped on the fly. In Shadow Mode (value set to “false”), all granular metric tracking for the feature continues to work, but no actual rejections happen. In Enforced Mode (value set to “true”), requests that breach the key performance thresholds are also rejected.
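
A minimal sketch of how the two feature-level levers above might be combined into a request body for the cluster settings API (`PUT _cluster/settings`). The setting keys are copied from this issue as written and should be checked against the release in use; the helper name `feature_mode_settings` and the choice of the `persistent` scope are assumptions made for illustration.

```python
import json


def feature_mode_settings(enabled: bool, enforced: bool, persistent: bool = True) -> dict:
    """Build a body for PUT _cluster/settings that sets both feature-level levers.

    The helper itself is hypothetical; only the two setting keys come from the issue text.
    """
    scope = "persistent" if persistent else "transient"
    return {
        scope: {
            "shard_indexing_pressure.enable": enabled,
            "shard_indexing_pressure.enforced": enforced,
        }
    }


# Shadow Mode: metrics are tracked, but no requests are rejected.
shadow_mode = feature_mode_settings(enabled=True, enforced=False)

# Enforced Mode: requests breaching the thresholds are rejected as well.
enforced_mode = feature_mode_settings(enabled=True, enforced=True)

print(json.dumps(shadow_mode, indent=2))
```

Starting in Shadow Mode and switching to Enforced Mode later only changes the second key, so the tracked metrics can be reviewed before any rejections begin.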

Node Level quota/limits: These settings control memory utilization on a node

  1. shard_indexing_pressure.primary_parameter.node.soft_limit : Today, Indexing Pressure permits a maximum of 10% of JVM memory for indexing traffic on a node (through the indexing_pressure.memory.limit setting). This new setting defines the percentage of that node-level limit that acts as a soft indicator of duress on the node. By default, 70% of the node-level limit is used as the soft threshold for Shard Indexing Pressure; only once it is crossed are the additional granular tracking metrics taken into consideration to find any actual degradation in the write path.
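
The arithmetic behind the soft limit can be sketched as follows, assuming the defaults quoted above (10% of JVM heap as the node-level limit, 70% of that as the soft limit). The function name and the integer-percent parameters are hypothetical and only illustrate the calculation.

```python
def node_soft_limit_bytes(heap_bytes: int,
                          node_limit_percent: int = 10,
                          soft_limit_percent: int = 70) -> int:
    """Return the soft-limit threshold in bytes for a node.

    node_limit_percent mirrors the 10% indexing pressure memory limit;
    soft_limit_percent mirrors the 70% default described in the issue.
    Integer arithmetic is used to avoid floating-point rounding.
    """
    node_limit = heap_bytes * node_limit_percent // 100
    return node_limit * soft_limit_percent // 100


# With a 1,000,000,000-byte heap the node-level limit is 100,000,000 bytes
# and the soft limit is 70,000,000 bytes.
soft_limit = node_soft_limit_bytes(1_000_000_000)
print(soft_limit)
```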

Shard Level quota/limits : These settings control memory utilization on a shard

  1. shard_indexing_pressure.primary_parameter.shard.min_limit : Minimum quota assigned to a new shard in any role (coordinator, primary, or replica) when a write comes in for that shard. This allocated quota is then increased or decreased based on the inflow of traffic targeted at the shard.

  2. shard_indexing_pressure.operating_factor.lower : Lower occupancy limit within the allocated quota for the shard, below which the allocated quota is considered for a decrease. This indicates a decrease in traffic for the shard. The default value is 75%, which implies that if the total utilization of the shard goes below 75% of its allocated quota, the shard is considered for a decrease in its current assigned quota.

  3. shard_indexing_pressure.operating_factor.optimal : Desired occupancy of the shard-level quota at any given point in time. A shard operating in its optimal range is not considered for an increase or decrease in its quota. The current default value is 85%.

  4. shard_indexing_pressure.operating_factor.upper : Upper occupancy limit within the allocated quota for the shard, beyond which the allocated quota is considered for an increase, as traffic for the shard has increased. The default value is 95%, which implies that if the total utilization of the shard goes above 95% of its allocated quota, the shard is considered for an increase in its current assigned quota.
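
As a rough illustration of how the lower and upper operating factors partition a shard's occupancy, the sketch below classifies a shard as a candidate for a quota decrease, a quota increase, or no change. It assumes the default values quoted above and strict comparisons at the boundaries; the names `QuotaAction` and `classify_occupancy` are hypothetical, and the sketch does not compute the new quota, since this issue does not describe that step.

```python
from enum import Enum


class QuotaAction(Enum):
    DECREASE = "decrease"
    HOLD = "hold"
    INCREASE = "increase"


def classify_occupancy(used_bytes: int, quota_bytes: int,
                       lower: float = 0.75, upper: float = 0.95) -> QuotaAction:
    """Classify a shard by how much of its allocated quota is in use.

    lower and upper mirror operating_factor.lower and operating_factor.upper.
    Boundary handling (strict < and >) is an assumption of this sketch.
    """
    if quota_bytes <= 0:
        raise ValueError("quota_bytes must be positive")
    occupancy = used_bytes / quota_bytes
    if occupancy < lower:
        return QuotaAction.DECREASE   # traffic has dropped; quota may shrink
    if occupancy > upper:
        return QuotaAction.INCREASE   # traffic has grown; quota may expand
    return QuotaAction.HOLD           # within the operating range


print(classify_occupancy(50, 100))
print(classify_occupancy(85, 100))
print(classify_occupancy(99, 100))
```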

Performance Degradation Factors/Levers: These settings control the dynamic performance thresholds for a shard

  1. shard_indexing_pressure.secondary_parameter.throughput.request_size_window : Sampling window size for requests on a shard. The performance of recent requests within this window is evaluated and measured against the current in-flight requests to determine any degradation. The default value is 2000 requests, chosen to keep bootstrap time small while still providing enough data samples to take the right action.

  2. shard_indexing_pressure.secondary_parameter.throughput.degradation_factor : Degradation factor per unit byte of a request, used to determine the threshold for latency spikes. The default value is 5x, which implies that if latency shoots up to 5 times the historical view, the request is considered degraded.

  3. shard_indexing_pressure.secondary_parameter.successful_request.elapsed_timeout : Used to identify black hole or stuck request scenarios, where new requests are continuously accepted but not enough of them complete. The default value is 300000 ms.

  4. shard_indexing_pressure.secondary_parameter.successful_request.max_outstanding_requests : Total number of requests that must be stuck (per the elapsed_timeout setting above) before the condition counts. The combination of outstanding requests and timeout allows the system to flag a threshold breach. The default value is 100 outstanding requests.
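
The two secondary checks described above can be sketched roughly as below. This is an illustration under stated assumptions, not the actual OpenSearch implementation: the `ShardStats` fields, the function names, and the strict comparisons are all hypothetical, and the historical latency is assumed to be an average over the request_size_window.

```python
from dataclasses import dataclass


@dataclass
class ShardStats:
    # Hypothetical per-shard figures; field names are not from OpenSearch.
    historical_latency_per_byte: float  # assumed average over the sampling window
    current_latency_per_byte: float     # assumed figure for in-flight requests
    outstanding_requests: int
    ms_since_last_success: int


def throughput_degraded(stats: ShardStats, degradation_factor: float = 5.0) -> bool:
    """True when current latency per byte exceeds the historical view by the factor."""
    return stats.current_latency_per_byte > degradation_factor * stats.historical_latency_per_byte


def requests_stuck(stats: ShardStats, elapsed_timeout_ms: int = 300_000,
                   max_outstanding_requests: int = 100) -> bool:
    """True only when both the timeout and the outstanding-request count are exceeded."""
    return (stats.ms_since_last_success > elapsed_timeout_ms
            and stats.outstanding_requests > max_outstanding_requests)


def breaches_secondary_parameters(stats: ShardStats) -> bool:
    """Flag a shard when either secondary check fires."""
    return throughput_degraded(stats) or requests_stuck(stats)


healthy = ShardStats(1.0, 2.0, 10, 1_000)
slow = ShardStats(1.0, 6.0, 10, 1_000)
stuck = ShardStats(1.0, 1.0, 150, 400_000)
print(breaches_secondary_parameters(healthy),
      breaches_secondary_parameters(slow),
      breaches_secondary_parameters(stuck))
```

Per the node-level setting described earlier in this issue, these granular checks would only be consulted once the node soft limit has been crossed.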

@getsaurabh02 getsaurabh02 added the enhancement New feature or request label Nov 11, 2021
@ashwinkumar12345 ashwinkumar12345 self-assigned this Nov 19, 2021
@ashwinkumar12345
Contributor

Added - #327.
