Add shard_indexing_pressure for smart rejections of indexing requests #480

@getsaurabh02 (Member) commented Apr 1, 2021

Shard Indexing Pressure introduces smart rejections of indexing requests when too many stuck or slow requests in the cluster breach key performance thresholds. This prevents nodes in the cluster from running into cascading failures. (#478) [WIP]

Co-authored-by: Dharmesh Singh <sdharms@amazon.com>

Description

With shard-level indexing pressure, we want to improve the current Indexing Pressure framework, which performs memory accounting at the node level and rejects requests on that basis. We aim to go a step further and base rejections on memory accounting at the shard level, along with other key performance factors such as throughput and the last successful request. We call this ShardIndexingPressure.
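
To make the node-level versus shard-level distinction concrete, here is a minimal Python sketch of the accounting idea only. It is not code from this PR: the class names, the fixed per-shard limit, and the method names are all hypothetical, and the real change is in Java and also weighs throughput and last-successful-request signals.

```python
from dataclasses import dataclass


@dataclass
class ShardTracker:
    """Hypothetical per-shard accounting state (illustrative names only)."""
    limit_bytes: int
    current_bytes: int = 0
    rejections: int = 0


class ShardIndexingPressureSketch:
    """Tracks in-flight indexing bytes per shard as well as per node."""

    def __init__(self, node_limit_bytes, shard_limit_bytes):
        # Assumption: one fixed limit per shard; a real system may resize it.
        self.node_limit_bytes = node_limit_bytes
        self.shard_limit_bytes = shard_limit_bytes
        self.node_bytes = 0
        self.shards = {}

    def try_acquire(self, shard_id, request_bytes):
        """Admit the request unless it breaches the node or shard limit."""
        tracker = self.shards.setdefault(
            shard_id, ShardTracker(self.shard_limit_bytes)
        )
        over_node = self.node_bytes + request_bytes > self.node_limit_bytes
        over_shard = tracker.current_bytes + request_bytes > tracker.limit_bytes
        if over_node or over_shard:
            tracker.rejections += 1
            return False
        tracker.current_bytes += request_bytes
        self.node_bytes += request_bytes
        return True

    def release(self, shard_id, request_bytes):
        """Return the bytes once the request completes."""
        tracker = self.shards[shard_id]
        tracker.current_bytes -= request_bytes
        self.node_bytes -= request_bytes
```

With purely node-level accounting, a single busy shard can exhaust the node budget and cause requests for healthy shards to be rejected too. In this sketch a shard that exceeds its own limit is rejected while other shards on the same node are still admitted.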

Issues Resolved

Closes #478

Check List [WIP]

  • [+] New functionality includes testing.
    • [] All tests pass
  • [+] New functionality has been documented.
    • [] New functionality has javadoc added
  • [+] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

… based on key performance thresholds. (#478)

Signed-off-by: Saurabh Singh <sisurab@amazon.com>
@odfe-release-bot

✅   DCO Check Passed 2307ebf

@odfe-release-bot

✅   Gradle Wrapper Validation success 2307ebf

@odfe-release-bot

❌   Gradle Precommit failure 2307ebf
Log 72

@nknize added the enhancement, v1.0.0-alpha1, and opendistro-port labels Apr 5, 2021
@nknize self-requested a review April 5, 2021 02:25

@nknize (Collaborator) left a comment

This is a massive PR and is going to be difficult for folks to review thoroughly. Can we convert #478 into a meta issue and split this into separate, smaller, incremental PRs following the feature list below:

  • Granular tracking of indexing task performance at the shard level, for each node role, i.e. coordinator, primary, and replica.
  • Smarter rejections that discard only the requests intended for a problematic index or shard, while still allowing others to continue (fairness in rejection).
  • Rejection thresholds governed by a combination of configurable parameters (such as memory limits on the node) and dynamic parameters (such as latency increase and throughput degradation).
  • Node-level and shard-level indexing pressure statistics exposed through the stats API.
  • Integration of indexing pressure stats with plugins for metric visibility and, in future, auto-tuning.
  • Control knobs to tune the key performance thresholds that control rejections, to address any specific requirements or issues.
  • Control knobs to run the feature in Shadow-Mode or Enforced-Mode. In Shadow-Mode only internal rejection breakdown metrics are published and no actual rejections are performed.
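
As a rough illustration of how the threshold and mode bullets could fit together, here is a small Python sketch. Everything in it is an assumption made for illustration: the class and parameter names, the default threshold values, and the rule that a memory breach must coincide with a degraded or stuck signal are hypothetical and are not taken from the PR.

```python
from enum import Enum


class Mode(Enum):
    SHADOW = "shadow"      # record would-be rejections, never reject
    ENFORCED = "enforced"  # actually reject


class RejectionPolicySketch:
    """Hypothetical policy combining a static limit with dynamic signals."""

    def __init__(self, mode, max_throughput_degradation=2.0,
                 max_secs_since_success=30.0):
        # Assumed control knobs; names and defaults are illustrative.
        self.mode = mode
        self.max_throughput_degradation = max_throughput_degradation
        self.max_secs_since_success = max_secs_since_success
        self.shadow_rejections = 0
        self.enforced_rejections = 0

    def should_reject(self, over_memory_limit, baseline_throughput,
                      current_throughput, secs_since_last_success):
        # Configurable parameter: only consider rejecting on a memory breach.
        if not over_memory_limit:
            return False
        # Dynamic parameters: throughput degradation or no recent success.
        if current_throughput <= 0:
            degraded = True
        else:
            ratio = baseline_throughput / current_throughput
            degraded = ratio >= self.max_throughput_degradation
        stuck = secs_since_last_success >= self.max_secs_since_success
        if not (degraded or stuck):
            return False
        if self.mode is Mode.SHADOW:
            # Shadow mode: publish the metric but let the request through.
            self.shadow_rejections += 1
            return False
        self.enforced_rejections += 1
        return True
```

Running the same policy in shadow mode first lets operators compare the would-be rejection counts against real traffic before switching to enforced mode.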

@getsaurabh02 (Member, Author) commented Apr 5, 2021

Hi @nknize, I have now broken this PR into 4 logical PRs, as below:

  1. Add framework level constructs to track shard indexing pressure. (#496)
  2. Add plumbing logic to invoke shard indexing pressure during write operation. (#497)
  3. Add shard indexing pressure metric/stats via rest end point. (#498)
  4. Add shard indexing pressure IT. (#499)

This should allow reviewers to gradually build context for the change. It is not possible to break this down further and still have the build, tests, and precommit succeed without significant code removal and addition; the references and imports would break the build otherwise. As part of the port, we are aiming to get this change out quickly with the first release.

Each PR above is built on top of the previous PR's commit so that build/precommit can succeed. Reviewers are therefore requested to look only at the last commit of each PR. I have updated the details in each PR accordingly to avoid confusion. Once these PRs start getting merged into main, I will update the commits in the subsequent PRs to contain only the relevant changes.

Please feel free to close this PR in favour of the 4 new PRs now.

@peterzhuamazon (Member)

start gradle precommit
start dco check
start wrapper validation

@odfe-release-bot

✅   Gradle Wrapper Validation success 2307ebf

@odfe-release-bot

✅   DCO Check Passed 2307ebf

@peterzhuamazon (Member)

start gradle precommit

@odfe-release-bot

❌   Gradle Precommit failure 2307ebf
Log 247

@nknize added the v2.0.0 label and removed the v1.0.0-alpha1 label Jun 24, 2021

@opensearch-ci-bot (Collaborator)

✅   Gradle Wrapper Validation success 2307ebf

@opensearch-ci-bot (Collaborator)

✅   DCO Check Passed 2307ebf

@opensearch-ci-bot (Collaborator)

❌   Gradle Precommit failure 2307ebf
Log 738

@getsaurabh02 (Member, Author)

Closing this as per the plan updated in #478

Labels: enhancement, opendistro-port, v2.0.0
Linked issue: [Meta] Shard level Indexing Back-Pressure