Add shard_indexing_pressure for smart rejections of indexing requests #480
… based on key performance thresholds. (#478) Signed-off-by: Saurabh Singh <sisurab@amazon.com>
✅ DCO Check Passed 2307ebf
✅ Gradle Wrapper Validation success 2307ebf
This is a massive PR and will be difficult for folks to review thoroughly. Can we convert #478 into a meta issue and split this into separate, smaller, incremental PRs following the feature list below:
- Granular tracking of indexing tasks performance, at every Shard level, for each Node role i.e. coordinator, primary and replica.
- Smarter rejections by discarding the requests intended only for problematic index or shard, while still allowing others to continue (fairness in rejection).
- Rejection thresholds governed by a combination of configurable parameters (such as memory limits on the node) and dynamic parameters (such as latency increase and throughput degradation).
- Node level and Shard level indexing pressure statistics exposed through stats api.
- Integration of indexing pressure stats with plugins for metric visibility and future auto-tuning.
- Control knobs to tune the key performance thresholds that control rejections, to address any specific requirements or issues.
- Control knobs to run the feature in Shadow-Mode or Enforced-Mode. In Shadow-Mode, only internal rejection breakdown metrics are published and no actual rejections are performed.
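To make the Shadow-Mode versus Enforced-Mode distinction above concrete, here is a minimal sketch and not the code in this PR. The class name `ShardPressureSketch`, the fixed per-shard byte limit, and the string shard IDs are all hypothetical, chosen only for illustration. It tracks in-flight bytes per shard, counts every limit breach, and only turns a breach into a real rejection when running in enforced mode, so a problematic shard does not affect requests to other shards.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration only; not the classes introduced by this PR. */
class ShardPressureSketch {
    enum Mode { SHADOW, ENFORCED }

    private final long shardLimitBytes; // assumed fixed per-shard limit
    private final Mode mode;
    // In-flight bytes and breach counts are kept per shard, not per node.
    private final Map<String, AtomicLong> inFlightBytes = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> breachCounts = new ConcurrentHashMap<>();

    ShardPressureSketch(long shardLimitBytes, Mode mode) {
        this.shardLimitBytes = shardLimitBytes;
        this.mode = mode;
    }

    /** Returns true if the request is admitted, false if it is rejected. */
    boolean tryAcquire(String shardId, long bytes) {
        AtomicLong current = inFlightBytes.computeIfAbsent(shardId, k -> new AtomicLong());
        long updated = current.addAndGet(bytes);
        if (updated > shardLimitBytes) {
            // The breach is always recorded, so metrics exist in both modes.
            breachCounts.computeIfAbsent(shardId, k -> new AtomicLong()).incrementAndGet();
            if (mode == Mode.ENFORCED) {
                current.addAndGet(-bytes); // undo the accounting for the rejected request
                return false;
            }
        }
        return true; // admitted (in shadow mode, even after a breach)
    }

    /** Releases the bytes of a completed request. */
    void release(String shardId, long bytes) {
        AtomicLong current = inFlightBytes.get(shardId);
        if (current != null) {
            current.addAndGet(-bytes);
        }
    }

    /** Number of limit breaches recorded for the shard. */
    long breaches(String shardId) {
        AtomicLong count = breachCounts.get(shardId);
        return count == null ? 0 : count.get();
    }
}
```

With a 100-byte limit in enforced mode, an 80-byte request followed by a 40-byte request to the same shard admits the first and rejects the second, while a 40-byte request to a different shard is still admitted. In shadow mode both requests are admitted and only the breach counter changes.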
Hi @nknize, I have now broken this PR into 4 logical PRs, as below:
This should allow reviewers to gradually build context for the change. It is not possible to break this down further and still have the build, tests, and precommit succeed without significant code removal and addition; the references and imports would break the build otherwise. As part of the port, we are aiming to get this change out quickly with the first release. Each PR above is built on top of the previous PR's commit so that build and precommit succeed. Reviewers are therefore requested to look only at the last commit of each PR. I have updated the details in each PR accordingly to avoid confusion. Once these PRs start getting merged (main), I will update the commits in the subsequent PRs to contain only the relevant changes. Please feel free to close this PR in favour of the 4 new PRs.
start gradle precommit |
✅ Gradle Wrapper Validation success 2307ebf |
✅ DCO Check Passed 2307ebf |
Closing this as per the plan updated in #478
Shard Indexing Pressure introduces smart rejections of indexing requests when too many stuck or slow requests in the cluster breach key performance thresholds. This prevents the nodes in the cluster from running into cascading failures. (#478) [WIP]
Co-authored-by: Dharmesh Singh sdharms@amazon.com
Description
With shard-level indexing pressure, we want to improve the current Indexing Pressure framework, which performs memory accounting at the node level and rejects requests on that basis. We aim to go a step further and base rejections on memory accounting at the shard level, along with other key performance factors such as throughput and the last successful request. We call this ShardIndexingPressure.
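As a rough illustration of combining shard-level memory accounting with the other performance factors, the sketch below rejects only when a shard is over its memory limit and also shows either degraded throughput or no recent successful request. Everything here is an assumption made for the example: the class name `ShardRejectionDecisionSketch`, the degradation factor, the timeout, and the exact way the signals are combined are not taken from this PR.

```java
import java.time.Duration;

/** Hypothetical decision rule for illustration; thresholds and names are assumptions. */
class ShardRejectionDecisionSketch {
    private final long shardLimitBytes;               // assumed per-shard memory limit
    private final double maxThroughputDegradation;    // e.g. 2.0 means "recent is less than half of historical"
    private final Duration successfulRequestTimeout;  // assumed window for the last successful request

    ShardRejectionDecisionSketch(long shardLimitBytes,
                                 double maxThroughputDegradation,
                                 Duration successfulRequestTimeout) {
        this.shardLimitBytes = shardLimitBytes;
        this.maxThroughputDegradation = maxThroughputDegradation;
        this.successfulRequestTimeout = successfulRequestTimeout;
    }

    /**
     * @param inFlightBytes        bytes currently accounted against the shard
     * @param historicalThroughput long-term average throughput (bytes per second)
     * @param recentThroughput     throughput over a recent window (bytes per second)
     * @param sinceLastSuccess     time elapsed since the shard last completed a request
     */
    boolean shouldReject(long inFlightBytes,
                         double historicalThroughput,
                         double recentThroughput,
                         Duration sinceLastSuccess) {
        // Memory is the primary gate: a shard under its limit is never rejected.
        if (inFlightBytes <= shardLimitBytes) {
            return false;
        }
        // Secondary signal 1: recent throughput has dropped well below the historical average.
        boolean throughputDegraded =
            historicalThroughput > recentThroughput * maxThroughputDegradation;
        // Secondary signal 2: no request has succeeded within the timeout window.
        boolean stalled = sinceLastSuccess.compareTo(successfulRequestTimeout) > 0;
        return throughputDegraded || stalled;
    }
}
```

Under this rule, a shard that is over its memory limit but still healthy on both secondary signals keeps accepting requests, which is one way to avoid rejecting traffic for a shard that is merely busy rather than stuck.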
Issues Resolved
Closes #478
Check List [WIP]
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.