Make balanced shards allocator timebound #15239
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
❌ Gradle check result for 0e9151c: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
❌ Gradle check result for 7fe10d7: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
❌ Gradle check result for 8f558d7: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
❌ Gradle check result for 9a101b9: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
...c/main/java/org/opensearch/cluster/routing/allocation/allocator/BalancedShardsAllocator.java
Thanks for the changes. Let's add an integration test for this, with a deterministic mechanism to trigger timeouts.
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
❌ Gradle check result for 72a10b4: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
❕ Gradle check result for c18d44c: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Do we need an explicit reroute?
Usually, if the current round of reroute attempted any work (allocating an unassigned shard, moving a shard, or rebalancing a shard), a reroute will eventually be triggered. If no work was done (no unassigned shards allocated, no shards moved, no shards rebalanced), a subsequent reroute would most probably be wasteful as well. But there could be edge cases where a subsequent reroute might help. Opened an issue to track this: #14945
* Make balanced shards allocator time bound to prioritise critical operations waiting in the pending task queue Signed-off-by: Rishab Nahata <rnnahata@amazon.com> (cherry picked from commit e982a16) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Make balanced shards allocator time bound to prioritise critical operations waiting in the pending task queue Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
Description
This PR time-bounds the reroute so that it finishes within a specific timeout, which lets URGENT priority tasks that would otherwise be waiting in the pending task queue run sooner.
For instance time taken by rebalance -
Hot threads in master -
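To illustrate the time-bounding idea from the description, here is a minimal sketch. It is not the code in this PR: the class `TimeBoundedAllocatorSketch`, the method `runBounded`, the `Result` type, and the injectable nanosecond clock are all hypothetical names chosen for illustration. It assumes the work can be split into independent steps and that the elapsed time is checked before each step, so a pass stops early once the budget is used up and reports that it timed out.

```java
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch only; names and structure are assumptions, not OpenSearch APIs.
public class TimeBoundedAllocatorSketch {

    /** Outcome of one bounded pass: steps completed, and whether the budget ran out first. */
    public static final class Result {
        public final int processed;
        public final boolean timedOut;

        Result(int processed, boolean timedOut) {
            this.processed = processed;
            this.timedOut = timedOut;
        }
    }

    /**
     * Runs the steps in order, checking the elapsed time before each one.
     * Stops as soon as the elapsed time reaches timeoutNanos.
     * The clock is injected so that tests can trigger timeouts deterministically.
     */
    public static Result runBounded(List<Runnable> steps, long timeoutNanos, LongSupplier nanoClock) {
        final long start = nanoClock.getAsLong();
        int processed = 0;
        for (Runnable step : steps) {
            if (nanoClock.getAsLong() - start >= timeoutNanos) {
                // Budget exhausted: leave the remaining steps for a later pass.
                return new Result(processed, true);
            }
            step.run();
            processed++;
        }
        return new Result(processed, false);
    }
}
```

In production the clock would typically be `System::nanoTime`; a fake clock that advances by a fixed amount per call makes the timeout path reproducible in tests, which is one way to get the deterministic trigger requested in review. A caller could use `timedOut` to decide whether a follow-up pass is worth scheduling.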
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.