-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added resource usage trackers for in-flight cancellation of SearchShardTask #4805
Added resource usage trackers for in-flight cancellation of SearchShardTask #4805
Conversation
Gradle Check (Jenkins) Run Completed with:
|
831a425
to
87923a9
Compare
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
87923a9
to
7719d71
Compare
Gradle Check (Jenkins) Run Completed with:
|
server/src/main/java/org/opensearch/search/backpressure/trackers/ElapsedTimeTracker.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/backpressure/trackers/ElapsedTimeTracker.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/backpressure/trackers/ElapsedTimeTracker.java
Outdated
Show resolved
Hide resolved
ae3269d
to
c42859e
Compare
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
c42859e
to
aeb6521
Compare
Gradle Check (Jenkins) Run Completed with:
|
aeb6521
to
0dc3749
Compare
Gradle Check (Jenkins) Run Completed with:
|
0dc3749
to
5cbfa8a
Compare
Gradle Check (Jenkins) Run Completed with:
|
Gradle check is repeatedly failing with the same test failures in MixedClusterClientYamlTestSuiteIT. This seems to be happening after pulling the last commit 515f84b from the 'main' branch to avoid merge-conflicts. Gradle check also failed with the same test failures for this commit (https://build.ci.opensearch.org/job/gradle-check/5078/). Related to this issue - #4852 Update: This issue has been resolved with 49a9b81. |
5cbfa8a
to
de64339
Compare
Gradle Check (Jenkins) Run Completed with:
|
de64339
to
a00e440
Compare
Gradle Check (Jenkins) Run Completed with:
|
Codecov Report
@@ Coverage Diff @@
## main #4805 +/- ##
============================================
- Coverage 71.37% 70.77% -0.60%
+ Complexity 58338 57879 -459
============================================
Files 4689 4692 +3
Lines 277022 276959 -63
Branches 40315 40301 -14
============================================
- Hits 197718 196018 -1700
- Misses 63304 64680 +1376
- Partials 16000 16261 +261
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
…rdTask 1. CpuUsageTracker: cancels tasks if they consume too much CPU 2. ElapsedTimeTracker: cancels tasks if they consume too much time 3. HeapUsageTracker: cancels tasks if they consume too much heap Signed-off-by: Ketan Verma <ketan9495@gmail.com>
a00e440
to
98610e3
Compare
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Ketan Verma <ketan9495@gmail.com>
98610e3
to
962cc05
Compare
Gradle Check (Jenkins) Run Completed with:
|
@@ -213,7 +215,7 @@ TaskCancellation getTaskCancellation(CancellableTask task) { | |||
List<Runnable> callbacks = new ArrayList<>(); | |||
|
|||
for (TaskResourceUsageTracker tracker : taskResourceUsageTrackers) { | |||
Optional<TaskCancellation.Reason> reason = tracker.cancellationReason(task); | |||
Optional<TaskCancellation.Reason> reason = tracker.checkAndMaybeGetCancellationReason(task); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, but in general we should think about decoupling tracking and action once thresholds have breached. Today it might be search cancellation but I do envision this as an action that modifies threadpool size/queue in a manner that creates a backpressure
We can think about that refactor as a fast follow up as that will help us add more actions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also cancellation isn't truly back pressure :) it's load shedding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. Trackers can recommend actions once thresholds are met, and cancellation of tasks can be one such action. This will however influence how dissimilar actions from different trackers are grouped/compared with each other in the SearchBackpressureService
. For example, we need to aggregate the cancellation scores from each tracker before we start cancelling tasks. With generic actions, this might become really complicated.
Let's do a detailed design of this first and refactor as a follow-up. Enhancement: #4985
Gradle Check (Jenkins) Run Completed with:
|
server/src/main/java/org/opensearch/search/backpressure/settings/SearchShardTaskSettings.java
Outdated
Show resolved
Hide resolved
public abstract Optional<TaskCancellation.Reason> cancellationReason(Task task); | ||
public abstract Optional<TaskCancellation.Reason> checkAndMaybeGetCancellationReason(Task task); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function signature still doesn't look right, it doesn't clarify if cancellation will occur or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still feel that it's correct – this only returns a TaskCancellation.Reason
for a task eligible for cancellation. It doesn't say anything about whether the task will be actually cancelled or not.
6e9e7a6
to
57667ef
Compare
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Gradle checks are failing due to a missing dependency JAR issue.
It looks like the JAR has vanished from this repo: https://repo.gradle.org/ui/native/jcenter/com/diffplug/spotless/spotless-lib/2.30.0/ Related: #4987 |
57667ef
to
e0ca2c8
Compare
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Ketan Verma <ketan9495@gmail.com>
e0ca2c8
to
442ff82
Compare
Gradle Check (Jenkins) Run Completed with:
|
…on of SearchShardTask (opensearch-project#4805) 1. CpuUsageTracker: cancels tasks if they consume too much CPU 2. ElapsedTimeTracker: cancels tasks if they consume too much time 3. HeapUsageTracker: cancels tasks if they consume too much heap Signed-off-by: Ketan Verma <ketan9495@gmail.com>
…on of SearchShardTask (opensearch-project#4805) 1. CpuUsageTracker: cancels tasks if they consume too much CPU 2. ElapsedTimeTracker: cancels tasks if they consume too much time 3. HeapUsageTracker: cancels tasks if they consume too much heap Signed-off-by: Ketan Verma <ketan9495@gmail.com>
…ource consumption (#5039) * [Backport 2.x] Added in-flight cancellation of SearchShardTask based on resource consumption (#4575) This feature aims to identify and cancel resource intensive SearchShardTasks if they have breached certain thresholds. This will help in terminating problematic queries which can put nodes in duress and degrade the cluster performance. * [Backport 2.x] Added resource usage trackers for in-flight cancellation of SearchShardTask (#4805) 1. CpuUsageTracker: cancels tasks if they consume too much CPU 2. ElapsedTimeTracker: cancels tasks if they consume too much time 3. HeapUsageTracker: cancels tasks if they consume too much heap * [Backport 2.x]Added search backpressure stats API Added search backpressure stats to the existing node/stats API to describe: 1. the number of cancellations (currently for SearchShardTask only) 2. the current state of TaskResourceUsageTracker Signed-off-by: Ketan Verma <ketan9495@gmail.com>
…ource consumption (opensearch-project#5039) * [Backport 2.x] Added in-flight cancellation of SearchShardTask based on resource consumption (opensearch-project#4575) This feature aims to identify and cancel resource intensive SearchShardTasks if they have breached certain thresholds. This will help in terminating problematic queries which can put nodes in duress and degrade the cluster performance. * [Backport 2.x] Added resource usage trackers for in-flight cancellation of SearchShardTask (opensearch-project#4805) 1. CpuUsageTracker: cancels tasks if they consume too much CPU 2. ElapsedTimeTracker: cancels tasks if they consume too much time 3. HeapUsageTracker: cancels tasks if they consume too much heap * [Backport 2.x]Added search backpressure stats API Added search backpressure stats to the existing node/stats API to describe: 1. the number of cancellations (currently for SearchShardTask only) 2. the current state of TaskResourceUsageTracker Signed-off-by: Ketan Verma <ketan9495@gmail.com>
…ource consumption (#5039) * [Backport 2.x] Added in-flight cancellation of SearchShardTask based on resource consumption (#4575) This feature aims to identify and cancel resource intensive SearchShardTasks if they have breached certain thresholds. This will help in terminating problematic queries which can put nodes in duress and degrade the cluster performance. * [Backport 2.x] Added resource usage trackers for in-flight cancellation of SearchShardTask (#4805) 1. CpuUsageTracker: cancels tasks if they consume too much CPU 2. ElapsedTimeTracker: cancels tasks if they consume too much time 3. HeapUsageTracker: cancels tasks if they consume too much heap * [Backport 2.x]Added search backpressure stats API Added search backpressure stats to the existing node/stats API to describe: 1. the number of cancellations (currently for SearchShardTask only) 2. the current state of TaskResourceUsageTracker Signed-off-by: Ketan Verma <ketan9495@gmail.com> (cherry picked from commit 7c521b9)
…ource consumption (#5039) (#5058) * [Backport 2.x] Added in-flight cancellation of SearchShardTask based on resource consumption (#4575) This feature aims to identify and cancel resource intensive SearchShardTasks if they have breached certain thresholds. This will help in terminating problematic queries which can put nodes in duress and degrade the cluster performance. * [Backport 2.x] Added resource usage trackers for in-flight cancellation of SearchShardTask (#4805) 1. CpuUsageTracker: cancels tasks if they consume too much CPU 2. ElapsedTimeTracker: cancels tasks if they consume too much time 3. HeapUsageTracker: cancels tasks if they consume too much heap * [Backport 2.x]Added search backpressure stats API Added search backpressure stats to the existing node/stats API to describe: 1. the number of cancellations (currently for SearchShardTask only) 2. the current state of TaskResourceUsageTracker Signed-off-by: Ketan Verma <ketan9495@gmail.com> (cherry picked from commit 7c521b9) Co-authored-by: Ketan Verma <ketanv3@users.noreply.github.com>
Description
The following implementations of
TaskResourceUsageTracker
have been added:Issues Resolved
#1181
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Signed-off-by: Ketan Verma ketan9495@gmail.com