
[Serve] Refactor ReplicaQueueLengthAutoscalingPolicy into AutoscalingPolicyManager and policy function #42242

Merged
4 commits merged into ray-project:master on Jan 9, 2024

Conversation

GeneDer
Contributor

@GeneDer GeneDer commented Jan 8, 2024

Why are these changes needed?

Refactor the existing ReplicaQueueLengthAutoscalingPolicy into AutoscalingPolicyManager, which manages the lifecycle of policy calls and applies bounds. AutoscalingPolicyManager also serves as the interface layer between the policy and the DeploymentState. The rest of the core policy logic is refactored into the replica_queue_length_autoscaling_policy policy function for use by AutoscalingPolicyManager.
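
For orientation, here is a rough sketch of the shape of this split: a stateless policy function produces a raw replica count, and the manager owns the policy callable, carries its state, and clamps the result to the configured bounds. The field and parameter names below (target_num_ongoing_requests_per_replica, min_replicas, max_replicas, the metrics keyword arguments) are illustrative and not necessarily the actual Ray Serve signatures.

from typing import Callable, Dict, Optional


def replica_queue_length_autoscaling_policy(
    curr_target_num_replicas: int,
    total_num_requests: float,
    config,
    policy_state: Dict,
    **kwargs,
) -> int:
    """Stateless policy: return the desired number of replicas."""
    if config.target_num_ongoing_requests_per_replica <= 0:
        return curr_target_num_replicas
    # Aim for each replica to handle roughly the configured number of requests.
    return int(total_num_requests / config.target_num_ongoing_requests_per_replica)


class AutoscalingPolicyManager:
    """Owns the policy callable, carries its state, and applies bounds."""

    def __init__(self, config):
        self.config = config
        self.policy: Optional[Callable] = None
        self.policy_state: Dict = {}
        if self.config:
            self.policy = self.config.get_policy()

    def should_autoscale(self) -> bool:
        return self.policy is not None

    def get_decision_num_replicas(self, curr_target_num_replicas: int, **metrics) -> int:
        desired = self.policy(
            curr_target_num_replicas=curr_target_num_replicas,
            config=self.config,
            policy_state=self.policy_state,
            **metrics,
        )
        # Clamp the raw policy output to the configured replica bounds.
        return max(self.config.min_replicas, min(self.config.max_replicas, desired))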

Related issue number

Second PR for #41135

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Gene Su <e870252314@gmail.com>
@GeneDer GeneDer self-assigned this Jan 8, 2024
Signed-off-by: Gene Su <e870252314@gmail.com>
@GeneDer GeneDer changed the title [Serve] add autoscaling policy manager and refactor default policy [Serve] Refactor ReplicaQueueLengthAutoscalingPolicy into AutoscalingPolicyManager and replica_queue_length_autoscaling_policy function Jan 8, 2024
@GeneDer GeneDer changed the title [Serve] Refactor ReplicaQueueLengthAutoscalingPolicy into AutoscalingPolicyManager and replica_queue_length_autoscaling_policy function [Serve] Refactor ReplicaQueueLengthAutoscalingPolicy into AutoscalingPolicyManager and policy function Jan 8, 2024
Signed-off-by: Gene Su <e870252314@gmail.com>
@GeneDer GeneDer requested a review from a team January 8, 2024 22:28
@GeneDer GeneDer marked this pull request as ready for review January 8, 2024 22:29
python/ray/serve/_private/autoscaling_policy.py (Outdated)
Signed-off-by: Gene Su <e870252314@gmail.com>
@edoakes edoakes merged commit 455b5f3 into ray-project:master Jan 9, 2024
9 checks passed
@GeneDer GeneDer deleted the add-autoscaling-manager branch January 9, 2024 22:18

    def get_decision_num_replicas(
        self,
        curr_target_num_replicas: int,
Contributor

nit: Let's be consistent w/ naming -- either prefix all params w/ "curr" or "current"

Contributor Author

👍 will rename in the upcoming PR

        if self.config:
            self.policy = self.config.get_policy()

    def should_autoscale(self) -> bool:
Contributor

nit: I'd rather call this is_enabled, since we're checking whether the APM is working rather than whether we want to autoscale it

Contributor Author

This is refactored from https://github.com/ray-project/ray/pull/42242/files#diff-c4c2583a4c2f3a3c87ada6faebd0b2dfef404e165df133a11195be5d65fcb387L1267 and keeps the existing naming. I feel we can change both places to is_autoscaling_policy_enabled to be more descriptive. What do you think?

python/ray/serve/autoscaling_policy.py
@@ -142,136 +107,59 @@ class ReplicaQueueLengthAutoscalingPolicy(AutoscalingPolicy):
`get_decision_num_replicas` is called once every CONTROL_LOOP_PERIOD_S
seconds.
Contributor

Why do we need such an assumption?

Contributor Author

The logic for our default autoscaling policy has not changed, and I think the comment is still suitable. Basically, it assumes the controller's event loop runs about every 0.1 seconds and uses that to delay upscaling/downscaling. I think we can follow up to clean up some of that logic later, but for the purpose of adding the custom autoscaling functionality I'd rather not touch this right now.
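
For concreteness, this is the delay-to-periods conversion that assumption implies, using illustrative values (the actual defaults live in the Serve config and constants, so treat these numbers as an example rather than a spec):

CONTROL_LOOP_PERIOD_S = 0.1   # assumed controller loop period (~0.1 s per the comment above)
upscale_delay_s = 30.0        # illustrative config value
downscale_delay_s = 600.0     # illustrative config value

# The policy only acts once the same decision has been made this many loops in a row.
scale_up_consecutive_periods = int(upscale_delay_s / CONTROL_LOOP_PERIOD_S)      # 300
scale_down_consecutive_periods = int(downscale_delay_s / CONTROL_LOOP_PERIOD_S)  # 6000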

Comment on lines +128 to +159
    # Scale up.
    if desired_num_replicas > curr_target_num_replicas:
        # If the previous decision was to scale down (the counter was
        # negative), we reset it and then increment it (set to 1).
        # Otherwise, just increment.
        if decision_counter < 0:
            decision_counter = 0
        decision_counter += 1

        # Only actually scale the replicas if we've made this decision for
        # 'scale_up_consecutive_periods' in a row.
        if decision_counter > int(config.upscale_delay_s / CONTROL_LOOP_PERIOD_S):
            decision_counter = 0
            decision_num_replicas = desired_num_replicas

    # Scale down.
    elif desired_num_replicas < curr_target_num_replicas:
        # If the previous decision was to scale up (the counter was
        # positive), reset it to zero before decrementing.
        if decision_counter > 0:
            decision_counter = 0
        decision_counter -= 1

        # Only actually scale the replicas if we've made this decision for
        # 'scale_down_consecutive_periods' in a row.
        if decision_counter < -int(config.downscale_delay_s / CONTROL_LOOP_PERIOD_S):
            decision_counter = 0
            decision_num_replicas = desired_num_replicas

    # Do nothing.
    else:
        decision_counter = 0
Contributor

This should be deferred to the PolicyManager, which should decide whether to accept, reject, or delay the scaling recommendation produced by the policy. The policy by itself should be stateless, though it could accept historical data (say, the last 5 min of resource usage) to make its decision.

Contributor Author

Totally agreed the policy itself should be stateless, and that's why it's been refactored into just a function. However, I don't think the policy manager should decide how fast or slow the replicas scale. There might be a use case where the user is tracking some sort of session count on the main page and wants to warm up/cool down the ML service by scaling the deployment replicas up/down accordingly. I don't think we should stop customers from quickly scaling up/down by moving this logic out.
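
As a purely hypothetical illustration of that use case (every name below is made up for the example, not a Ray Serve API), a custom policy function could react immediately instead of waiting out the built-in delays:

import math


def get_live_session_count() -> int:
    """Placeholder for whatever external signal the user tracks (not a Ray API)."""
    return 240  # e.g. read from a metrics store in a real deployment


def session_count_autoscaling_policy(
    curr_target_num_replicas: int,
    config,
    policy_state: dict,
    **kwargs,
) -> int:
    live_sessions = get_live_session_count()

    # Warm up / cool down right away: roughly one replica per 50 sessions,
    # with no upscale/downscale delay baked into the policy itself.
    sessions_per_replica = 50
    return max(1, math.ceil(live_sessions / sessions_per_replica))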

Contributor

Yes, the policy should be able to control how fast we auto-scale, but the policy shouldn't rely on the frequency of its invocation (CONTROL_LOOP_PERIOD_S)

Contributor Author

Agreed, this is currently out of scope of the custom autoscaling project, but we can follow up to rewrite this logic to use the exact time when autoscaling last happened :)

Contributor

We can persist history w/in policy_state, for example: #42284 (comment)
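
A minimal sketch of that idea, assuming the manager hands the same policy_state dict back on every call (the key name and the function signature here are illustrative, not the actual API):

import time


def delay_aware_policy(
    curr_target_num_replicas: int,
    desired_num_replicas: int,
    config,
    policy_state: dict,
) -> int:
    # The function itself stays stateless; history survives between calls only
    # through policy_state, which the AutoscalingPolicyManager persists.
    last_scaled_at = policy_state.get("last_scaled_at", 0.0)
    now = time.time()

    if desired_num_replicas > curr_target_num_replicas:
        delay_s = config.upscale_delay_s
    elif desired_num_replicas < curr_target_num_replicas:
        delay_s = config.downscale_delay_s
    else:
        return curr_target_num_replicas

    # Gate on wall-clock time since the last scaling action instead of counting
    # how many times the control loop has invoked the policy.
    if now - last_scaled_at >= delay_s:
        policy_state["last_scaled_at"] = now
        return desired_num_replicas
    return curr_target_num_replicas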

vickytsang pushed a commit to ROCm/ray that referenced this pull request Jan 12, 2024
…ingPolicyManager` and policy function (ray-project#42242)

zcin pushed a commit to zcin/ray that referenced this pull request Feb 6, 2024
…ingPolicyManager` and policy function (ray-project#42242)
