
[Serve] Refactor ReplicaQueueLengthAutoscalingPolicy into AutoscalingPolicyManager and policy function #42242

Merged
4 commits merged into ray-project:master on Jan 9, 2024

Conversation

GeneDer
Contributor

@GeneDer GeneDer commented Jan 8, 2024

Why are these changes needed?

Refactor the existing ReplicaQueueLengthAutoscalingPolicy into AutoscalingPolicyManager, which manages the lifecycle of policy calls and applies bounds. AutoscalingPolicyManager also serves as the interface layer between the policy and the DeploymentState. The rest of the core policy logic is refactored into the replica_queue_length_autoscaling_policy policy function for use by AutoscalingPolicyManager.
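
For orientation, here is a rough sketch of the shape of this split: a stateless policy function produces a raw replica count, and the manager owns the policy callable, carries its state, and clamps the result to the configured bounds. The field and parameter names below (target_num_ongoing_requests_per_replica, min_replicas, max_replicas, the metrics keyword arguments) are illustrative and not necessarily the actual Ray Serve signatures.

from typing import Callable, Dict, Optional


def replica_queue_length_autoscaling_policy(
    curr_target_num_replicas: int,
    total_num_requests: float,
    config,
    policy_state: Dict,
    **kwargs,
) -> int:
    """Stateless policy: return the desired number of replicas."""
    if config.target_num_ongoing_requests_per_replica <= 0:
        return curr_target_num_replicas
    # Aim for each replica to handle roughly the configured number of requests.
    return int(total_num_requests / config.target_num_ongoing_requests_per_replica)


class AutoscalingPolicyManager:
    """Owns the policy callable, carries its state, and applies bounds."""

    def __init__(self, config):
        self.config = config
        self.policy: Optional[Callable] = None
        self.policy_state: Dict = {}
        if self.config:
            self.policy = self.config.get_policy()

    def should_autoscale(self) -> bool:
        return self.policy is not None

    def get_decision_num_replicas(self, curr_target_num_replicas: int, **metrics) -> int:
        desired = self.policy(
            curr_target_num_replicas=curr_target_num_replicas,
            config=self.config,
            policy_state=self.policy_state,
            **metrics,
        )
        # Clamp the raw policy output to the configured replica bounds.
        return max(self.config.min_replicas, min(self.config.max_replicas, desired))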

Related issue number

Second PR for #41135

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Gene Su <e870252314@gmail.com>
@GeneDer GeneDer self-assigned this Jan 8, 2024
Signed-off-by: Gene Su <e870252314@gmail.com>
@GeneDer GeneDer changed the title [Serve] add autoscaling policy manager and refactor default policy [Serve] Refactor ReplicaQueueLengthAutoscalingPolicy into AutoscalingPolicyManager and replica_queue_length_autoscaling_policy function Jan 8, 2024
@GeneDer GeneDer changed the title [Serve] Refactor ReplicaQueueLengthAutoscalingPolicy into AutoscalingPolicyManager and replica_queue_length_autoscaling_policy function [Serve] Refactor ReplicaQueueLengthAutoscalingPolicy into AutoscalingPolicyManager and policy function Jan 8, 2024
Signed-off-by: Gene Su <e870252314@gmail.com>
@GeneDer GeneDer requested a review from a team January 8, 2024 22:28
@GeneDer GeneDer marked this pull request as ready for review January 8, 2024 22:29
python/ray/serve/_private/autoscaling_policy.py (Outdated)
Signed-off-by: Gene Su <e870252314@gmail.com>
@edoakes edoakes merged commit 455b5f3 into ray-project:master Jan 9, 2024
9 checks passed
@GeneDer GeneDer deleted the add-autoscaling-manager branch January 9, 2024 22:18

    def get_decision_num_replicas(
        self,
        curr_target_num_replicas: int,
Contributor

nit: Let's be consistent w/ naming -- either prefix all params w/ "curr" or "current"

Contributor Author

👍 will rename in the upcoming PR

        if self.config:
            self.policy = self.config.get_policy()

    def should_autoscale(self) -> bool:
Contributor

nit: I'd rather call this is_enabled, since we're checking whether the APM is working rather than whether we want to autoscale it

Contributor Author

This is refactored from https://github.com/ray-project/ray/pull/42242/files#diff-c4c2583a4c2f3a3c87ada6faebd0b2dfef404e165df133a11195be5d65fcb387L1267 and keeps the existing naming. I feel we can change both places to is_autoscaling_policy_enabled to be more descriptive. What do you think?

python/ray/serve/autoscaling_policy.py
@@ -142,136 +107,59 @@ class ReplicaQueueLengthAutoscalingPolicy(AutoscalingPolicy):
`get_decision_num_replicas` is called once every CONTROL_LOOP_PERIOD_S
seconds.
Contributor

Why do we need such an assumption?

Contributor Author

The logic for our default autoscaling policy has not changed, and I think the comment is still suitable. Basically, it assumes the controller's event loop runs about every 0.1 seconds and uses that to delay upscaling/downscaling. I think we can follow up to clean up some of that logic later, but for the purpose of adding the custom autoscaling functionality I'd rather not touch this right now.
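
For concreteness, this is the delay-to-periods conversion that assumption implies, using illustrative values (the actual defaults live in the Serve config and constants, so treat these numbers as an example rather than a spec):

CONTROL_LOOP_PERIOD_S = 0.1   # assumed controller loop period (~0.1 s per the comment above)
upscale_delay_s = 30.0        # illustrative config value
downscale_delay_s = 600.0     # illustrative config value

# The policy only acts once the same decision has been made this many loops in a row.
scale_up_consecutive_periods = int(upscale_delay_s / CONTROL_LOOP_PERIOD_S)      # 300
scale_down_consecutive_periods = int(downscale_delay_s / CONTROL_LOOP_PERIOD_S)  # 6000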

Comment on lines +128 to +159
    # Scale up.
    if desired_num_replicas > curr_target_num_replicas:
        # If the previous decision was to scale down (the counter was
        # negative), we reset it and then increment it (set to 1).
        # Otherwise, just increment.
        if decision_counter < 0:
            decision_counter = 0
        decision_counter += 1

        # Only actually scale the replicas if we've made this decision for
        # 'scale_up_consecutive_periods' in a row.
        if decision_counter > int(config.upscale_delay_s / CONTROL_LOOP_PERIOD_S):
            decision_counter = 0
            decision_num_replicas = desired_num_replicas

    # Scale down.
    elif desired_num_replicas < curr_target_num_replicas:
        # If the previous decision was to scale up (the counter was
        # positive), reset it to zero before decrementing.
        if decision_counter > 0:
            decision_counter = 0
        decision_counter -= 1

        # Only actually scale the replicas if we've made this decision for
        # 'scale_down_consecutive_periods' in a row.
        if decision_counter < -int(config.downscale_delay_s / CONTROL_LOOP_PERIOD_S):
            decision_counter = 0
            decision_num_replicas = desired_num_replicas

    # Do nothing.
    else:
        decision_counter = 0
Contributor

This should be deferred to the PolicyManager, which should decide whether to accept, reject, or delay the scaling recommendation produced by the policy. The policy by itself should be stateless, though it could accept historical data (say, the last 5 min of resource usage) to make its decision.

Contributor Author

Totally agreed the policy itself should be stateless, and that's why it's been refactored into just a function. However, I don't think the policy manager should decide how fast or slow the replicas scale. There might be a use case where the user is tracking some sort of session count on the main page and wants to warm up/cool down the ML service by scaling the deployment replicas up/down accordingly. I don't think we should stop customers from quickly scaling up/down by moving this logic out.
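
As a purely hypothetical illustration of that use case (every name below is made up for the example, not a Ray Serve API), a custom policy function could react immediately instead of waiting out the built-in delays:

import math


def get_live_session_count() -> int:
    """Placeholder for whatever external signal the user tracks (not a Ray API)."""
    return 240  # e.g. read from a metrics store in a real deployment


def session_count_autoscaling_policy(
    curr_target_num_replicas: int,
    config,
    policy_state: dict,
    **kwargs,
) -> int:
    live_sessions = get_live_session_count()

    # Warm up / cool down right away: roughly one replica per 50 sessions,
    # with no upscale/downscale delay baked into the policy itself.
    sessions_per_replica = 50
    return max(1, math.ceil(live_sessions / sessions_per_replica))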

Contributor

Yes, the policy should be able to control how fast we auto-scale, but the policy shouldn't rely on the frequency of its invocation (CONTROL_LOOP_PERIOD_S)

Contributor Author

Agreed, this is currently out of scope of the custom autoscaling project, but we can follow up to rewrite this logic to use the exact time when autoscaling last happened :)

Contributor

We can persist history w/in policy_state, for example: #42284 (comment)
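
A minimal sketch of that idea, assuming the manager hands the same policy_state dict back on every call (the key name and the function signature here are illustrative, not the actual API):

import time


def delay_aware_policy(
    curr_target_num_replicas: int,
    desired_num_replicas: int,
    config,
    policy_state: dict,
) -> int:
    # The function itself stays stateless; history survives between calls only
    # through policy_state, which the AutoscalingPolicyManager persists.
    last_scaled_at = policy_state.get("last_scaled_at", 0.0)
    now = time.time()

    if desired_num_replicas > curr_target_num_replicas:
        delay_s = config.upscale_delay_s
    elif desired_num_replicas < curr_target_num_replicas:
        delay_s = config.downscale_delay_s
    else:
        return curr_target_num_replicas

    # Gate on wall-clock time since the last scaling action instead of counting
    # how many times the control loop has invoked the policy.
    if now - last_scaled_at >= delay_s:
        policy_state["last_scaled_at"] = now
        return desired_num_replicas
    return curr_target_num_replicas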

vickytsang pushed a commit to ROCm/ray that referenced this pull request Jan 12, 2024
…ingPolicyManager` and policy function (ray-project#42242)

zcin pushed a commit to zcin/ray that referenced this pull request Feb 6, 2024
…ingPolicyManager` and policy function (ray-project#42242)
