-
Notifications
You must be signed in to change notification settings - Fork 7.1k
[1/n] add application level autoscaling policy in schema #57535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: abrar <abrar@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the autoscaling policy configuration by moving the serialization logic from AutoscalingConfig into AutoscalingPolicy. It also introduces an application-level autoscaling_policy in ServeApplicationSchema. The changes are logical and improve encapsulation.
I've identified a critical issue in the protobuf schema update that breaks wire compatibility, and a medium-severity issue regarding schema definition in ServeApplicationSchema. Please see the detailed comments.
| // The autoscaling policy definition. | ||
| AutoscalingPolicy policy = 12; | ||
| AutoscalingPolicy policy = 11; | ||
|
|
||
| // Target number of in flight requests per replica. This is the primary configuration | ||
| // knob for replica autoscaler. Lower the number, the more rapidly the replicas | ||
| // scales up. Must be a non-negative integer. | ||
| double target_ongoing_requests = 13; | ||
| double target_ongoing_requests = 12; | ||
|
|
||
| // The multiplicative "gain" factor to limit upscale. | ||
| optional double upscaling_factor = 14; | ||
| optional double upscaling_factor = 13; | ||
|
|
||
| // The multiplicative "gain" factor to limit downscale. | ||
| optional double downscaling_factor = 15; | ||
| optional double downscaling_factor = 14; | ||
|
|
||
| // How long to wait before scaling down replicas from 1 to 0 | ||
| optional double downscale_to_zero_delay_s = 16; | ||
| optional double downscale_to_zero_delay_s = 15; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renumbering protobuf fields is a wire-incompatible breaking change. An older component serializing an AutoscalingConfig message will use field numbers 12, 13, etc., for policy, target_ongoing_requests, etc. A newer component will interpret these as target_ongoing_requests, upscaling_factor, etc., leading to silent data corruption or deserialization errors.
To maintain compatibility, you should not renumber the existing fields. Please revert the field numbers to their original values. It's also a good practice to mark the removed field number as reserved to prevent accidental reuse, like this:
...
optional double downscale_smoothing_factor = 10;
reserved 11;
// The autoscaling policy definition.
AutoscalingPolicy policy = 12;
...| // The autoscaling policy definition. | |
| AutoscalingPolicy policy = 12; | |
| AutoscalingPolicy policy = 11; | |
| // Target number of in flight requests per replica. This is the primary configuration | |
| // knob for replica autoscaler. Lower the number, the more rapidly the replicas | |
| // scales up. Must be a non-negative integer. | |
| double target_ongoing_requests = 13; | |
| double target_ongoing_requests = 12; | |
| // The multiplicative "gain" factor to limit upscale. | |
| optional double upscaling_factor = 14; | |
| optional double upscaling_factor = 13; | |
| // The multiplicative "gain" factor to limit downscale. | |
| optional double downscaling_factor = 15; | |
| optional double downscaling_factor = 14; | |
| // How long to wait before scaling down replicas from 1 to 0 | |
| optional double downscale_to_zero_delay_s = 16; | |
| optional double downscale_to_zero_delay_s = 15; | |
| // The autoscaling policy definition. | |
| AutoscalingPolicy policy = 12; | |
| // Target number of in flight requests per replica. This is the primary configuration | |
| // knob for replica autoscaler. Lower the number, the more rapidly the replicas | |
| // scales up. Must be a non-negative integer. | |
| double target_ongoing_requests = 13; | |
| // The multiplicative "gain" factor to limit upscale. | |
| optional double upscaling_factor = 14; | |
| // The multiplicative "gain" factor to limit downscale. | |
| optional double downscaling_factor = 15; | |
| // How long to wait before scaling down replicas from 1 to 0 | |
| optional double downscale_to_zero_delay_s = 16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, we shouldn't numbers assigned to existing fields, this can make the code backward incompatible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not a problem because we dont store these proto records in DB and use it in different session
python/ray/serve/schema.py
Outdated
| autoscaling_policy: dict = Field( | ||
| default={}, | ||
| description=( | ||
| "Autoscaling policy for the application. " | ||
| "If null, serve fallbacks to autoscaling policy in each deployment. " | ||
| "This option is under development and not yet supported." | ||
| ), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type hint and default value for autoscaling_policy are inconsistent with its description. The description mentions "If null...", which implies the field should be optional and can be None.
- The type hint should be
Optional[dict]to allowNoneas a value. - The default value should be
Noneto match the "if null" condition described. - Using
default={}creates a mutable default value, which can lead to unexpected behavior. It's better to usedefault=Noneordefault_factory=dict. In this case,default=Noneis the most clear and correct choice based on the description.
| autoscaling_policy: dict = Field( | |
| default={}, | |
| description=( | |
| "Autoscaling policy for the application. " | |
| "If null, serve fallbacks to autoscaling policy in each deployment. " | |
| "This option is under development and not yet supported." | |
| ), | |
| ) | |
| autoscaling_policy: Optional[dict] = Field( | |
| default=None, | |
| description=( | |
| "Autoscaling policy for the application. " | |
| "If null, serve fallbacks to autoscaling policy in each deployment. " | |
| "This option is under development and not yet supported." | |
| ), | |
| ) |
|
@abrarsheikh merge conflicts |
Signed-off-by: abrar <abrar@anyscale.com>
|
LGTM |
part 1 of #56149
_serialized_policy_defintoAutoscalingPolicyfromAutoscalingConfig. We need this in order to reuseAutoscalingPolicyfor application-level autoscaling.autoscaling_policya top-level config inServeApplicationSchema.