-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added AEP for multi-dimensional pod autoscaler #5342
Conversation
|
Welcome @James-QiuHaoran! |
Hi @James-QiuHaoran |
Hi @Shubham82 Thank you for the reminder. I signed the CLA just now as an individual, as my university (UIUC) appears to not have a manager managing CLA. |
9e484ad
to
9796d2f
Compare
a2e9772
to
6b7c849
Compare
fa9a275
to
e0786ef
Compare
Please briefly explain that's the plan for calculating recommendations. I think we should at least briefly describe how we're going to calculate recommendations (HPA based on CPU / configured metrics + VPA for memory?) |
#### `MultidimPodAutoscalerSpec` | ||
|
||
``` | ||
# MultidimPodAutoscalerSpec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only "spec" part of this corresponds to MultidimPodAutoscalerSpec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let me know if there's anything to clarify here.
kind: MultidimPodAutoscaler | ||
metadata: | ||
name: my-autoscaler | ||
spec: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MultidimPodAutoscalerSpec has somewhat different fields than what you describe here (for example here updatePolicy
is directly in MultidimPodAutoscalerSpec
, the linked doc its in Policy. I'd rather keep differences minimal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's currently following the VerticalPodAutoscalerSpec style with updatePolicy
directly in VerticalPodAutoscalerSpec
. Let me know if it's necessary to have it in a wrapper Policy
.
|
||
#### `MultidimPodAutoscalerSpec` | ||
|
||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please provide this in the same format as MultidimPodAutoscalerStatus
below (for consistency and to make API easier to understand).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
constraints: | ||
minReplicas: min-num-replicas | ||
maxReplicas: max-num-replicas | ||
applicationSLA: value # customizable SLA for application metrics such as latency or throughput |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what does SLA specification look like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g., if the SLA used is latency, then it could be something like 100ms. And the replica-set can be horizontally scaled when the measured latency is greater than 100ms, i.e., the SLA value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds to me applicationSLA
is only useful for a particular recommender that is SLA-based. Since we're proposing MPA API supporting various recommenders I believe the applicationSLA
belongs as config options of a concrete recommender rather than common API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the default recommender implemented as part of this AEP going to use the applicationSLA
field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default recommender implemented in this AEP will not use the applicationSLA
field and this field will be used by users who want to implement their own recommender. E.g., an RL/ML-based recommender can have applicationSLA
as part of the reward/loss function and thus they can have extra constraints in addition to min/max replicas.
The applicationSLA
field is part of constraints
for simplicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what is type of this field. Is it a string? If so what does the string represent? Does it point to a metric? Or is it a struct? If it's a struct I think we should describe what is in the struct. Or maybe it's something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was initially designed simply to be a floating point number (most application metrics like latency and throughput are floating point numbers). Let's say an application has a latency SLA of 100ms, but the actual latency is 120.8ms; then the recommender can suggest a scale-out option. I think it should cover most use cases but I'd like to learn from you about more potential production use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The applicationSLA field is part of constraints for simplicity.
Well, it’s going to be simple for users of the particular proprietary recommender that supports this field, rather confusing for other users that need to ignore it ;) Worse if someone else comes up with their slo-based autoscaling that requires slightly different set (or format) of constraints. I still think this field belongs to an ‘options’ map under ‘recommender’ field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, to be nit picky, I think this is more of an SLO than SLA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced SLO with SLA. I'll leave the field as optional under 'constraints' and see if there are more use cases to support moving it under the 'recommender' field. Thanks!
|
||
<!-- - <test>: <link to test coverage> --> | ||
|
||
Integration tests are to be added in the beta version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have some basic tests to make sure the feature works
Added brief explanation. As an initial solution, VPA and HPA are performed alternatively in each iteration: VPA adjusts the CPU/memory allocation based on history usage; HPA then adjusts the number of replicas based on either CPU utilization or a custom metric. In the future, we can consider more complex prioritization and conflict resolution algorithms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mwielgus this mostly looks good to me.
I have one open question that I think is worth answering before merging. Here I'm asking for clarification on what is format of the applicationSLA
field (not used by default recommender, additional input for reward in ML recommender).
This is an alpha API so as I understand it won't be too much pain to change it if we don't like the format but I think it'd be nice to have it described.
Other than that looks good to me
maxAllowed: | ||
memory: max-allowed-memory | ||
# Define the recommender to use here | ||
recommenders: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now a list of recommenders, not a single recommender to use as the comment suggests. How does it work to have multiple implementations in use at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it's actually consistent with the definition in the vertical autoscaler: https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L105
Only the first recommender will be supported initially, but later, when multiples are running, only one recommender’s recommendations are selected, and others run in recommendation-only mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pbetkier, this is consistent with the VPA support on alternative recommenders. The first support of single alternative recommender is already added, but the multiple alternative recommenders' support for A/B testing is still in planning. We may introduce that soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wangchen615 @James-QiuHaoran than you for answering my question. I left one more comment.
I don't understand how the API described in the proposal allows configuring SLO target. To me it looks like we're missing a field. Please add an appropriate field or explain what I'm missing
Note that application SLO field is **optional** and SLO is defined to be the quality of service target that an application must meet (regarding latency, throughput, and so on). | ||
For example, if the latency SLO is in use, then it could be 99% of the requests finish within 100ms. Accordingly, the replica set can be horizontally scaled when the measured latency is greater than 100ms, i.e., violating the SLO value. | ||
The default MPA recommender implemented in this AEP will not use the `applicationSLO` field and this field will be used by users who want to implement their own recommender. For example, an RL/ML-based recommender can have `applicationSLO` as part of the reward/loss function and thus they can have extra constraints in addition to min/max replicas. | ||
The `applicationSLO` field is a floating point number (most application metrics like latency and throughput are floating point numbers). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that just the target number is not enough.
Let's use the example from the doc: we're targeting 99% of requests handled in 100 ms or faster.
It seems that we need to provide at least 2 pieces of information:
- Metric reporting 99th percentile of request handling time and
- 100ms (in the same units as used by the metric)
The proposed API has only 1 field of type double, so I can provide 2) (target SLO value) but I don't see a way to specify 1) (metric we're targeting).
For beta I think we ill need ability to specify something more complicated than just a metric value (for example percentile of histogram metric, support for incremental metrics). But for alpha I think we need at least ability to specify what metric we're targetting.
Please amend the proposal to allow that or explain how it's possible with the current proposdal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jbartosik , Thank you for the suggestion! We added an additional field to specify the percentage. It's a floating point number between 0 and 1. For example, if the latency SLO is in use, then it could be 99% of the requests finish within 100ms (i.e., applicationSLO = 100
and percentageSLO = 0.99
). Accordingly, the replica set can be horizontally scaled when the measured latency is greater than 100ms, i.e., violating the SLO value. Similarly, throughput SLOs can be defined as throughput greater than 100/s 90% of the time, i.e., applicationSLO = 100
and percentageSLO = 0.9
.
For alpha, the percentage is always the 99th percentile and only the SLO value will be used for horizontal autoscaling.
For beta, we will use both fields and to add the ability to specify more complicated types (e.g., incremental metrics).
Our previous design encodes both the percentage and value into a single field, i.e., the SLO field in the API object is simply used to represent the 99th percentile latency. Then the recommender compared the measured 99th percentile latency with the SLO value. This old design is deprecated because of the non-flexibility of adjusting the percentage and SLO value separately.
Please review the amended proposal and let us know if it clarifies your question. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think the proposal doesn't work. Now there are 2 fields:
applicationSLO
andpercentageSLO
This allows us to specify that we want value of 100 or less 99% of the time.
It still doesn't allow us to specify what metric we're targeting. (Which metric should have value 100 or less 99% of the time?)
Maybe it's better to do like @pbetkier suggested and put those parameters in recommender-specific place instead of putting them in the general API?
In addition to what Piotr pointed out (OS recommender is not going to use those fields so it's a bit weird to have them) it looks like this part of the API is not fully ready yet.
Note that application SLO field is **optional** and SLO is defined to be the quality of service target that an application must meet (regarding latency, throughput, and so on). | ||
For example, if the latency SLO is in use, then it could be 99% of the requests finish within 100ms. Accordingly, the replica set can be horizontally scaled when the measured latency is greater than 100ms, i.e., violating the SLO value. | ||
The default MPA recommender implemented in this AEP will not use the `applicationSLO` field and this field will be used by users who want to implement their own recommender. For example, an RL/ML-based recommender can have `applicationSLO` as part of the reward/loss function and thus they can have extra constraints in addition to min/max replicas. | ||
The `applicationSLO` field is a floating point number (most application metrics like latency and throughput are floating point numbers). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think the proposal doesn't work. Now there are 2 fields:
applicationSLO
andpercentageSLO
This allows us to specify that we want value of 100 or less 99% of the time.
It still doesn't allow us to specify what metric we're targeting. (Which metric should have value 100 or less 99% of the time?)
Maybe it's better to do like @pbetkier suggested and put those parameters in recommender-specific place instead of putting them in the general API?
In addition to what Piotr pointed out (OS recommender is not going to use those fields so it's a bit weird to have them) it looks like this part of the API is not fully ready yet.
Hi @jbartosik , It makes sense to me to move it under |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
custom options for
recommenders
: we cannot define optional fields in the open source API for use solely in a custom, non open sourced recommender. With 1 custom recommender it's going to be already confusing for open source users, with potential 5 custom recommenders it's going to become a mess of optional fields where it's not clear which belong to which recommender. I strongly believe the correct way is to either a) define anoptions
orconfig
field of typemap[string]string
besidesname
inrecommenders
items (the recommender is responsible to parse and validate) or b) have custom CRDs for custom recommenders that require additional config – this way the API can be tailored to every custom recommender's need. Wasn't a) already thought of for VPA recommenders? -
This MPA API vs. Google MPA API: Can't we stick to API structure that's already in Google MPA? I believe the changes made in this AEP are rather cosmetic, so perhaps it's going to be clearer for users if we not diverge?
-
MPA as recommender only: Can we explore the alternative
MPA as a Recommender Only
more? E.g. would it be enough for HPA to read the replica count from a custom metric exposed by MPA recommender? This can be configured right now without any changes to HPA code. And VPA already supports custom recommenders. Perhaps we can think how to solve thehard to coordinate/synchronize when HPA and VPA states kept in different places
and get away without producing a large amount of copied / imported code. The idea of a full MPA with code copied / imported from VPA and HPA is fair enough forv1alpha1
(no guarantees so it can stop working any time), but we would have to solve the maintenance problem before graduating to beta IMO.
Let's hear what @mwielgus has to say as the approver.
@pbetkier, thanks for the comments and suggestions!
Having MPA as a recommender only indeed reduces the effort of maintenance, but I think it requires more assumptions from the HPA side, e.g., (1) the replica count should be able to be actuated in real-time (2) the original logic of HPA should be paused. Otherwise, it doesn't resolve the conflicts between HPA and VPA if the MPA falls back to simply HPA+VPA. The benefit of the proposed framework is to replace HPA and VPA's independent loops with a coordinated loop of generating horizontal and vertical scaling actions. It would be a better way if both the conflict and the maintenance could be resolved. What do you think? |
@James-QiuHaoran Alright then IMO let's align the API structure to Google MPA structure and we're good to go. Note that the MPA will be considered an alpha feature, that may break any time, as w discussed in #5342 (comment):
I mean: if we make changes in VPA or HPA code then we may or may not notice in MPA. I suppose kubernetes/autoscaler reviewers/approvers won't be proactively checking every now and then :) I guess that's fine for an alpha feature. If there's a demand for this feature then we need to address this before graduating to beta. If it turns out ~nobody uses the MPA then we can decide to abort at some point. That's how I see it, I think it's time for approvers @gjtempleton or @mwielgus to approve or request changes. |
@James-QiuHaoran @wangchen615 please align the API structure to Google API as we discussed before and we're good to go. I've talked with @mwielgus and doesn't have more comments. |
@pbetkier The API structure has already been aligned with Google API. We followed the same naming of each field in the API as well. Thank you for following up! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
/lgtm
Asking @mwielgus to approve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: James-QiuHaoran, mwielgus, pbetkier The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Which component this PR applies to?
This PR adds a new component which is a multi-dimensional pod autoscaler.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Currently, Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) control the scaling actions separately as independent controllers to determine the resource allocation for a containerized application.
Due to the independence of these two controllers, when they are configured to optimize the same target, e.g., CPU usage, they can lead to an awkward situation where HPA tries to spin more pods based on the higher-than-threshold CPU usage while VPA tries to squeeze the size of each pod based on the lower CPU usage (after scaling out by HPA).
The final outcome would be a large number of small pods created for the workloads.
Manual fine-tuning the timing to do vertical/horizontal scaling and prioritization are usually needed for syncronization of the HPA and VPA.
We propose a Multi-dimensional Pod Autoscaling (MPA) framework that combines the actions of vertical and horizontal autoscaling in a single action but separates the actuation completely from the controlling algorithms.
It consists of three controllers (i.e., a recommender, an updater, and an admission controller) and an MPA API (i.e., a CRD object or CR) that connects the autoscaling recommendations to actuation.
The multidimensional scaling algorithm is implemented in the recommender.
The scaling decisions derived from the recommender are stored in the MPA object.
The updater and the admission controller retrieve those decisions from the MPA object and actuate those vertical and horizontal actions.
Our proposed MPA (with the separation of recommendations from actuation) allows developers to replace the default recommender with their alternative customized recommender, so developers can provide their own recommender implementing advanced algorithms that control both scaling actions across different resource dimensions.
Which issue(s) this PR fixes:
N/A
Special notes for your reviewer:
N/A
Does this PR introduce a user-facing change?
NONE
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: