Improve autoscaler's observed average calculation. #1194
Conversation
The autoscaler takes all probes over a certain window (panic or stable, respectively) to calculate the observed concurrency in the system. If you simply take all probes and calculate the average across all of them, newly joined pods get an unfairly low share in the average calculation. That means that even though the system has successfully scaled up to more pods, the "old" and overloaded pods' measurements are given an unnaturally high weight vs. the newly joined pods.

This PR aims to solve that issue by not aggregating over all probes, but aggregating an average per pod first. That way, all observed pods in the system get an equal share in the observed concurrency calculation and the autoscaler scales more precisely.
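For illustration, here is a minimal sketch of the per-pod averaging described above. This is not the actual PR code; the Stat fields and names are assumptions made for the example.

```go
package main

import "fmt"

// Stat is one concurrency probe reported by a single pod.
type Stat struct {
	PodName            string
	AverageConcurrency float64
}

// observedAverage first averages the probes of each pod and then averages
// those per-pod values, so every pod carries equal weight regardless of how
// many probes it reported within the window.
func observedAverage(stats []Stat) float64 {
	sums := map[string]float64{}
	counts := map[string]float64{}
	for _, s := range stats {
		sums[s.PodName] += s.AverageConcurrency
		counts[s.PodName]++
	}
	if len(sums) == 0 {
		return 0
	}
	var total float64
	for pod, sum := range sums {
		total += sum / counts[pod] // per-pod average
	}
	return total / float64(len(sums)) // average over observed pods
}

func main() {
	// An "old", overloaded pod with many probes and a newly joined pod with
	// a single probe now contribute equally to the observed concurrency.
	stats := []Stat{
		{"old-pod", 10}, {"old-pod", 10}, {"old-pod", 10},
		{"new-pod", 2},
	}
	fmt.Println(observedAverage(stats)) // 6, instead of the naive average of 8
}
```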
/assign @josephburnett
pkg/autoscaler/autoscaler.go
Outdated
	return accumulatedConcurrency / float64(agg.observedPods())
}

// hols an aggregation per pod
holds
Would it be more precise if the average were a moving average instead of a simple average? That would handle bursty traffic too.
@rootfs I thought about a similar thing. Essentially, I'd think weighing older metrics lower than newer ones would make sense. I kinda wanted to implement this step first though; we can always improve on that. Do you want more changes in the current PR?

Edit: Thinking about it again: aren't we already effectively using a moving average, since we only look at metrics within a defined window (a time window vs. a probe window, but still)?
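For reference, a rough sketch of the "weigh older metrics lower" idea being discussed here, i.e. an exponentially weighted moving average. This is not part of the PR, and the alpha value is an arbitrary assumption for illustration.

```go
package main

import "fmt"

// ewma gives recent samples (later in the slice) more weight than older
// ones; history decays by a factor of (1 - alpha) per sample.
func ewma(samples []float64, alpha float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	avg := samples[0]
	for _, s := range samples[1:] {
		avg = alpha*s + (1-alpha)*avg
	}
	return avg
}

func main() {
	probes := []float64{2, 2, 2, 10, 10} // a burst at the end of the window
	fmt.Println(ewma(probes, 0.5))       // 8: the recent burst dominates
}
```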
This looks great!
pkg/autoscaler/autoscaler.go
Outdated
func (agg *totalAggregation) aggregate(stat Stat) {
	if agg.perPodAggregations == nil {
		agg.perPodAggregations = make(map[string]*perPodAggregation)
	}
Nit: a more idiomatic way of doing this initialization is to create a method newTotalAggregation that returns a totalAggregation with a non-nil map. Then aggregate can just focus on incremental updates, not initialization.
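Something like the following sketch of the suggested constructor (the struct and field names are taken from the diff above; the exact shape of the merged code may differ):

```go
// newTotalAggregation returns a totalAggregation with the per-pod map
// already initialized, so aggregate never has to lazily create it.
func newTotalAggregation() *totalAggregation {
	return &totalAggregation{
		perPodAggregations: make(map[string]*perPodAggregation),
	}
}
```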
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: josephburnett, markusthoemmes