Bucketize autoscaling metrics by timeframe not by pod name. #3289
@markusthoemmes: 0 warnings.
In response to this:
Fixes #2977
Proposed Changes
Stats are averaged in each specific timeframe vs. averaged over the whole window. See the linked issue for more in-depth information
Release Note
TBD
0fde3c4 to a61fb92
Unrelated failure
/test pull-knative-serving-integration-tests
/assign @yanweiguo Please let me know what you think.
@@ -281,7 +281,7 @@ func assertAutoscaleUpToNumPods(ctx *testContext, numPods int32) {
 	defer close(stopChan)

 	go func() {
-		if err := generateTraffic(ctx, int(numPods*10), 30*time.Second, stopChan); err != nil {
+		if err := generateTraffic(ctx, int(numPods*10), 60*time.Second, stopChan); err != nil {
These changes stabilize the autoscaling tests. They have recently been adjusted to continue generating more traffic as soon as we hit the desired replica count. However, that's only been done on "Replicas", so we're in danger of overflowing if the pod takes a while to come up.
Likewise, the amount of traffic being sent in (30s) can be just about enough to cause us to scale up. After 60s it's guaranteed to (for the default window sizes).
Superficial mostly. I need to re-read the PR again for the logic part, though it mostly makes sense to me.
@@ -668,3 +628,7 @@ func createEndpoints(ep *corev1.Endpoints) {
 	kubeClient.CoreV1().Endpoints(testNamespace).Create(ep)
 	kubeInformer.Core().V1().Endpoints().Informer().GetIndexer().Add(ep)
 }
+
+func roundedNow() time.Time {
Is the reason to use roundedNow that it prevents flakiness, because some stats could fall outside the scale window if now() is used directly?
Yes, it basically normalizes the instances of "now" so the test doesn't depend on when exactly it is executed. Especially when adding to "now" in the tests, we otherwise risk jumping into other buckets in the calculation. It makes the test deterministic.
/lgtm
/lgtm
/lgtm
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: markusthoemmes, srinivashegde86
Fixes #2977
Fixes #2379
Proposed Changes
Stats are averaged in each specific timeframe vs. averaged over the whole window. See the linked issue for more in-depth information
Release Note