
[e2e failure] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicaSet Should scale ... #54574

Closed
spiffxp opened this issue Oct 25, 2017 · 29 comments

Labels: kind/bug · kind/failing-test · milestone/needs-attention · priority/critical-urgent · sig/autoscaling
Milestone: v1.9

spiffxp (Member) commented Oct 25, 2017

/priority critical-urgent
/sig autoscaling

This test case started failing recently and affects a number of jobs: triage report

This is affecting multiple jobs on the release-master-blocking testgrid dashboard, and prevents us from cutting 1.9.0-alpha.2 (kubernetes/sig-release#22). Is there work ongoing to bring this job back to green?

triage cluster b75045e2cb613e12dca1

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:39
timeout waiting 15m0s for 5 replicas
Expected error:
    <*errors.errorString | 0xc4202cafe0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:128

Suspect range from gci-gce-serial: 060b4b8...51244eb

Suspect range from gci-gke-serial: b1e2d7a...82a52a9

@k8s-ci-robot k8s-ci-robot added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. labels Oct 25, 2017
@spiffxp spiffxp changed the title [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicaSet Should scale ... [e2e failure] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicaSet Should scale ... Oct 25, 2017
spiffxp (Member, Author) commented Oct 25, 2017

@kubernetes/sig-autoscaling-test-failures

spiffxp (Member, Author) commented Oct 25, 2017

/priority failing-test

@k8s-ci-robot k8s-ci-robot added the kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. label Oct 25, 2017
MaciekPytel (Contributor) commented:

@DirectXMan12 the moment this test started failing coincides exactly with the merge of #53743, which, by the way, is a very large commit to HPA that was not tagged with sig-autoscaling and therefore slipped past us completely.

I think we should revert #53743 for now and merge it again after fixing it.

cc: @mwielgus

DirectXMan12 (Contributor) commented Oct 25, 2017

I apologize for missing the SIG autoscaling label (although I'm surprised that the bot didn't complain about it; perhaps because I'm the one who submitted it?).

I'll track down why it's failing.

DirectXMan12 (Contributor) commented Oct 25, 2017

Found the issue. When you write your scaleTargetRef, it's now important to actually specify an APIVersion field. It didn't matter before, but it was always poor form to refer to kind: ReplicaSet without apiVersion: extensions/v1beta1 (or apps/v1beta2).
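
As a minimal sketch (not the actual e2e code; the names, target kind, and replica counts are illustrative), a correctly-specified scaleTargetRef built with the Go client types looks like this:

```go
package main

import (
	autoscalingv1 "k8s.io/api/autoscaling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newExampleHPA builds an HPA whose scaleTargetRef carries an explicit
// APIVersion; leaving that field empty is what the failing tests did.
func newExampleHPA() *autoscalingv1.HorizontalPodAutoscaler {
	minReplicas := int32(1)
	return &autoscalingv1.HorizontalPodAutoscaler{
		ObjectMeta: metav1.ObjectMeta{Name: "example-hpa"},
		Spec: autoscalingv1.HorizontalPodAutoscalerSpec{
			ScaleTargetRef: autoscalingv1.CrossVersionObjectReference{
				APIVersion: "extensions/v1beta1", // required now that scale lookups resolve the group-version
				Kind:       "ReplicaSet",
				Name:       "example-rs",
			},
			MinReplicas: &minReplicas,
			MaxReplicas: 5,
		},
	}
}

func main() { _ = newExampleHPA() }
```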

DirectXMan12 (Contributor) commented:

Will have a PR up in a couple of minutes.

DirectXMan12 (Contributor) commented Oct 25, 2017

... aaand the apps API group doesn't set registry subresource versions correctly, so the group-version on scales returned by apps is apps/v1beta2 (incorrectly).

EDIT: @liggitt correctly pointed out that I misread things, and that apps/v1beta2.Scale is a real thing, unfortunately. I'll have to add a slightly different fix to the PR.
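
A quick sketch (for illustration only) of why that matters: both Scale kinds exist as distinct Go types, so anything resolving the /scale subresource can't assume a single group-version.

```go
package main

import (
	"fmt"

	appsv1beta2 "k8s.io/api/apps/v1beta2"
	autoscalingv1 "k8s.io/api/autoscaling/v1"
)

func main() {
	// apps/v1beta2 defines its own Scale kind, distinct from
	// autoscaling/v1.Scale; a generic scale client must map between them.
	fmt.Printf("%T\n", appsv1beta2.Scale{})   // v1beta2.Scale
	fmt.Printf("%T\n", autoscalingv1.Scale{}) // v1.Scale
}
```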

DirectXMan12 (Contributor) commented:

PR posted ^

spiffxp (Member, Author) commented Oct 31, 2017

PRs continue to await review

k8s-github-robot pushed a commit that referenced this issue Nov 6, 2017
…nd-hpa-gvks

Automatic merge from submit-queue (batch tested with PRs 53645, 54734, 54586, 55015, 54688). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Fix Incorrect Scale Subresources and HPA e2e ScaleTargetRefs

The HPA e2es failed to actually set `apiVersion` on the created HPAs, which previously was ignored. Since the polymorphic scale client was merged, this behavior is no longer tolerated (it was never correct to begin with, but it accidentally worked).

Additionally, the `apps` resources have their own version of scale.  Until `apps/v1beta1` and `apps/v1beta2` go away, we need to support those versions in the scale client.

Together, these broke some of the HPA e2es.

Fixes #54574

```release-note
NONE
```
@spiffxp spiffxp added this to the v1.9 milestone Nov 7, 2017
spiffxp (Member, Author) commented Nov 9, 2017

/reopen
I'm still seeing this here https://k8s-testgrid.appspot.com/sig-release-master-blocking#gci-gce-serial

Unless we decide to punt that job from release-master-blocking, this is now impacting 1.9.0-alpha.3 (kubernetes/sig-release#27)

@k8s-ci-robot k8s-ci-robot reopened this Nov 9, 2017
spiffxp (Member, Author) commented Nov 9, 2017

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Nov 9, 2017
spiffxp (Member, Author) commented Nov 9, 2017

/status approved-for-milestone
(new comment since the bot doesn't accept edits) I can type, I swear

abgworrall (Contributor) commented:

I'm also seeing this on our OS image validation testgrid: https://k8s-testgrid.appspot.com/sig-node-cos-image#e2e-gce-cosbeta-k8sdev-serial

DirectXMan12 (Contributor) commented:

Looking into the current set of failures

DirectXMan12 (Contributor) commented:

Looking at the failure logs, I'm seeing

horizontal.go:189] failed to query scale subresource for Deployment/e2e-tests-horizontal-pod-autoscaling-zxc9r/test-deployment: the server could not find the requested resource

So, I tried reproducing locally (provider=local, hack/local-up-cluster.sh), and I can't reproduce it. HPA seems to be able to fetch scale properly for replicasets and deployments in a vanilla fresh cluster-up environment. Is there something special about the way we stand up those test environments?

frobware (Contributor) commented:

Going to investigate this too.

frobware (Contributor) commented:

> I'm still seeing this here https://k8s-testgrid.appspot.com/sig-release-master-blocking#gci-gce-serial

@spiffxp this appears to be passing now. As highlighted by @MaciekPytel on slack/sig-autoscaling, #55413 might be significant here.

spiffxp (Member, Author) commented Nov 13, 2017

/remove-priority critical-urgent
/priority important-soon
Agree, this has moved off of release-master-blocking, with the exception of soak-gci-gce, which I would like to kick out of the list of blocking tests wholesale anyway.

This is still affecting some upgrade tests, which I'm not actively watching yet. Once we hit code freeze, I will be watching them, and will bump priority accordingly. Does something need to be cherry-picked into the release-1.8 branch?

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Nov 13, 2017
dims (Member) commented Nov 16, 2017

/assign @frobware

@frobware feel free to reassign/unassign; I assigned it based on your comment 2 days ago.

k8s-ci-robot (Contributor) commented:

@dims: GitHub didn't allow me to assign the following users: frobware.

Note that only kubernetes members can be assigned.

In response to this:

> /assign @frobware
>
> @frobware feel free to reassign/unassign, i assigned it based on your comment 2 days ago

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

spiffxp (Member, Author) commented Nov 20, 2017

/close
I'm no longer seeing this on sig-release-master-blocking or sig-release-master-upgrade

spiffxp (Member, Author) commented Nov 27, 2017

/reopen
FYI @kubernetes/sig-autoscaling-test-failures: now that some of the upgrade jobs have been fixed, I'm seeing this again in a number of jobs:

e.g. triage cluster b75045e2cb613e12dca1

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:41
timeout waiting 15m0s for 3 replicas
Expected error:
    <*errors.errorString | 0xc4202d9330>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:122

triage report

@DirectXMan12 @frobware (taking a total guess) are there fixes that need to be cherry-picked into release-1.8?

tracking this against v1.9.0-beta.1 (kubernetes/sig-release#34)

@k8s-ci-robot k8s-ci-robot reopened this Nov 27, 2017
@k8s-github-robot k8s-github-robot removed this from the v1.9 milestone Nov 27, 2017
spiffxp (Member, Author) commented Nov 27, 2017

/remove-priority important-soon
/priority critical-urgent

@k8s-ci-robot k8s-ci-robot added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Nov 27, 2017
@spiffxp spiffxp added this to the v1.9 milestone Nov 27, 2017
DirectXMan12 (Contributor) commented:

@spiffxp I'd guess we'd have to cherry-pick the test fixes back to the 1.8 test suite if you've got instances of it running against 1.9 code.

DirectXMan12 (Contributor) commented Nov 29, 2017

The fix needed should be #54586. Let me try to repro locally (1.9 cluster, 1.8 tests) and see what happens.

DirectXMan12 (Contributor) commented:

I've reproduced locally. The backport seems to fix the issue (just doing one final test run). Should have a PR up shortly.

spiffxp (Member, Author) commented Dec 1, 2017

Now tracking against v1.9.0-beta.2 (kubernetes/sig-release#39)

k8s-github-robot pushed a commit that referenced this issue Dec 1, 2017
…est-scale-gvks

Automatic merge from submit-queue.

[e2e] make sure to specify APIVersion in HPA tests

Previously, the HPA controller ignored APIVersion when resolving the
scale subresource for a kind, meaning that if it was set incorrectly in the
HPA's scaleTargetRef, it would not matter. This was the case for
several of the HPA e2e tests.

Since the polymorphic scale client was merged into Kubernetes 1.9, and we need
to do upgrade testing, APIVersion now matters. This updates the HPA e2es to
care about APIVersion, by passing kind as a full GroupVersionKind, and not
just a string.

Fixes #54574 (again)

```release-note
NONE
```
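
To illustrate the shape of the change (a sketch, not the actual e2e diff), passing a full GroupVersionKind instead of a bare kind string looks like:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime/schema"
)

func main() {
	// Before: the target kind was a bare string, so the generated
	// scaleTargetRef had no apiVersion for the scale client to resolve.
	kind := "ReplicaSet"

	// After: a full GroupVersionKind carries the apiVersion too.
	gvk := schema.GroupVersionKind{Group: "apps", Version: "v1beta2", Kind: "ReplicaSet"}

	fmt.Println(kind)                                  // ReplicaSet
	fmt.Println(gvk.GroupVersion().String(), gvk.Kind) // apps/v1beta2 ReplicaSet
}
```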
k8s-github-robot commented:

[MILESTONENOTIFIER] Milestone Issue Needs Attention

@DirectXMan12 @spiffxp @kubernetes/sig-autoscaling-misc

Action required: During code freeze, issues in the milestone should be in progress.
If this issue is not being actively worked on, please remove it from the milestone.
If it is being worked on, please add the status/in-progress label so it can be tracked with other in-flight issues.

Action Required: This issue has not been updated since Dec 1. Please provide an update.

Note: This issue is marked as priority/critical-urgent, and must be updated every 1 day during code freeze.

Example update:

ACK.  In progress
ETA: DD/MM/YYYY
Risks: Complicated fix required
Issue Labels
  • sig/autoscaling: Issue will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move issue out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/bug: Fixes a bug discovered during the current release.
