ci-kubernetes-build-canary jobs are failing to be scheduled #20670
Comments
Slack thread in #testing-ops: https://kubernetes.slack.com/archives/C7J9RP96G/p1611953363009500
The latest succeeded run is configured at 9f603e50-624b-11eb-bd55-a20b0ecfb997.
The first failed run is configured at 2395e2c8-6254-11eb-bd55-a20b0ecfb997.
One noticeable difference in the prowjob is that the failed run has:
which is not present in the succeeded run. The test container in both runs requested:
It's possible that each node in
To be clearer: the difference between the prowjobs raised the requested CPU from 7.3 to 7.7, and the failure is probably not related to memory.
Thanks @chaodaiG, that is most likely the problem. A job trying to schedule all available CPU as a proxy for hogging a node all to itself is a risky strategy for this reason. The k8s-infra-prow-build nodes have (among other daemonsets) calico running on them for network policy enforcement, meaning they have slightly more overhead / less allocatable CPU than the google.com default nodes.
/area jobs
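To illustrate why a 7.3 to 7.7 CPU bump can tip a pod into unschedulable, here is a rough sketch of the fit check the scheduler effectively performs for CPU. Only the 7.3 and 7.7 requests come from this thread; the node allocatable value and the daemonset requests below are made-up numbers for illustration, and the helper names are hypothetical, not part of Prow or Kubernetes.

```python
def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ("7300m", "7.7", "8") to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def fits(pod_request_m: int, node_allocatable_m: int, daemonset_requests_m: list[int]) -> bool:
    """Return True if the pod's CPU request fits in what the daemonsets leave free."""
    free_m = node_allocatable_m - sum(daemonset_requests_m)
    return pod_request_m <= free_m


# Hypothetical node: 7910m allocatable, with daemonsets (e.g. a network
# policy agent plus others) requesting 250m and 150m. Illustrative only.
NODE_ALLOCATABLE_M = 7910
DAEMONSET_REQUESTS_M = [250, 150]

for request in ("7.3", "7.7"):
    ok = fits(parse_cpu(request), NODE_ALLOCATABLE_M, DAEMONSET_REQUESTS_M)
    print(f"request={request} cpu schedulable={ok}")
```

Under these assumed numbers the node has 7510m free, so the 7.3 request fits and the 7.7 request does not. The real margins depend on the actual node type and daemonsets in the cluster.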
I/etcd just need those sweet sweet IOPS 🦆, someday we'll have scheduling for this ...
Thanks everyone for taking a peek.
If we can make progress on moving k8s-infra-prow-build up to 1.18, we can try out nodepools with local-ssd, waaaaaay more IOPS: kubernetes/k8s.io#1187 (comment)
What happened:
The last six runs of ci-kubernetes-build-canary have had scheduling issues. (Some of these are triggered reruns.)
What you expected to happen:
Successful scheduling on a Prow node 🙃
How to reproduce it (as minimally and precisely as possible):
Reproduces on any current run of the job.
Please provide links to example occurrences, if any:
Job: https://prow.k8s.io/?job=ci-kubernetes-build-canary
Failed runs:
Anything else we need to know?:
The job was recently migrated from using bootstrap to krel ci-build in #20663.
ref: kubernetes/release#1711 (comment)
cc: @kubernetes/release-engineering @spiffxp