pull-kubernetes-kubemark-e2e-gce-scale fails when it should not and gathers too little info #24349
Comments
@kubernetes/sig-scalability
/sig scalability
Analyzing the linked job run, it starts at:
however, during the test it suddenly loses the connection to the environment:
the rest of the logs are just timeouts.
The project is shared across a couple of optional presubmits. It's possible that someone triggered one of those presubmits at that time, which brought down the previous test...
I found logs confirming that the master was deleted around the time the connection-timeout errors started:
I will migrate these jobs to the new infrastructure today, and we will have a Boskos pool, so it shouldn't happen again (see the sketch after this thread). /assign @marseel
Thanks @marseel!
Was everybody collectively responsible for ensuring that only one pull-kubernetes-kubemark-e2e-gce-scale job ran at a time? If so, it was a well-kept secret. Are there any other secrets like that?
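For context on how the Boskos pool mentioned above prevents this class of failure: Boskos hands out exclusive leases on pre-provisioned projects, so two concurrent presubmit runs can never tear down each other's clusters. Below is a minimal sketch of the idea, assuming the Go client in sigs.k8s.io/boskos/client; the server URL, the resource type name "scalability-project", and the exact method signatures are illustrative assumptions, not the real job configuration.

```go
// A minimal sketch (not the actual job code) of how a presubmit wrapper
// could lease a dedicated GCP project from a Boskos pool. Method names
// follow sigs.k8s.io/boskos/client, but exact signatures may differ between
// versions; the resource type and server URL are made-up placeholders.
package main

import (
	"fmt"
	"log"

	"sigs.k8s.io/boskos/client"
)

func main() {
	// The job identifies itself as the owner of the lease.
	c, err := client.NewClient(
		"pull-kubernetes-kubemark-e2e-gce-scale",    // owner
		"http://boskos.test-pods.svc.cluster.local", // boskos server (placeholder)
		"", "", // no auth in this sketch
	)
	if err != nil {
		log.Fatalf("creating boskos client: %v", err)
	}

	// Acquire atomically moves one "free" resource of the requested type to
	// "busy" and returns it. A second concurrent job therefore gets a
	// different project (or fails to acquire one), so no run can delete
	// another run's master.
	res, err := c.Acquire("scalability-project", "free", "busy")
	if err != nil {
		log.Fatalf("no free project available: %v", err)
	}
	defer func() {
		// Hand the project back as "dirty"; a janitor cleans it up before it
		// returns to the "free" pool.
		if err := c.ReleaseOne(res.Name, "dirty"); err != nil {
			log.Printf("releasing %s: %v", res.Name, err)
		}
	}()

	fmt.Printf("running kubemark e2e against leased project %s\n", res.Name)
	// ... run the actual test against res.Name here ...
}
```

The important property is the atomic free-to-busy transition enforced by the Boskos server: exclusivity comes from the pool itself rather than from an unwritten rule that only one job may run at a time.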
What happened:
I submitted a PR (kubernetes/kubernetes#106325) that changes only a unit test and asked for the pull-kubernetes-kubemark-e2e-gce-scale job to be run on it. The job timed out after 20 hours and gathered much less information than expected. #24303 helped, but some artifacts are still missing (such as apiserver logs). For an example, see https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/106325/pull-kubernetes-kubemark-e2e-gce-scale/1459459749048225792 . Near the end of build-log.txt we see that both run-e2e.sh and log-dump.sh failed; before that there are many complaints about files and resources not being found.
For another example, see https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/106085/pull-kubernetes-kubemark-e2e-gce-scale/1459763740323876864 (for kubernetes/kubernetes#106085). This one failed after a bit less than 13 hours and also gathered less information than usual.
What you expected to happen:
I expected the test to pass on harmless PRs, and I expected every run to gather far more clues about what happened.
How to reproduce it (as minimally and precisely as possible):
Look at the results of any recent run of that job, or trigger it on any PR by commenting /test pull-kubernetes-kubemark-e2e-gce-scale.
Please provide links to example occurrences, if any:
See above for an easy one.
Anything else we need to know?: