Enable a large chunk of upstream e2e tests that were accidentally not being run #18816
1b3322e to e8aaec8
/test gcp
@openshift/sig-networking I opened an Ansible PR to default GCP clusters to network policy. Can you look at the other issues here and determine which ones are functional issues for 3.9? I.e., source IP preservation, IPv6, any of the load balancer ones.
Hm. So our tests enable all
I'm not sure yet what's up with
oh, the TCP CLOSE_WAIT thing is already fixed, I just typoed when I searched for it before. kubernetes/kubernetes#56765
This is extracted from openshift#18816 in order to make hostpath tests pass.
Thanks, that was the info I needed.
/retest
/retest
I tested the network policy plugin with this pr, and https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/18816/test_pull_request_origin_extended_conformance_gce/17466/#sig-network-networkpolicy-networkpolicy-between-server-and-client-should-allow-ingress-access-on-one-named-port-featurenetworkpolicy-suiteopenshiftconformanceparallel-suitek8s failed ([sig-network] NetworkPolicy NetworkPolicy between server and client should allow ingress access on one named port [Feature:NetworkPolicy] [Suite:openshift/conformance/parallel] [Suite:k8s])
e307766 to e3e96cb
@smarterclayton okay, I am pretty certain that this particular test was passing for us before.
Some of them were failing because the wrong oc binary was getting picked up. The image layer test is still failing though. I have kicked off another full run.
Down to 6 failures this run:
Of these, the template service broker tests are failing similarly, and one of the dc tests is a flake we have seen before.
Running locally I found a test namespace that isn't being cleaned up after a run:
and also
I think that means that those tests were in the child that failed (the process timed out and was killed, which prevented them from being cleaned up). The first one is k8s.io/kubernetes/test/e2e/kubectl/kubectl.go:537 which is
seems to hang on cri-o
it eventually completed, but probably hangs / doesn't work in very long runs
@smarterclayton we will take a look at liveness exec. The other inline attach test depends on a docker-specific behavior where docker just disconnects from attach even though it is asked to keep stdin alive. Can we temporarily disable that? I will look into modifying that test upstream.
/retest
Yes, we can temporarily disable it - do you have a bugzilla open already?
@mrunalp a problem with disabling that is that it is a conformance test. You'll need to give it extra attention to get it fixed. It also means cri-o would not be considered conformant. Please give that extra focus.
test/extended/util/test.go
@@ -332,6 +332,8 @@ var (
`SELinux relabeling`, // https://github.com/openshift/origin/issues/7287 still broken
`Volumes CephFS`, // permission denied, selinux?

`Probing container should \*not\* be restarted with a exec "cat /tmp/health" liveness probe`, // https://bugzilla.redhat.com/show_bug.cgi?id=1624041
It should be the other one - "should support inline execution and attach"
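For readers unfamiliar with this kind of exclusion list, here is a minimal sketch of how entries like the ones in the hunk above can be treated as regular-expression fragments and matched against full test names. How test/extended/util/test.go actually consumes the list is not shown in this thread, so `compileAll` and `shouldSkip` are hypothetical helpers, and the sample test names in `main` are illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileAll turns each exclusion entry into a compiled regular expression.
// Entries are regex fragments, which is why the hunk escapes the asterisks
// in `\*not\*`. Hypothetical helper, not the origin repository's code.
func compileAll(patterns []string) []*regexp.Regexp {
	res := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		res = append(res, regexp.MustCompile(p))
	}
	return res
}

// shouldSkip reports whether any exclusion pattern matches the test name.
// An empty pattern list skips nothing.
func shouldSkip(res []*regexp.Regexp, name string) bool {
	for _, re := range res {
		if re.MatchString(name) {
			return true
		}
	}
	return false
}

func main() {
	excluded := compileAll([]string{
		`SELinux relabeling`,
		`Volumes CephFS`,
		`should support inline execution and attach`,
	})
	// Illustrative test names, not copied from a real run.
	names := []string{
		"Kubectl client should support inline execution and attach",
		"Probing container should be restarted",
	}
	for _, n := range names {
		fmt.Printf("skip=%t %s\n", shouldSkip(excluded, n), n)
	}
}
```

Matching on a fragment of the name is what makes it easy to exclude the wrong test, as happened in the hunk under review.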
/test gcp-crio
Ok, looks like that got us past the hanging test. It looks like there are two new failures this time; maybe it's taking too long for crio.
/test gcp-crio
Seeing these fail on crio - most of them are consistent. If this is timeouts, we should bump them.
The feature gate is not yet enabled and may not be for several releases. Pod team owns allowing this to be used.
The feature gate is off and will remain off for several releases. Pod team owns fixing this carry.
Remove use of -suite parameter in favor of a temporary SUITE env var for old jobs. Newer jobs will call ginkgo directly. Remove the setup code from the extended tests.
The inline attach scenario is behaving differently between docker and crio. Temporarily disable while this is fixed upstream.
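As a rough illustration of the "temporary SUITE env var" mentioned in the commit messages above, the sketch below resolves a suite name from the environment and falls back to a default. The actual job scripts are not shown in this thread, so `resolveSuite` is a hypothetical helper, and the fallback value is only an example borrowed from a suite tag that appears earlier in the conversation.

```go
package main

import (
	"fmt"
	"os"
)

// resolveSuite returns the value of the SUITE environment variable when it is
// set and non-empty, and the fallback otherwise. The lookup function is
// injected so the logic can be exercised without touching the real
// environment. Hypothetical helper, not the origin repository's code.
func resolveSuite(lookup func(string) (string, bool), fallback string) string {
	if v, ok := lookup("SUITE"); ok && v != "" {
		return v
	}
	return fallback
}

func main() {
	// The fallback suite name here is an assumption for illustration.
	suite := resolveSuite(os.LookupEnv, "openshift/conformance/parallel")
	fmt.Println("running suite:", suite)
}
```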
I'm going to set crio to the minimal suite on ansible for now (.../minimal) so we should be able to merge this and let this get fixed incrementally.
Merging, once this is in I'll switch over the release jobs in openshift/release#1339.
While investigating why GCE PVs were failing, we realized that there were no e2e tests running for PVs, which led us to realize that we weren't including a large chunk of the new sig-specific tests added as subfolders of k8s/test/e2e into our test suite. This was due to how the e2e test upstream has changed - it used to be a regular package that grabbed other tests, but then changed to be a _test, which caused our imports to silently stop grabbing those tests. This PR specifically links those in (although in the future we need a better reflective test to validate we aren't dropping those) here and corrects the gaps.

The biggest gaps are alpha features, although quite a few upstream tests make bad assumptions about master nodes. We also went through and investigated the networking issues - GCP CI runs were switched to the openshift network policy plugin in order to get better coverage, and there was only one known failure (ingress port by name).
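The "better reflective test" mentioned in the description could take roughly the following shape: compare the sub-packages that exist under the upstream e2e directory with the import paths the downstream test binary links in, and report anything missing. This is only a sketch under stated assumptions. `missingSuites` is a hypothetical helper, both input lists are supplied by hand here (a real check would have to discover them from the filesystem and the build), and the directory names in `main` are illustrative.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// missingSuites returns the import paths under root that correspond to
// entries in testDirs but are absent from imported. Hypothetical helper:
// testDirs stands in for the sub-packages found under the upstream e2e
// directory, imported for the packages the downstream binary links in.
func missingSuites(root string, testDirs, imported []string) []string {
	linked := make(map[string]bool, len(imported))
	for _, p := range imported {
		linked[p] = true
	}
	var missing []string
	for _, d := range testDirs {
		path := strings.TrimSuffix(root, "/") + "/" + d
		if !linked[path] {
			missing = append(missing, path)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	root := "k8s.io/kubernetes/test/e2e"
	// Illustrative inputs only.
	dirs := []string{"apps", "network", "storage"}
	imported := []string{root + "/apps", root + "/network"}
	for _, m := range missingSuites(root, dirs, imported) {
		fmt.Println("not linked:", m)
	}
}
```

Failing the build when the returned slice is non-empty would turn a silently dropped set of tests into a visible error.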
The _install job will likely need an exclusion rule for some of the tests until we can get the AWS cluster job up and running.