fix e2e conformance test predicates conflict hostport #96627
/assign @BenTheElder @spiffxp @johnbelamaric
Force-pushed: 2b540f7 to e4d1618
/retest
/retest
The e2e test, included as part of Conformance, "validates that there is no conflict between pods with same hostPort but different hostIP and protocol" was only testing that the pods were scheduled without conflict but was never testing the functionality.
The test should check that pods with containers forwarding the same hostPort can be scheduled without conflict, and that the exposed HostPorts forward traffic to the corresponding pods.
The predicate tests were using loopback addresses for the hostPort test. However, those have different semantics depending on the IP family: for example, you cannot bind to ::1 and ::2 simultaneously. In addition, IP forwarding from localhost to localhost does not work in IPv6, since it doesn't have the kernel route_localnet hack.
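To illustrate the point about loopback semantics differing per IP family, here is a minimal Go sketch using only the standard library. The helper name `isLoopback` is hypothetical and is not part of the test code in this PR. It shows that the whole 127.0.0.0/8 block counts as loopback in IPv4, while in IPv6 only ::1 does.

```go
package main

import (
	"fmt"
	"net"
)

// isLoopback is a hypothetical helper (not from the PR): it reports whether
// addr parses as an IP address that belongs to the loopback range.
func isLoopback(addr string) bool {
	ip := net.ParseIP(addr)
	return ip != nil && ip.IsLoopback()
}

func main() {
	// IPv4 treats all of 127.0.0.0/8 as loopback; IPv6 only has ::1.
	for _, addr := range []string{"127.0.0.1", "127.0.0.2", "::1", "::2"} {
		fmt.Printf("%-10s loopback=%v\n", addr, isLoopback(addr))
	}
}
```

This is why a test that relies on two distinct loopback addresses can work on IPv4 but has no direct IPv6 equivalent.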
For context, the current test is wrong because it succeeds even though the pods are not doing port mapping; we hit that in OpenShift CI.
/area conformance
I'm allowing this in because we're not at test freeze, but if we see an increase in flakiness in this test over the weekend, or in PRs next week, we should revert this rather than try to fix-forward.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: aojea, spiffxp
I think this is causing a failure on Windows, as hostNetwork is not supported on Windows hosts. Ref: https://testgrid.k8s.io/sig-windows-releases#aks-engine-azure-windows-master-staging-serial-slow
@aojea - I think the scheduler's job is done when the pod has been properly assigned a node; to me, the actual testing of networking should have been a networking test.
This actually brings up another interesting point. This is another scenario where running Windows tests earlier (at least a subset) would have caught the problem. Can we move to having the Windows tests as optional tests as a starting point, considering we have been deflaking the CI in the past few weeks?
This is an e2e and conformance test, and I personally think that it should verify that everything works, not only the scheduler's job. If the scheduler is able to schedule the pods, but once scheduled they do not work or do not expose the ports, that is not useful for any user :)
I'm not familiar with Windows at all, but the testgrid link that you pasted only shows jobs running 56 tests, and not being able to run hostNetwork pods makes me think that it will fail a lot of e2e and conformance tests.
I personally think that we should not modify a conformance test to behave differently per platform, and I would suggest creating a new scheduler test with the previous behavior if this is really important for Windows. But this is my opinion; I have to defer to the sig-testing and conformance people: Ben, @spiffxp, @dims, @johnbelamaric.
Right, I am not saying we shouldn't test whether the exposed ports are working. I am just saying that it's not part of the scheduler's job, and perhaps this test should live elsewhere.
Yeah, the dashboard link I shared is for serial tests.
I agree, but hostNetwork is clearly not supported on Windows. I think the question is what we should do in these types of scenarios, where different platforms behave differently for conformance tests.
Ahh ok, thanks for the clarification, agree with you on the points and the questions that you are raising 😄
@adelina-t - She was also looking at some hostport issues raised by tests for Windows nodes.
Firstly: Windows "conformance" does not exist as of yet. Conformance testing targets Linux.
The conformance test should not alter behavior based on platform. AFAIK hostNetwork is fair game for conformance tests, but that would be up to the conformance project. If it is fair game and it can't work on Windows, then the Linux-only tag is appropriate rather than altering the behavior.
We are still trying to reduce the reliance on presubmit. AIUI Windows tests are already running in post-submit, but people need to monitor the results.
@BenTheElder we've added a few pre-submit jobs to verify Windows functionality, but currently they must be triggered manually. https://github.com/kubernetes/test-infra/blob/750d12f6f39e286c8c590eb0c28d50b92cb33e02/config/jobs/kubernetes-sigs/sig-windows/sig-windows-config.yaml#L22 contains the triggers / job names.
👍 thanks, I also see some of the periodics have email alerting enabled 🎉
Thank you for the clarification @BenTheElder. I was under the impression that conformance tests are for both Linux and Windows.
To be clear, I was thinking of using --node-os-distro=windows and then use
I think we can change the ownership of this test to networking in a separate PR.
Is the mechanism this uses to verify connectivity required/guaranteed to work on all conformant clusters?
We found that this test had been passing for weeks even though it should have been failing: one of the pods failed to start because of a conflict, so it restarted, cleaning up the other hostports and removing the conflict. The result is that the pods are scheduled but are not working as expected: you cannot reach them on the expected HostPort, or rather, only one of them is reachable. You have to verify that the exposed host ports really work, so you can be sure that each hostPort is being forwarded to the corresponding HostIP.
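The kind of check described above can be sketched in Go as follows. This is an illustration under stated assumptions, not the code from this PR: the helpers `newFakePod` and `checkBackend` are hypothetical, and local `httptest` servers stand in for pods exposed through a hostPort. The point is that each endpoint must be answered by its own backend, rather than merely being reachable.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// newFakePod is a hypothetical stand-in for a pod: a local HTTP server
// that answers every request with its own name.
func newFakePod(name string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, name)
	}))
}

// checkBackend fetches url and returns an error unless the response body
// identifies the expected backend.
func checkBackend(url, want string) error {
	resp, err := http.Get(url)
	if err != nil {
		return fmt.Errorf("GET %s: %w", url, err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return fmt.Errorf("read %s: %w", url, err)
	}
	if got := strings.TrimSpace(string(body)); got != want {
		return fmt.Errorf("%s answered %q, want %q", url, got, want)
	}
	return nil
}

func main() {
	pod1, pod2 := newFakePod("pod1"), newFakePod("pod2")
	defer pod1.Close()
	defer pod2.Close()

	// A correct mapping succeeds; a crossed mapping is reported as an error.
	fmt.Println(checkBackend(pod1.URL, "pod1"))
	fmt.Println(checkBackend(pod2.URL, "pod1"))
}
```

A check that only confirms the pods were scheduled would pass in both cases, which is the gap described in the comment above.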
I have a follow-up PR to move the ownership to sig-network: #98299
What type of PR is this?
/kind cleanup
/kind failing-test
What this PR does / why we need it:
The e2e test, included as part of Conformance,
"validates that there is no conflict between
pods with same hostPort but different hostIP and protocol"
was only testing that the pods were scheduled without conflict
but was never testing the functionality.
The test should check that pods with containers forwarding the same
hostPort can be scheduled without conflict, and that the exposed
HostPorts forward traffic to the corresponding pods.
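As a rough illustration of what "same hostPort but different hostIP and protocol" means for conflicts, the following Go sketch compares two host port claims. The `hostPort` type and the `conflicts` function are hypothetical and are not the scheduler's actual implementation. The sketch assumes that an empty host IP or 0.0.0.0 means "all addresses" and does not handle the IPv6 wildcard.

```go
package main

import "fmt"

// hostPort is a hypothetical stand-in for the fields of a container port
// that matter when deciding whether two host port claims collide.
type hostPort struct {
	IP       string // "" or "0.0.0.0" is assumed to mean "all addresses"
	Protocol string // "TCP" or "UDP"
	Port     int32
}

// isWildcard reports whether ip claims the port on every host address.
func isWildcard(ip string) bool { return ip == "" || ip == "0.0.0.0" }

// conflicts reports whether two claims cannot coexist on the same node:
// same port and protocol, and overlapping host IPs.
func conflicts(a, b hostPort) bool {
	if a.Port != b.Port || a.Protocol != b.Protocol {
		return false
	}
	return isWildcard(a.IP) || isWildcard(b.IP) || a.IP == b.IP
}

func main() {
	tcp1 := hostPort{IP: "127.0.0.1", Protocol: "TCP", Port: 54321}
	tcp2 := hostPort{IP: "127.0.0.2", Protocol: "TCP", Port: 54321}
	udp2 := hostPort{IP: "127.0.0.2", Protocol: "UDP", Port: 54321}
	anyTCP := hostPort{IP: "0.0.0.0", Protocol: "TCP", Port: 54321}

	fmt.Println(conflicts(tcp1, tcp2))   // different hostIP
	fmt.Println(conflicts(tcp2, udp2))   // different protocol
	fmt.Println(conflicts(tcp1, anyTCP)) // wildcard overlaps every hostIP
}
```

Passing such a check only shows that the pods may be scheduled together; it says nothing about whether traffic sent to each hostIP and hostPort actually reaches the right pod.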