[occm] Make sure we don't mask LB tests failures and fix what was failing #2360
Here's how I tested this.
/test openstack-cloud-controller-manager-e2e-test
Well, now we finally start to see failures. /retest
Okay, so this got broken when I started to limit LB sharing, but I never noticed because tests were busted.
@dulek Is it worth reverting the previous change to land this quickly? I'm guessing we wouldn't have merged it if we knew it broke a test?
It's not that trivial anymore. Anyway I plan to look more closely at these tests, my current understanding is that adding
@lingxiankong, do you know why we have this in place in tests?
force-pushed from f0e9984 to 12c6e3d
Going forward…
Probably only worth it if the fix drags on a bit, then. In the meantime we probably shouldn't merge LB-related changes
/retest I can see in the logs that listener was removed and LB was ACTIVE again. It couldn't be the ACTIVE-PENDING_UPDATE dance, as we only allow Service to be deleted once it's processed and Let's try it once again.
With shared LBs we distinguish the elements by tagging them with the proper name of the LB that would be created for a Service if it wasn't created as shared. This commit fixes that comparison for listener deletion as code was always comparing the name of the primary LB.
force-pushed from 12c6e3d to 71dd1df
Okay, there was a bug, fix is added.
That's another one.
force-pushed from 71dd1df to a118d1f
Okay, this one was simple, I left some internal annotations by mistake.
PR kubernetes#2190 prohibited sharing an LB that is internal for security reasons. This commit fixes the shared LBs tests to not create internal LBs.
In `test-lb-service.sh` we do `trap "delete_resources" EXIT` to make sure we cleanup resources on a test failure. In there, we only fetched the `$?` after making a check for `${AUTO_CLEAN_UP}`, which itself alters the code to 0, so function always returns success. This means tests can never really fail. This commit fixes it by making sure `$ERROR_CODE` is fetched at the very beginning of the cleanup function.
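The masking described in this commit message can be reproduced with a small standalone script. This is only a sketch: the two function bodies are assumptions made for illustration (the real `delete_resources` in `test-lb-service.sh` does actual cleanup), and `run_failing_test` is a hypothetical helper that stands in for a failing test run.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction of the masking bug. The function bodies are
# assumptions and not the real contents of test-lb-service.sh.

AUTO_CLEAN_UP="true"

# Buggy order: the AUTO_CLEAN_UP check runs before $? is read. Whichever way
# the check goes, the `if` statement completes with status 0 here, so $? no
# longer holds the status the script was exiting with.
buggy_delete_resources() {
  if [ "${AUTO_CLEAN_UP}" = "true" ]; then
    : # resource cleanup would happen here
  fi
  ERROR_CODE="$?"
  exit "${ERROR_CODE}"
}

# Fixed order: $? is captured as the very first command of the trap handler.
fixed_delete_resources() {
  ERROR_CODE="$?"
  if [ "${AUTO_CLEAN_UP}" = "true" ]; then
    : # resource cleanup would happen here
  fi
  exit "${ERROR_CODE}"
}

# Runs a "test" that exits with status 3 in a subshell, with the given
# function installed as the EXIT trap, and prints the status the subshell
# finally reported to its caller.
run_failing_test() {
  rc=0
  ( trap "$1" EXIT; exit 3 ) || rc=$?
  echo "${rc}"
}

echo "buggy cleanup reports: $(run_failing_test buggy_delete_resources)"
echo "fixed cleanup reports: $(run_failing_test fixed_delete_resources)"
```

Under bash the buggy handler reports 0 and the fixed handler reports 3, which is why a CI job that only looks at the exit status of the script could not see the failures.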
force-pushed from a118d1f to 22a186a
/lgtm @dulek given you actually created an additional commit for another purpose (e.g. comparing the LB name), can you help
Second and third commits lgtm.
I don't understand the first commit yet.
@@ -2390,7 +2390,7 @@ func (lbaas *LbaasV2) deleteLoadBalancer(loadbalancer *loadbalancers.LoadBalance
                 Protocol: proto,
                 Port:     int(port.Port),
             }]
-            if isPresent && cpoutil.Contains(listener.Tags, loadbalancer.Name) {
+            if isPresent && cpoutil.Contains(listener.Tags, svcConf.lbName) {
Can you explain why this is different to `loadbalancer.Name`? I've read this function and the calling function, in which you're now setting `svcConf.lbName` to the return value of `lbaas.GetLoadBalancerName`. Does this imply that the loadbalancer name (retrieved by id?) is different to `GetLoadBalancerName`?
Yes, so this all boils down to the support for shared LBs, which I believe is implemented in a way that is very problematic code-wise.
How does it work? You add an annotation to the Service that features the ID of the LB you want that Service to use. The assumption is that the LB exists already and CPO will just add a listener, pool and members (members should be the same really [1]) to that LB. The secondary LB resources will be tagged with the LB name that would be used for the secondary Service so that we can distinguish them later on.
The bug here is that we looked up listeners by comparing their tags with the name of the LB taken from OpenStack, which is based on the primary Service name. In that case we should use the secondary Service name, and this is the fix.
Footnotes
[1] Now this makes me think that we might have a bug where 2 Services sharing an LB can fight over their view of the LB members. Fixed on reconciliation, but not great.
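To make the tag comparison concrete, here is a rough shell sketch of the check before and after the fix. It only mirrors the logic of the Go condition in the diff and is not CPO code. The LB names, the tag layout and the helpers `has_tag` and `owned_by` are all assumptions made up for illustration.

```shell
#!/usr/bin/env bash
# Illustrative only: names and tag layout are assumptions, not values taken
# from a real deployment.

# Name of the shared LB as stored in OpenStack (derived from the primary Service).
PRIMARY_LB_NAME="kube_service_cluster_default_primary-svc"
# Name the LB would have had if the secondary Service got its own LB.
SECONDARY_LB_NAME="kube_service_cluster_default_secondary-svc"

# Tags on the listener created on the shared LB for the secondary Service.
SECONDARY_LISTENER_TAGS="${SECONDARY_LB_NAME}"

# has_tag TAGS NAME: succeeds when NAME is one of the whitespace-separated TAGS.
has_tag() {
  for tag in $1; do
    if [ "${tag}" = "$2" ]; then
      return 0
    fi
  done
  return 1
}

# owned_by NAME: prints "yes" when the secondary listener is tagged with NAME.
owned_by() {
  if has_tag "${SECONDARY_LISTENER_TAGS}" "$1"; then
    echo "yes"
  else
    echo "no"
  fi
}

echo "match on loadbalancer.Name (old check): $(owned_by "${PRIMARY_LB_NAME}")"
echo "match on svcConf.lbName (fixed check):  $(owned_by "${SECONDARY_LB_NAME}")"
```

With these made-up names the old check finds no match, so the secondary Service's listener would be skipped on deletion, while the fixed check matches it.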
This feels very fragile, but fixing that is definitely outside of the scope of this PR.
/lgtm
It totally is and you'll probably be happy to see it disabled by default in OpenShift: openshift/cluster-cloud-controller-manager-operator#263.
Done!
@jichenjc: Anything else to move this forward?
no, all good ~ /approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jichenjc
/cherry-pick release-1.27
@dulek: #2360 failed to apply on top of branch "release-1.27".
…ling (kubernetes#2360)
* Fix shared LBs tests
  PR kubernetes#2190 prohibited sharing an LB that is internal for security reasons. This commit fixes the shared LBs tests to not create internal LBs.
* Make sure we don't mask LB tests failures
  In `test-lb-service.sh` we do `trap "delete_resources" EXIT` to make sure we cleanup resources on a test failure. In there, we only fetched the `$?` after making a check for `${AUTO_CLEAN_UP}`, which itself alters the code to 0, so function always returns success. This means tests can never really fail.
  This commit fixes it by making sure `$ERROR_CODE` is fetched at the very beginning of the cleanup function.
…ling (#2360) (#2367)
* Fix shared LBs tests
  PR #2190 prohibited sharing an LB that is internal for security reasons. This commit fixes the shared LBs tests to not create internal LBs.
* Make sure we don't mask LB tests failures
  In `test-lb-service.sh` we do `trap "delete_resources" EXIT` to make sure we cleanup resources on a test failure. In there, we only fetched the `$?` after making a check for `${AUTO_CLEAN_UP}`, which itself alters the code to 0, so function always returns success. This means tests can never really fail.
  This commit fixes it by making sure `$ERROR_CODE` is fetched at the very beginning of the cleanup function.
…2360) (#2430)
* Fix shared LBs tests
  PR #2190 prohibited sharing an LB that is internal for security reasons. This commit fixes the shared LBs tests to not create internal LBs.
* Make sure we don't mask LB tests failures
  In `test-lb-service.sh` we do `trap "delete_resources" EXIT` to make sure we cleanup resources on a test failure. In there, we only fetched the `$?` after making a check for `${AUTO_CLEAN_UP}`, which itself alters the code to 0, so function always returns success. This means tests can never really fail.
  This commit fixes it by making sure `$ERROR_CODE` is fetched at the very beginning of the cleanup function.
Co-authored-by: Michal Dulko <mdulko@redhat.com>
…ling (kubernetes#2360)
* Compare proper LB name for shared LBs
  With shared LBs we distinguish the elements by tagging them with the proper name of the LB that would be created for a Service if it wasn't created as shared. This commit fixes that comparison for listener deletion as code was always comparing the name of the primary LB.
* Fix shared LBs tests
  PR kubernetes#2190 prohibited sharing an LB that is internal for security reasons. This commit fixes the shared LBs tests to not create internal LBs.
* Make sure we don't mask LB tests failures
  In `test-lb-service.sh` we do `trap "delete_resources" EXIT` to make sure we cleanup resources on a test failure. In there, we only fetched the `$?` after making a check for `${AUTO_CLEAN_UP}`, which itself alters the code to 0, so function always returns success. This means tests can never really fail.
  This commit fixes it by making sure `$ERROR_CODE` is fetched at the very beginning of the cleanup function.
/cherry-pick release-1.28 For some reason this hasn't made it to the 1.28 branch, which is pretty bad. Trying to fix that now.
@dulek: #2360 failed to apply on top of branch "release-1.28".
…ling (#2360) (#2537)
* Compare proper LB name for shared LBs
  With shared LBs we distinguish the elements by tagging them with the proper name of the LB that would be created for a Service if it wasn't created as shared. This commit fixes that comparison for listener deletion as code was always comparing the name of the primary LB.
* Fix shared LBs tests
  PR #2190 prohibited sharing an LB that is internal for security reasons. This commit fixes the shared LBs tests to not create internal LBs.
* Make sure we don't mask LB tests failures
  In `test-lb-service.sh` we do `trap "delete_resources" EXIT` to make sure we cleanup resources on a test failure. In there, we only fetched the `$?` after making a check for `${AUTO_CLEAN_UP}`, which itself alters the code to 0, so function always returns success. This means tests can never really fail.
  This commit fixes it by making sure `$ERROR_CODE` is fetched at the very beginning of the cleanup function.
What this PR does / why we need it:
In `test-lb-service.sh` we do `trap "delete_resources" EXIT` to make sure we cleanup resources on a test failure. In there, we only fetched the `$?` after making a check for `${AUTO_CLEAN_UP}`, which itself alters the code to 0, so function always returns success. This means tests can never really fail.
This commit fixes it by making sure `$ERROR_CODE` is fetched at the very beginning of the cleanup function.
Some additional fixes needed to be made to make tests passing again. In particular:
Which issue this PR fixes(if applicable):
fixes #2540
Special notes for reviewers:
Release note: