-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: wait namespace is actually deleted #413
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good change. One comment, and I also wonder if we need to add the import alias for K8s errors.
I can remove the alias if you want but we have a variable named |
Signed-off-by: Charles-Edouard Brétéché <charles.edouard@nirmata.com>
Co-authored-by: Erik Godding Boye <egboye@gmail.com> Signed-off-by: Charles-Edouard Brétéché <charles.edouard@nirmata.com>
856b9cc
to
ad99d90
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Thanks for this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
I was tempted to get your take on the use of this projects kubernetes.go::WaitForDelete which is generic... but I actually have a strong preference for passing ctx which it is missing. However when the generic function was created, most of k8s code didn't have ctx... I believe that was added post 1.16 (hard to remember now). It would still be interesting to get your take on the value of modifying the generic function or removing it. The WaitForDelete
doesn't currently have any dependencies and if we don't have interest in it... I will likely delete it.
1 sec... This code does look good! however just prior to merging I realized the potential issue. This is valuable for specific use case(s)... however for a use case of parallel tests or tests isolated by namespaces, you don't need to wait for the deletion of a previous namespace to continue working in another namespace. I need to review 1 more time zoomed out to the larger picture. It is possible this is fine. I would like to reason thru the code from larger perspective. |
the first thing I noticed is: if the test is successful, however the namespace isn't deleted (perhaps a timeout) then the success turns to a failure. Do we want a failure for this? I can actually see both cases... we could see a timeout on wait and the cluster is at fault for which it is a false negative. I could see a timeout on deletion which is caused by namespace finalizers for which you do want a failure to be reported. It would be good IMO to discuss all scenarios and reason thru expectations. It may lead to another test configuration to wait for deletion or to declaratively express if deletion should raise a failure. |
I'm not sure why we wouldn't want to wait. I discovered the issue when running kuttl twice rapidly, first time all worked fine, second time tests failed because a namespace created during the first run was not yet deleted (not waiting actual namespace deletion can have an impact across multiple kuttl runs). In this sense I feel like failing to cleanup should lead to a test failure, cleaning up correctly sounds important to me, especially when running kuttl on non ephemeral clusters. If the namespace fails to cleanup because of finalisers, the test can probably be improved to perform resources deletion in the correct order. |
I didn't know about the
💯 |
actually with the ns deletion moving to the |
IMHO it's not just about namespaces but all resources a kuttl test creates, the deletion order matters too, we should not try to delete everything concurrently, this can lead to serious problems. Take the following case:
If we delete the controller before the CR we're stuck, the CR will never be deleted because the controller will never perform finalisation. If this happens I would expect a test failure. |
I'm not confident in the error message we will receive for this... worth a look... looking... |
I'm fine with this. If it happens it's probably a sign my test scenario has holes. We could/should document that. |
Let me know if you want me to change the message. |
@kensipe |
Co-authored-by: Erik Godding Boye <egboye@gmail.com> Signed-off-by: Charles-Edouard Brétéché <charles.edouard@nirmata.com>
Co-authored-by: Erik Godding Boye <egboye@gmail.com> Signed-off-by: Charles-Edouard Brétéché <charles.edouard@nirmata.com> Signed-off-by: Israel Blancas <iblancasa@gmail.com>
…udobuilder#421) This reverts commit 6b6e6b9. Signed-off-by: Israel Blancas <iblancasa@gmail.com>
Co-authored-by: Erik Godding Boye <egboye@gmail.com> Signed-off-by: Charles-Edouard Brétéché <charles.edouard@nirmata.com>
* fix: wait namespace is actually deleted (#413) * Adding Skip Delete for Namespace in Tests Co-authored-by: Ken Sipe <kensipe@gmail.com> Co-authored-by: Charles-Edouard Brétéché <charles.edouard@nirmata.com> Co-authored-by: Erik Godding Boye <egboye@gmail.com> Signed-off-by: Charles-Edouard Brétéché <charles.edouard@nirmata.com> Signed-off-by: Ken Sipe <kensipe@gmail.com>
What this PR does / why we need it:
This PR waits the test case namespace is actually deleted when deleting a namespace.
This can take some time until finalisers run, returning before the actual deletion can disturb other tests.