Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: wait namespace is actually deleted #413

Merged
merged 3 commits into from
Nov 14, 2022

Conversation

eddycharly
Copy link
Contributor

What this PR does / why we need it:

This PR waits the test case namespace is actually deleted when deleting a namespace.
This can take some time until finalisers run, returning before the actual deletion can disturb other tests.

@eddycharly eddycharly marked this pull request as ready for review November 10, 2022 08:26
Copy link
Contributor

@erikgb erikgb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good change. One comment, and I also wonder if we need to add the import alias for K8s errors.

pkg/test/case.go Outdated Show resolved Hide resolved
@eddycharly
Copy link
Contributor Author

eddycharly commented Nov 10, 2022

I also wonder if we need to add the import alias for K8s errors

I can remove the alias if you want but we have a variable named errors.

eddycharly and others added 3 commits November 10, 2022 11:03
Signed-off-by: Charles-Edouard Brétéché <charles.edouard@nirmata.com>
Signed-off-by: Charles-Edouard Brétéché <charles.edouard@nirmata.com>
Co-authored-by: Erik Godding Boye <egboye@gmail.com>
Signed-off-by: Charles-Edouard Brétéché <charles.edouard@nirmata.com>
@eddycharly eddycharly requested review from erikgb and removed request for kensipe, kaiwalyajoshi and asekretenko November 10, 2022 10:38
Copy link
Contributor

@erikgb erikgb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Thanks for this!

Copy link
Member

@kensipe kensipe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

I was tempted to get your take on the use of this projects kubernetes.go::WaitForDelete which is generic... but I actually have a strong preference for passing ctx which it is missing. However when the generic function was created, most of k8s code didn't have ctx... I believe that was added post 1.16 (hard to remember now). It would still be interesting to get your take on the value of modifying the generic function or removing it. The WaitForDelete doesn't currently have any dependencies and if we don't have interest in it... I will likely delete it.

@kensipe
Copy link
Member

kensipe commented Nov 11, 2022

1 sec... This code does look good! however just prior to merging I realized the potential issue. This is valuable for specific use case(s)... however for a use case of parallel tests or tests isolated by namespaces, you don't need to wait for the deletion of a previous namespace to continue working in another namespace. I need to review 1 more time zoomed out to the larger picture. It is possible this is fine. I would like to reason thru the code from larger perspective.

@kensipe
Copy link
Member

kensipe commented Nov 11, 2022

the first thing I noticed is:

if the test is successful, however the namespace isn't deleted (perhaps a timeout) then the success turns to a failure. Do we want a failure for this? I can actually see both cases... we could see a timeout on wait and the cluster is at fault for which it is a false negative. I could see a timeout on deletion which is caused by namespace finalizers for which you do want a failure to be reported.

It would be good IMO to discuss all scenarios and reason thru expectations. It may lead to another test configuration to wait for deletion or to declaratively express if deletion should raise a failure.

@eddycharly
Copy link
Contributor Author

I'm not sure why we wouldn't want to wait.

I discovered the issue when running kuttl twice rapidly, first time all worked fine, second time tests failed because a namespace created during the first run was not yet deleted (not waiting actual namespace deletion can have an impact across multiple kuttl runs).

In this sense I feel like failing to cleanup should lead to a test failure, cleaning up correctly sounds important to me, especially when running kuttl on non ephemeral clusters. If the namespace fails to cleanup because of finalisers, the test can probably be improved to perform resources deletion in the correct order.

@eddycharly
Copy link
Contributor Author

I didn't know about the WaitForDelete function, I will look at it.

I actually have a strong preference for passing ctx

💯

@kensipe
Copy link
Member

kensipe commented Nov 11, 2022

actually with the ns deletion moving to the test.Cleanup it makes it much easier to reason about... (thanks for that). I think this makes sense! all other tests will be separate tests parallel or not. This makes sense as long as we are all good with ns deletion timeouts are a test failure.

@eddycharly
Copy link
Contributor Author

I'm not sure why we wouldn't want to wait.

IMHO it's not just about namespaces but all resources a kuttl test creates, the deletion order matters too, we should not try to delete everything concurrently, this can lead to serious problems.

Take the following case:

  • create a CRD
  • deploy a controller that uses finalisers
  • create a CR

If we delete the controller before the CR we're stuck, the CR will never be deleted because the controller will never perform finalisation. If this happens I would expect a test failure.

@kensipe
Copy link
Member

kensipe commented Nov 11, 2022

I'm not confident in the error message we will receive for this... worth a look... looking...

@eddycharly
Copy link
Contributor Author

This makes sense as long as we are all good with ns deletion timeouts are a test failure

I'm fine with this. If it happens it's probably a sign my test scenario has holes. We could/should document that.

@eddycharly
Copy link
Contributor Author

I'm not confident in the error message we will receive for this

Let me know if you want me to change the message.

@eddycharly
Copy link
Contributor Author

@kensipe WaitForDelete is not used anymore right ?
I feel like it's a nice helper, we could add context support to it and start using it more, WDYT ?

@kensipe kensipe merged commit 6b6e6b9 into kudobuilder:main Nov 14, 2022
kensipe added a commit that referenced this pull request Nov 14, 2022
kensipe added a commit that referenced this pull request Nov 14, 2022
kensipe pushed a commit that referenced this pull request Nov 14, 2022
Co-authored-by: Erik Godding Boye <egboye@gmail.com>
Signed-off-by: Charles-Edouard Brétéché <charles.edouard@nirmata.com>
iblancasa pushed a commit to iblancasa/kuttl that referenced this pull request Nov 17, 2022
Co-authored-by: Erik Godding Boye <egboye@gmail.com>
Signed-off-by: Charles-Edouard Brétéché <charles.edouard@nirmata.com>
Signed-off-by: Israel Blancas <iblancasa@gmail.com>
iblancasa pushed a commit to iblancasa/kuttl that referenced this pull request Nov 17, 2022
…udobuilder#421)

This reverts commit 6b6e6b9.

Signed-off-by: Israel Blancas <iblancasa@gmail.com>
kensipe pushed a commit that referenced this pull request Dec 20, 2022
Co-authored-by: Erik Godding Boye <egboye@gmail.com>
Signed-off-by: Charles-Edouard Brétéché <charles.edouard@nirmata.com>
kensipe added a commit that referenced this pull request Dec 20, 2022
* fix: wait namespace is actually deleted (#413)
* Adding Skip Delete for Namespace in Tests

Co-authored-by: Ken Sipe <kensipe@gmail.com>
Co-authored-by: Charles-Edouard Brétéché <charles.edouard@nirmata.com>
Co-authored-by: Erik Godding Boye <egboye@gmail.com>
Signed-off-by: Charles-Edouard Brétéché <charles.edouard@nirmata.com>
Signed-off-by: Ken Sipe <kensipe@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants