-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Centralize cleanup of created resources #261
Centralize cleanup of created resources #261
Conversation
Hi @timoreimann. Thanks for your PR. I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this a lot. Much cleaner (no pun intended, or did I?) cleanup code...
The csi-test
repo is also a Go module that others may import and call directly (we do that in PMEM-CSI), so this is a breaking API change which must be dealt with accordingly:
- announce it in the release note
- bump the version of the repo to v4.0.0, which implies updating the import paths
If this change and the rename proposal finds consensus, I'm happy to carry it out either through another commit or a follow-up PR.
I think we should do that in a separate commit.
pkg/sanity/cleanup.go
Outdated
NodeClient csi.NodeClient | ||
Context *TestContext | ||
// ControllerClient is meant for struct-internal use only | ||
csi.ControllerClient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's meant for that, but we don't enforce that because the embedded member is getting exported, right?
I'm on the edge whether the API should prevent access by making it an unexpected member (controllerClient csi.ControllerClient
). There may be valid cases where a user may want to call the methods that aren't wrapped.
I think I prefer keeping it like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment only because unexporting wouldn't help: the (repo-internal) consumers are all part of the same sanity
package, so they'd still be able to access controllerClient
. It'd have to be moved into a sub-package, but I didn't want to go that far in my PR.
I can't think of too many reasons why users wouldn't want to go through the Cleanup
layer: unless csi-test was buggy, it should do the right thing. The one case I can think of is when the Cleanup()
part shouldn't be executed. Maybe that's reasonable, so let's stick to keeping it exported. I updated the comment to make the implications more clear.
pkg/sanity/cleanup.go
Outdated
}, | ||
); err != nil { | ||
logger.Printf("warning: NodeUnstageVolume: %s", err) | ||
if status.Code(err) != codes.NotFound { | ||
Fail(fmt.Sprintf("NodeUnpublishVolume failed: %s", err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't Fail
during cleanup while there is still work to do. Instead the code should try to execute all operations, log and/or collect failures, and then in the end fail the test.
Otherwise cleaning up stops early although some other volumes perhaps could be deleted successfully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I created a small logger wrapper that tries to simplify the task.
pkg/sanity/cleanup.go
Outdated
// successfully created. | ||
func (cl *Cleanup) MustCreateSnapshot(ctx context.Context, req *csi.CreateSnapshotRequest) *csi.CreateSnapshotResponse { | ||
snap, err := cl.createSnapshot(ctx, req) | ||
Expect(err).NotTo(HaveOccurred()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A problem with assertions in helper functions is that Ginkgo only reports one source line for the error by default; even with -trace
, that initial line is typically not very informative.
Better use ExpectWithOffset
and add an offset parameter to the MustCreateSnapshot
parameters so that this helper function an also be called indirectly through some other helper functions.
Also, NotTo(HaveOccurred())
without additional explanation is potentially problematic, depending on how much information is in the error. Much too often the error is very generic, in which case the assertion produced by Gomega doesn't say anything about what failed.
Better always use NoTo(HaveOccurred(), "create snapshot"
, potentially even with further parameters.
I know, much of the existing code doesn't do that properly either 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I remember you previously mentioned how To()
should have a description but forgot about it again. I made sure it's now set.
I also updated the code to specify and/or pass through the offset everywhere. It's not a beauty though, I wonder if ginkgo could do better here by deriving the offset automatically (at least after the top-level t.Helper()
-like indicator).
92973ec
to
9cb04b3
Compare
@pohly all comments addressed, except for the two regarding the breaking change because I have one more dependent question: I might not be fully familiar with how consumers of The reason I'm asking is that I'm wondering if we should take advantage of the breaking change and start hiding parts of the package under an |
Yes, in PMEM-CSI we do have custom tests that are built on top of the sanity infrastructure, in addition to running the pre-defined tests: https://github.com/intel/pmem-csi/blob/018313154dff214da21fe39e6902d87857bc26e8/test/e2e/storage/sanity.go#L191-L230
Please don't 😅 |
@pohly alrighty, I added a release note and moved v3 to v4 while also updating the links. Let me know if I missed something. |
Would love to see this getting approved and merged soonish because it touches a fair amount of existing tests, so other merges happening meanwhile stand a fair chance of generating merge conflicts. |
/kind api-change |
/ok-to-test |
pkg/sanity/logger.go
Outdated
|
||
// Assert fails the spec if any error was logged. | ||
func (l *logger) Assert(offset int) { | ||
ExpectWithOffset(offset+1, l.failed).To(BeFalse()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Directly calling https://pkg.go.dev/github.com/onsi/ginkgo?tab=doc#Fail with a caller skip parameter and a suitable message is probably going to look better in the resulting test failure.
If you want to make the failure message more informative, count errors and include the count here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defer logger.Assert()
looks odd. Assert - assert what?
Perhaps logger.CheckForErrors()
or (similar to framework.ExpectNoError
) logger.ExpectNoErrors()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented both.
45af82a
to
e4952d3
Compare
I rebased once more. Also noticed I missed some changes needed for the v3->v4 transition, so updated that as well. It seems like I had to run |
I think go.mod specifies what we are compatible with. However, in practice this isn't getting tested: we only test with the Go version specified in csi-test/release-tools/travis.yml Line 9 in e89bc15
As long as Go 1.12 and 1.13 produce the same output, that doesn't matter. It worked here, so this is FYIO. However, I have seen cases where it didn't work and the pre-merge check with Go 1.13 complained. This is true for all Kubernetes-CSI repos. I wonder whether we should:
The latter has the problem that it prevents downstream users from using an older Go even when that would still technically work. This is only an issue for repos that may get imported by others as a dependency (csi-test, csi-lib-utils). @msau42: any thoughts on this? |
@timoreimann please rebase. |
e4952d3
to
138b631
Compare
@pohly rebased. I also verified that 1.13 does not differ with regards to |
d7256f6
to
25e66ce
Compare
36e1977
to
2e872cb
Compare
2e872cb
to
357a804
Compare
@pohly I figured out why the tests were failing: two I pushed a fixing commit and rebased from master. From my point of view, the PR is now good to move on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: pohly, timoreimann The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[1] accidentally swapped the cleanup order which represents a deviation to the previous behavior that not all CSI drivers may be able to handle. This change restores the original order. [1]: kubernetes-csi#261
This addresses a regression in [1] causing plugins to return an in-use error (FAILED_PRECONDITION) when a sourcing resource (i.e., a snapshot or a volume) is deleted before the sourced volume is. [1]: kubernetes-csi#261
This addresses a regression in [1] causing plugins to return an in-use error (FAILED_PRECONDITION) when a sourcing resource (i.e., a snapshot or a volume) is deleted before the sourced volume is. [1]: kubernetes-csi#261
What type of PR is this?
What this PR does / why we need it:
This change revamps the way resources (like volumes and now also snapshots) are managed in tests with regards to cleaning up. Instead of putting the onus of cleaning up on the test author, we extend
Cleanup
to automatically (un-)register resources as they are being used.Cleanup
now exposes a single API that implements bothControllerClient
andNodeClient
to make it easier for all garbage collection-worthy requests to be funnelled through the new API. The way this is implemented inCleanup
is by embedding bothControllerClient
andNodeClient
, and proxying to the actual methods before registering cleanup tasks and returning the results.Consequently, we can throw away large chunks of cleanup test code and unify all
{Controller,Node}Client
access to theCleanup
variable. In essence, this makes it much easier to do the right thing as a test author since each existingDescribe
context will provide a singleinteraction point to the CSI APIs only.
For frequently used resource creation operations, we also provide
Must*
equivalents that fail the test if the results are unexpected. This makes our test code even more streamlined by DRYing out the number of assertions called.List of other changes:
cleanup.go
:DeleteVolumes
toCleanup
.MustCreateSnapshotFromVolumeRequest
to create a sourcing volume and a snapshot in one go.controller.go
,node.go
:Cleanup
only. (That is, do not offerControllerClient
andNodeClient
directly anymore.)Cleanup.Cleanup
inAfterEach
where missing.Cleanup
.Must*
equivalents were applicable.HaveLen
to simplify length assertions.Cleanup
variable initialization consistent.Rename
Cleanup
toResources
and the file name accordingly.Which issue(s) this PR fixes:
Fixes #260
Special notes for your reviewer:
(As agreed on during the review, this PR now also does the rename.)Cleanup
probably deserves a more generic name as this point, likeResources
. I hesitated from renaming the variable (and the hosting file name) though to ease diffing the change. If this change and the rename proposal finds consensus, I'm happy to carry it out either through another commit or a follow-up PR.Does this PR introduce a user-facing change?:
/cc @pohly