Centralize cleanup of created resources #261

timoreimann · 2020-04-27T14:43:40Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

This change revamps the way resources (like volumes and now also snapshots) are managed in tests with regard to cleanup. Instead of putting the onus of cleaning up on the test author, we extend Cleanup to automatically (un-)register resources as they are being used.

Cleanup now exposes a single API that implements both ControllerClient and NodeClient to make it easier for all garbage collection-worthy requests to be funnelled through the new API. The way this is implemented in Cleanup is by embedding both ControllerClient and NodeClient, and proxying to the actual methods before registering cleanup tasks and returning the results.

Consequently, we can throw away large chunks of cleanup test code and route all {Controller,Node}Client access through the Cleanup variable. In essence, this makes it much easier to do the right thing as a test author, since each existing Describe context now provides a single interaction point to the CSI APIs.

For frequently used resource creation operations, we also provide Must* equivalents that fail the test if the results are unexpected. This streamlines our test code further by removing repetitive assertions from the call sites.
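A sketch of the Must* pattern under stated assumptions: the PR builds its helpers on Gomega assertions, whereas this version uses the standard library's `Helper()`/`Fatalf` pair so that it runs without Ginkgo. `Failer`, `createVolume`, `MustCreateVolume`, and `recorder` are hypothetical names; `*testing.T` satisfies `Failer`.

```go
package main

import (
	"errors"
	"fmt"
)

// Failer is the subset of *testing.T used below (hypothetical interface).
type Failer interface {
	Helper()
	Fatalf(format string, args ...any)
}

// createVolume is a hypothetical stand-in for a CSI CreateVolume call.
func createVolume(name string) (string, error) {
	if name == "" {
		return "", errors.New("name must not be empty")
	}
	return "vol-" + name, nil
}

// MustCreateVolume (hypothetical) folds the error assertion into the helper,
// so each call site shrinks to a single line.
func MustCreateVolume(t Failer, name string) string {
	t.Helper() // with *testing.T, failures are reported at the caller's line
	id, err := createVolume(name)
	if err != nil {
		t.Fatalf("create volume %q: %v", name, err)
	}
	return id
}

// recorder is a minimal Failer for running the helper outside `go test`.
// Unlike *testing.T, its Fatalf does not stop execution; it only records.
type recorder struct{ msgs []string }

func (r *recorder) Helper() {}
func (r *recorder) Fatalf(format string, args ...any) {
	r.msgs = append(r.msgs, fmt.Sprintf(format, args...))
}

func main() {
	rec := &recorder{}
	fmt.Println(MustCreateVolume(rec, "data")) // vol-data
	MustCreateVolume(rec, "")
	fmt.Println(rec.msgs) // [create volume "": name must not be empty]
}
```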

List of other changes:

cleanup.go:

Key volume and snapshot objects by ID instead of name. We have a few tests that omit or reuse the name, which makes it impossible to do automatic cleanup. Not printing the name of the resource as we clean up is a small price we have to pay for this adjustment, though.
Fail tests when any cleanup operation errors out, except when we see error codes indicating that the resource is already cleaned up. Using a small logger wrapper to simplify automatic test failure.
Rename DeleteVolumes to Cleanup.
Provide convenience method MustCreateSnapshotFromVolumeRequest to create a sourcing volume and a snapshot in one go.
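To illustrate why keying by ID matters, a small hypothetical registry (`registry`, `volumeInfo`) is sketched below; it is not the csi-test code. A map keyed by name would let a second same-named volume overwrite the first and leak it, while the driver-assigned ID stays unique even when the name is omitted or reused.

```go
package main

import (
	"fmt"
	"sort"
)

// volumeInfo is a hypothetical record of a registered volume.
type volumeInfo struct {
	Name string // may be empty or reused across tests
}

// registry keys volumes by their driver-assigned ID.
type registry struct {
	volumes map[string]volumeInfo
}

func newRegistry() *registry { return &registry{volumes: map[string]volumeInfo{}} }

func (r *registry) register(id string, info volumeInfo) { r.volumes[id] = info }
func (r *registry) unregister(id string)               { delete(r.volumes, id) }

// pending returns the IDs still awaiting cleanup, sorted for stable output.
func (r *registry) pending() []string {
	ids := make([]string, 0, len(r.volumes))
	for id := range r.volumes {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	r := newRegistry()
	r.register("vol-1", volumeInfo{Name: "shared"})
	r.register("vol-2", volumeInfo{Name: "shared"}) // same name, still tracked separately
	r.register("vol-3", volumeInfo{})               // no name at all
	r.unregister("vol-1")                           // e.g. the test deleted it through the wrapper
	fmt.Println(r.pending())                        // [vol-2 vol-3]
}
```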

controller.go, node.go:

Change all tests to use the API exposed by Cleanup only. (That is, do not offer ControllerClient and NodeClient directly anymore.)
Register Cleanup.Cleanup in AfterEach where missing.
Drop cleanup steps from various tests as this is now being taken care of by Cleanup.
Use Must* equivalents where applicable.
Use HaveLen to simplify length assertions.
Make order of Cleanup variable initialization consistent.
Minor cosmetic improvements.

Rename Cleanup to Resources and the file name accordingly.

Which issue(s) this PR fixes:

Fixes #260

Special notes for your reviewer:

Cleanup probably deserves a more generic name at this point, like Resources. I refrained from renaming the variable (and the hosting file name), though, to ease diffing the change. If this change and the rename proposal find consensus, I'm happy to carry it out either through another commit or a follow-up PR. (As agreed on during the review, this PR now also does the rename.)

Does this PR introduce a user-facing change?:

Rename Cleanup to Resources and unexport cleanup (un-)registration, which is now handled implicitly and automatically.

/cc @pohly

k8s-ci-robot · 2020-04-27T14:43:49Z

Hi @timoreimann. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

pohly

I like this a lot. Much cleaner (no pun intended, or did I?) cleanup code...

The csi-test repo is also a Go module that others may import and call directly (we do that in PMEM-CSI), so this is a breaking API change which must be dealt with accordingly:

announce it in the release note
bump the version of the repo to v4.0.0, which implies updating the import paths

If this change and the rename proposal find consensus, I'm happy to carry it out either through another commit or a follow-up PR.

I think we should do that in a separate commit.

pohly · 2020-04-28T06:30:07Z

pkg/sanity/cleanup.go

-	NodeClient                 csi.NodeClient
+	Context *TestContext
+	// ControllerClient is meant for struct-internal use only
+	csi.ControllerClient


It's meant for that, but we don't enforce that because the embedded member is getting exported, right?

I'm on the fence about whether the API should prevent access by making it an unexported member (controllerClient csi.ControllerClient). There may be valid cases where a user wants to call the methods that aren't wrapped.

I think I prefer keeping it like this.

I left a comment only because unexporting wouldn't help: the (repo-internal) consumers are all part of the same sanity package, so they'd still be able to access controllerClient. It'd have to be moved into a sub-package, but I didn't want to go that far in my PR.

I can't think of too many reasons why users wouldn't want to go through the Cleanup layer: unless csi-test was buggy, it should do the right thing. The one case I can think of is when the Cleanup() part shouldn't be executed. Maybe that's reasonable, so let's stick to keeping it exported. I updated the comment to make the implications more clear.

pohly · 2020-04-28T06:36:47Z

pkg/sanity/cleanup.go

 				},
 			); err != nil {
-				logger.Printf("warning: NodeUnstageVolume: %s", err)
+				if status.Code(err) != codes.NotFound {
+					Fail(fmt.Sprintf("NodeUnpublishVolume failed: %s", err))


We can't Fail during cleanup while there is still work to do. Instead the code should try to execute all operations, log and/or collect failures, and then in the end fail the test.

Otherwise cleanup stops early, even though some of the remaining volumes could perhaps still be deleted successfully.

That's a good point. I created a small logger wrapper that tries to simplify the task.

pohly · 2020-04-28T06:43:42Z

pkg/sanity/cleanup.go

+// successfully created.
+func (cl *Cleanup) MustCreateSnapshot(ctx context.Context, req *csi.CreateSnapshotRequest) *csi.CreateSnapshotResponse {
+	snap, err := cl.createSnapshot(ctx, req)
+	Expect(err).NotTo(HaveOccurred())


A problem with assertions in helper functions is that Ginkgo only reports one source line for the error by default; even with -trace, that initial line is typically not very informative.

Better use ExpectWithOffset and add an offset parameter to MustCreateSnapshot so that this helper function can also be called indirectly through other helper functions.

Also, NotTo(HaveOccurred()) without additional explanation is potentially problematic, depending on how much information is in the error. Much too often the error is very generic, in which case the assertion produced by Gomega doesn't say anything about what failed.

Better always use NotTo(HaveOccurred(), "create snapshot"), potentially even with further parameters.

I know, much of the existing code doesn't do that properly either 😢

Ah yes, I remember you previously mentioned how To() should have a description but forgot about it again. I made sure it's now set.

I also updated the code to specify and/or pass through the offset everywhere. It's not pretty, though; I wonder if Ginkgo could do better here by deriving the offset automatically (at least after a top-level t.Helper()-like indicator).

timoreimann · 2020-05-08T22:05:03Z

@pohly all comments addressed, except for the two regarding the breaking change because I have one more dependent question:

I might not be fully familiar with how consumers of csi-test are expected to be given access. For the DigitalOcean CSI driver, we merely configure and start the tests. Do you (or others) go beyond that, specifically by accessing exported variables from the Cleanup / Resources struct?

The reason I'm asking is that I'm wondering if we should take advantage of the breaking change and start hiding parts of the package under an internal package. What are your thoughts on that?

pohly · 2020-05-11T09:54:36Z

Do you (or others) go beyond that, specifically by accessing exported variables from the Cleanup / Resources struct?

Yes, in PMEM-CSI we do have custom tests that are built on top of the sanity infrastructure, in addition to running the pre-defined tests: https://github.com/intel/pmem-csi/blob/018313154dff214da21fe39e6902d87857bc26e8/test/e2e/storage/sanity.go#L191-L230

The reason I'm asking is that I'm wondering if we should take advantage of the breaking change and start hiding parts of the package under an internal package. What are your thoughts on that?

Please don't 😅

timoreimann · 2020-05-11T10:57:04Z

@pohly alrighty, I added a release note and moved v3 to v4 while also updating the links. Let me know if I missed something.

timoreimann · 2020-05-17T09:07:30Z

Would love to see this getting approved and merged soonish because it touches a fair amount of existing tests, so other merges happening meanwhile stand a fair chance of generating merge conflicts.

pohly · 2020-05-18T12:48:46Z

/kind api-change

pohly · 2020-05-18T12:49:05Z

/ok-to-test

pohly · 2020-05-18T12:51:30Z

pkg/sanity/logger.go

+
+// Assert fails the spec if any error was logged.
+func (l *logger) Assert(offset int) {
+	ExpectWithOffset(offset+1, l.failed).To(BeFalse())


Directly calling https://pkg.go.dev/github.com/onsi/ginkgo?tab=doc#Fail with a caller skip parameter and a suitable message is probably going to look better in the resulting test failure.

If you want to make the failure message more informative, count errors and include the count here.

defer logger.Assert() looks odd. Assert - assert what?

Perhaps logger.CheckForErrors() or (similar to framework.ExpectNoError) logger.ExpectNoErrors()?

Implemented both.

timoreimann · 2020-05-20T23:33:03Z

I rebased once more. Also noticed I missed some changes needed for the v3->v4 transition, so updated that as well.

It seems like I had to run go mod vendor as well. Did that with Go v1.12 as that seems to be the minimum Go version according to go.mod.

pohly · 2020-05-25T08:47:45Z

It seems like I had to run go mod vendor as well. Did that with Go v1.12 as that seems to be the minimum Go version according to go.mod.

I think go.mod specifies what we are compatible with. However, in practice this isn't getting tested: we only test with the Go version specified in csi-test/release-tools/travis.yml (line 9 at e89bc15):

- go: 1.13.3

As long as Go 1.12 and 1.13 produce the same output, that doesn't matter. It worked here, so this is FYIO. However, I have seen cases where it didn't work and the pre-merge check with Go 1.13 complained.

This is true for all Kubernetes-CSI repos. I wonder whether we should:

extend our testing to cover building and testing with several Go releases or
bump up the version in go.mod.

The latter has the problem that it prevents downstream users from using an older Go even when that would still technically work. This is only an issue for repos that may get imported by others as a dependency (csi-test, csi-lib-utils).

@msau42: any thoughts on this?

pohly · 2020-05-25T08:48:02Z

@timoreimann please rebase.

timoreimann · 2020-05-25T09:31:30Z

@pohly rebased. I also verified that 1.13 does not differ with regard to go mod tidying / vendoring, even though I think we had already established that.

timoreimann · 2020-07-30T06:11:14Z

@pohly I figured out why the tests were failing: the AfterEach() blocks that clean up after the snapshot creation and deletion tests were missing, so the leftover snapshots affected other tests. I suppose it worked locally for me because of a different execution order.

I pushed a fixing commit and rebased from master. From my point of view, the PR is now good to move on.

pohly

/lgtm

pohly · 2020-08-04T08:22:54Z

/approve

k8s-ci-robot · 2020-08-04T08:22:59Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, timoreimann

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [pohly]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

[1] accidentally swapped the cleanup order, a deviation from the previous behavior that not all CSI drivers may be able to handle. This change restores the original order. [1]: kubernetes-csi#261

This addresses a regression in [1] causing plugins to return an in-use error (FAILED_PRECONDITION) when a sourcing resource (i.e., a snapshot or a volume) is deleted before the sourced volume is. [1]: kubernetes-csi#261
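The regression described above comes down to ordering: a resource must be deleted before whatever it was created from. One way to guarantee that, sketched here with a hypothetical `cleanupStack` type, is to run cleanup steps in reverse registration order (LIFO); this is not necessarily how the follow-up fixes implemented it.

```go
package main

import "fmt"

// cleanupStack (hypothetical) runs registered cleanup steps in reverse
// registration order, so dependents are removed before their sources.
type cleanupStack struct{ steps []func() }

func (c *cleanupStack) push(step func()) { c.steps = append(c.steps, step) }

func (c *cleanupStack) run() {
	for i := len(c.steps) - 1; i >= 0; i-- {
		c.steps[i]()
	}
	c.steps = nil
}

func main() {
	var order []string
	del := func(name string) func() {
		return func() { order = append(order, name) }
	}

	var c cleanupStack
	c.push(del("source volume"))                 // created first
	c.push(del("snapshot"))                      // sourced from the volume
	c.push(del("volume restored from snapshot")) // sourced from the snapshot
	c.run()
	fmt.Println(order) // [volume restored from snapshot snapshot source volume]
}
```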


k8s-ci-robot requested a review from pohly April 27, 2020 14:43

k8s-ci-robot added the release-note-none, cncf-cla: yes, and needs-ok-to-test labels Apr 27, 2020

k8s-ci-robot added the size/XXL label (1000+ lines changed, ignoring generated files) Apr 27, 2020

pohly requested changes Apr 28, 2020


k8s-ci-robot added the needs-rebase label Apr 30, 2020

timoreimann force-pushed the cleanup-consistently branch from 92973ec to 9cb04b3 May 8, 2020 22:04

k8s-ci-robot removed the needs-rebase label May 8, 2020

k8s-ci-robot added the release-note label and removed the release-note-none label May 11, 2020

k8s-ci-robot added the kind/api-change label May 18, 2020

k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label May 18, 2020

pohly reviewed May 18, 2020


timoreimann force-pushed the cleanup-consistently branch from 45af82a to e4952d3 May 20, 2020 23:31

k8s-ci-robot added the needs-rebase label May 25, 2020

timoreimann force-pushed the cleanup-consistently branch from e4952d3 to 138b631 May 25, 2020 09:22

k8s-ci-robot removed the needs-rebase label May 25, 2020

timoreimann added 7 commits July 29, 2020 22:47

Add descriptions to *To() matchers

3ad718e

Specify / pass through offset

205a3ba

Improve comment for {Controller,Node}Client

21803a9

Rename Cleanup to Resources

7ccafe9

Move v3 to v4 and adjust links

f7260c7

Use Fail on failure count for logger assertions

97e1281

Rename Assert to ExpectNoErrors

f19b7bf

timoreimann force-pushed the cleanup-consistently branch from d7256f6 to 25e66ce July 29, 2020 20:55

Run go mod vendor

023c5fe

timoreimann force-pushed the cleanup-consistently branch 4 times, most recently from 36e1977 to 2e872cb July 29, 2020 23:43

Add missing AfterEach() calls

357a804

timoreimann force-pushed the cleanup-consistently branch from 2e872cb to 357a804 July 30, 2020 00:59

pohly approved these changes Aug 4, 2020


k8s-ci-robot assigned pohly Aug 4, 2020

k8s-ci-robot added the lgtm label Aug 4, 2020

k8s-ci-robot added the approved label Aug 4, 2020

k8s-ci-robot merged commit 09bd3cf into kubernetes-csi:master Aug 4, 2020

timoreimann deleted the cleanup-consistently branch August 11, 2020 07:48

timoreimann mentioned this pull request Aug 31, 2020

Cleanup snapshots before volumes #289

Merged

timoreimann mentioned this pull request Sep 29, 2020

Prevent plugin from returning in-use error for source volumes #297

Merged
