Centralize cleanup of created resources #261

Merged
11 commits merged into kubernetes-csi:master from cleanup-consistently on Aug 4, 2020

timoreimann
Contributor

@timoreimann timoreimann commented Apr 27, 2020

What type of PR is this?

/kind feature

What this PR does / why we need it:

This change revamps the way resources (like volumes and now also snapshots) are managed in tests with regard to cleanup. Instead of putting the onus of cleaning up on the test author, we extend Cleanup to automatically (un-)register resources as they are created and deleted.

Cleanup now exposes a single API that implements both ControllerClient and NodeClient, which makes it easier to funnel every request that creates garbage-collectable resources through the new API. Cleanup achieves this by embedding both ControllerClient and NodeClient: each wrapped method proxies to the actual client method, then registers cleanup tasks and returns the results.

Consequently, we can throw away large chunks of cleanup test code and route all {Controller,Node}Client access through the Cleanup variable. In essence, this makes it much easier for test authors to do the right thing, since each existing Describe context now provides a single interaction point to the CSI APIs.

For frequently used resource creation operations, we also provide Must* equivalents that fail the test if the results are unexpected. This streamlines our test code further by cutting down on repeated assertions.
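A Must* helper can be sketched with only the standard library. Here the failer interface (which *testing.T satisfies) stands in for the Gomega assertions the suite actually uses, and createVolume, MustCreateVolume, and recorder are hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// failer is the only capability the helper needs; *testing.T satisfies it.
// (Assumption: in the real suite, Gomega assertions play this role.)
type failer interface {
	Fatalf(format string, args ...interface{})
}

type CreateVolumeRequest struct{ Name string }
type CreateVolumeResponse struct{ VolumeID string }

// createVolume is a hypothetical operation that may fail.
func createVolume(req *CreateVolumeRequest) (*CreateVolumeResponse, error) {
	if req.Name == "" {
		return nil, errors.New("name is required")
	}
	return &CreateVolumeResponse{VolumeID: "vol-" + req.Name}, nil
}

// MustCreateVolume fails the test on error, so callers receive a usable
// response without repeating the same assertion at every call site.
func MustCreateVolume(t failer, req *CreateVolumeRequest) *CreateVolumeResponse {
	resp, err := createVolume(req)
	if err != nil {
		t.Fatalf("create volume %q: %v", req.Name, err)
	}
	return resp
}

// recorder is a failer that records messages instead of stopping the test.
type recorder struct{ msgs []string }

func (r *recorder) Fatalf(format string, args ...interface{}) {
	r.msgs = append(r.msgs, fmt.Sprintf(format, args...))
}

func main() {
	rec := &recorder{}
	fmt.Println(MustCreateVolume(rec, &CreateVolumeRequest{Name: "data"}).VolumeID) // vol-data
	MustCreateVolume(rec, &CreateVolumeRequest{})
	fmt.Println(rec.msgs[0]) // create volume "": name is required
}
```

With a real *testing.T, Fatalf stops the test, so the nil response returned on failure is never used.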

List of other changes:

cleanup.go:

  • Key volume and snapshot objects by ID instead of name. A few tests omit or reuse the name, which makes automatic cleanup impossible. The small price for this adjustment is that we no longer print the resource name during cleanup.
  • Fail tests when any cleanup operation errors out, except when the error code indicates that the resource has already been cleaned up. A small logger wrapper simplifies failing the test automatically.
  • Rename DeleteVolumes to Cleanup.
  • Provide convenience method MustCreateSnapshotFromVolumeRequest to create a sourcing volume and a snapshot in one go.
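The reason for keying by ID in the first bullet can be shown with a small sketch: when tests reuse or omit a name, a name-keyed map silently overwrites entries, and the overwritten volumes would never be cleaned up. The volume type and both tracking functions are hypothetical, and the sketch assumes the driver returns a distinct ID per volume.

```go
package main

import "fmt"

// volume is a hypothetical record of a created volume.
type volume struct {
	Name string
	ID   string
}

// trackByName registers volumes under their (test-chosen) names.
func trackByName(vols []volume) map[string]volume {
	m := map[string]volume{}
	for _, v := range vols {
		m[v.Name] = v // a reused or empty name overwrites the earlier entry
	}
	return m
}

// trackByID registers volumes under their driver-assigned IDs.
func trackByID(vols []volume) map[string]volume {
	m := map[string]volume{}
	for _, v := range vols {
		m[v.ID] = v // assumes the driver returns a distinct ID per volume
	}
	return m
}

func main() {
	created := []volume{
		{Name: "sanity", ID: "vol-1"},
		{Name: "sanity", ID: "vol-2"}, // the test reuses the name
		{Name: "", ID: "vol-3"},       // the test omits the name
		{Name: "", ID: "vol-4"},
	}
	fmt.Println(len(trackByName(created))) // 2: two volumes would leak
	fmt.Println(len(trackByID(created)))   // 4
}
```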

controller.go, node.go:

  • Change all tests to use the API exposed by Cleanup only. (That is, do not offer ControllerClient and NodeClient directly anymore.)
  • Register Cleanup.Cleanup in AfterEach where missing.
  • Drop cleanup steps from various tests as this is now being taken care of by Cleanup.
  • Use Must* equivalents where applicable.
  • Use HaveLen to simplify length assertions.
  • Make order of Cleanup variable initialization consistent.
  • Minor cosmetic improvements.

Rename Cleanup to Resources, and rename the file accordingly.

Which issue(s) this PR fixes:

Fixes #260

Special notes for your reviewer:

Cleanup probably deserves a more generic name at this point, like Resources. I refrained from renaming the variable (and the hosting file name), though, to ease diffing the change. If this change and the rename proposal find consensus, I'm happy to carry it out either through another commit or a follow-up PR. (As agreed on during the review, this PR now also does the rename.)

Does this PR introduce a user-facing change?:

Rename Cleanup to Resources and unexport cleanup (un-)registration, which is now handled implicitly and automatically.

/cc @pohly

@k8s-ci-robot k8s-ci-robot requested a review from pohly April 27, 2020 14:43
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 27, 2020
@k8s-ci-robot
Contributor

Hi @timoreimann. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 27, 2020
Contributor

@pohly pohly left a comment

I like this a lot. Much cleaner (no pun intended, or did I?) cleanup code...

The csi-test repo is also a Go module that others may import and call directly (we do that in PMEM-CSI), so this is a breaking API change which must be dealt with accordingly:

  • announce it in the release note
  • bump the version of the repo to v4.0.0, which implies updating the import paths

If this change and the rename proposal find consensus, I'm happy to carry it out either through another commit or a follow-up PR.

I think we should do that in a separate commit.

NodeClient csi.NodeClient
Context *TestContext
// ControllerClient is meant for struct-internal use only
csi.ControllerClient
Contributor

It's meant for that, but we don't enforce that because the embedded member is getting exported, right?

I'm on the fence about whether the API should prevent access by making it an unexported member (controllerClient csi.ControllerClient). There may be valid cases where a user wants to call the methods that aren't wrapped.

I think I prefer keeping it like this.

Contributor Author

I left a comment only because unexporting wouldn't help: the (repo-internal) consumers are all part of the same sanity package, so they'd still be able to access controllerClient. It'd have to be moved into a sub-package, but I didn't want to go that far in my PR.

I can't think of many reasons why users wouldn't want to go through the Cleanup layer: unless csi-test was buggy, it should do the right thing. The one case I can think of is when the Cleanup() part shouldn't be executed. Maybe that's reasonable, so let's stick to keeping it exported. I updated the comment to make the implications clearer.
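The visibility trade-off in this thread can be seen in a reduced example. The one-method Client interface, driver, and Resources below are hypothetical stand-ins. Since the embedded field takes the exported type's name, callers can reach r.Client and skip the wrapper's bookkeeping. Renaming it to a lowercase field would not change that for code in the same package, because Go enforces visibility per package, not per type.

```go
package main

import "fmt"

// Client is a hypothetical one-method stand-in for csi.ControllerClient.
type Client interface {
	Create(name string) string
}

type driver struct{}

func (driver) Create(name string) string { return "id-" + name }

// Resources embeds Client; the embedded field is named Client and is
// therefore reachable from outside the package as r.Client.
type Resources struct {
	Client
	registered []string // IDs scheduled for cleanup
}

// Create shadows the promoted method and registers what it creates.
func (r *Resources) Create(name string) string {
	id := r.Client.Create(name)
	r.registered = append(r.registered, id)
	return id
}

func main() {
	r := &Resources{Client: driver{}}
	r.Create("a")             // goes through the wrapper: registered
	r.Client.Create("b")      // bypasses the wrapper: never cleaned up
	fmt.Println(r.registered) // [id-a]
}
```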

},
); err != nil {
logger.Printf("warning: NodeUnstageVolume: %s", err)
if status.Code(err) != codes.NotFound {
Fail(fmt.Sprintf("NodeUnpublishVolume failed: %s", err))
Contributor

We can't Fail during cleanup while there is still work to do. Instead the code should try to execute all operations, log and/or collect failures, and then in the end fail the test.

Otherwise cleanup stops early, even though some of the other volumes could perhaps still be deleted successfully.

Contributor Author

That's a good point. I created a small logger wrapper that tries to simplify the task.

// successfully created.
func (cl *Cleanup) MustCreateSnapshot(ctx context.Context, req *csi.CreateSnapshotRequest) *csi.CreateSnapshotResponse {
snap, err := cl.createSnapshot(ctx, req)
Expect(err).NotTo(HaveOccurred())
Contributor

A problem with assertions in helper functions is that Ginkgo only reports one source line for the error by default; even with -trace, that initial line is typically not very informative.

Better use ExpectWithOffset and add an offset parameter to MustCreateSnapshot so that this helper function can also be called indirectly through other helper functions.

Also, NotTo(HaveOccurred()) without additional explanation is potentially problematic, depending on how much information is in the error. Much too often the error is very generic, in which case the assertion produced by Gomega doesn't say anything about what failed.

Better always use NotTo(HaveOccurred(), "create snapshot"), potentially even with further parameters.

I know, much of the existing code doesn't do that properly either 😢
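The offset mechanism can be illustrated without Ginkgo. The sketch below is a stdlib-only stand-in for the idea behind ExpectWithOffset combined with an annotated NotTo(HaveOccurred(), "create snapshot"), not Gomega's implementation: runtime.Caller skips the requested number of frames so that the failure is attributed to the test that called the helper, and the description prefixes the error. The failure type and all three functions are hypothetical; //go:noinline only keeps the frame layout predictable for this demo.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"strings"
)

// failure describes where an assertion failed and why.
type failure struct {
	Func    string // short name of the function the failure is attributed to
	Line    int
	Message string
}

// expectNoErrorWithOffset: offset 0 attributes a failure to the direct
// caller; each helper layer in between adds 1. Returns nil if err is nil.
//
//go:noinline
func expectNoErrorWithOffset(offset int, err error, description string) *failure {
	if err == nil {
		return nil
	}
	// Frame 1 is our caller; skip `offset` further frames beyond that.
	pc, _, line, _ := runtime.Caller(offset + 1)
	name := runtime.FuncForPC(pc).Name()
	return &failure{
		Func:    name[strings.LastIndex(name, ".")+1:],
		Line:    line,
		Message: fmt.Sprintf("%s: %v", description, err),
	}
}

// mustCreateSnapshot is a hypothetical helper. It adds 1 to the offset so
// that the failure points at its caller rather than at itself.
//
//go:noinline
func mustCreateSnapshot(offset int, err error) *failure {
	return expectNoErrorWithOffset(offset+1, err, "create snapshot")
}

//go:noinline
func snapshotTest() *failure {
	return mustCreateSnapshot(0, errors.New("rpc error: code = Internal"))
}

func main() {
	f := snapshotTest()
	fmt.Printf("%s (line %d): %s\n", f.Func, f.Line, f.Message)
}
```

Without the offset, every failure would be reported inside mustCreateSnapshot, which is the uninformative line the review comment warns about.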

Contributor Author

Ah yes, I remember you previously mentioned how To() should have a description but forgot about it again. I made sure it's now set.

I also updated the code to specify and/or pass through the offset everywhere. It's not pretty, though; I wonder if Ginkgo could do better here by deriving the offset automatically (at least after a top-level t.Helper()-like indicator).

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 30, 2020
@timoreimann timoreimann force-pushed the cleanup-consistently branch from 92973ec to 9cb04b3 Compare May 8, 2020 22:04
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 8, 2020
@timoreimann
Contributor Author

@pohly all comments addressed, except for the two regarding the breaking change because I have one more dependent question:

I might not be fully familiar with how consumers of csi-test are expected to be given access. For the DigitalOcean CSI driver, we merely configure and start the tests. Do you (or others) go beyond that, specifically by accessing exported variables from the Cleanup / Resources struct?

The reason I'm asking is that I'm wondering if we should take advantage of the breaking change and start hiding parts of the package under an internal package. What are your thoughts on that?

@pohly
Contributor

pohly commented May 11, 2020

Do you (or others) go beyond that, specifically by accessing exported variables from the Cleanup / Resources struct?

Yes, in PMEM-CSI we do have custom tests that are built on top of the sanity infrastructure, in addition to running the pre-defined tests: https://github.com/intel/pmem-csi/blob/018313154dff214da21fe39e6902d87857bc26e8/test/e2e/storage/sanity.go#L191-L230

The reason I'm asking is that I'm wondering if we should take advantage of the breaking change and start hiding parts of the package under an internal package. What are your thoughts on that?

Please don't 😅

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels May 11, 2020
@timoreimann
Contributor Author

@pohly alrighty, I added a release note and moved v3 to v4 while also updating the links. Let me know if I missed something.

@timoreimann
Contributor Author

Would love to see this getting approved and merged soonish because it touches a fair amount of existing tests, so other merges happening meanwhile stand a fair chance of generating merge conflicts.

@pohly
Contributor

pohly commented May 18, 2020

/kind api-change

@k8s-ci-robot k8s-ci-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label May 18, 2020
@pohly
Contributor

pohly commented May 18, 2020

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 18, 2020

// Assert fails the spec if any error was logged.
func (l *logger) Assert(offset int) {
ExpectWithOffset(offset+1, l.failed).To(BeFalse())
Contributor

Directly calling https://pkg.go.dev/github.com/onsi/ginkgo?tab=doc#Fail with a caller skip parameter and a suitable message is probably going to look better in the resulting test failure.

If you want to make the failure message more informative, count errors and include the count here.

Contributor

defer logger.Assert() looks odd. Assert - assert what?

Perhaps logger.CheckForErrors() or (similar to framework.ExpectNoError) logger.ExpectNoErrors()?

Contributor Author

Implemented both.
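Combining both suggestions might look roughly like this sketch: errors are collected during cleanup, and a single failure that includes the error count is raised at the end. The logger type and its fields are hypothetical, and the fail callback stands in for ginkgo.Fail, whose optional callerSkip argument plays the role of the offset.

```go
package main

import (
	"fmt"
	"strings"
)

// logger collects cleanup errors instead of failing on the first one
// (hypothetical sketch; names and fields are illustrative).
type logger struct {
	errs []string
	// fail stands in for ginkgo.Fail(message, callerSkip...).
	fail func(message string, callerSkip int)
}

// Errorf records an error and lets cleanup continue.
func (l *logger) Errorf(format string, args ...interface{}) {
	l.errs = append(l.errs, fmt.Sprintf(format, args...))
}

// ExpectNoErrors fails once, with a count, if anything was recorded.
// offset counts helper frames above the caller; the extra +1 skips
// ExpectNoErrors itself.
func (l *logger) ExpectNoErrors(offset int) {
	if len(l.errs) == 0 {
		return
	}
	l.fail(fmt.Sprintf("%d error(s) during cleanup: %s",
		len(l.errs), strings.Join(l.errs, "; ")), offset+1)
}

func main() {
	l := &logger{fail: func(msg string, _ int) { fmt.Println("FAIL:", msg) }}
	l.Errorf("DeleteVolume %s: %s", "vol-2", "backend timeout")
	l.Errorf("DeleteSnapshot %s: %s", "snap-1", "backend timeout")
	l.ExpectNoErrors(0)
}
```

The name ExpectNoErrors states what is being checked at the call site, which addresses the "Assert what?" question.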

@timoreimann timoreimann force-pushed the cleanup-consistently branch from 45af82a to e4952d3 Compare May 20, 2020 23:31
@timoreimann
Contributor Author

timoreimann commented May 20, 2020

I rebased once more. Also noticed I missed some changes needed for the v3->v4 transition, so updated that as well.

It seems like I had to run go mod vendor as well. Did that with Go v1.12 as that seems to be the minimum Go version according to go.mod.

@pohly
Contributor

pohly commented May 25, 2020

It seems like I had to run go mod vendor as well. Did that with Go v1.12 as that seems to be the minimum Go version according to go.mod.

I think go.mod specifies what we are compatible with. However, in practice this isn't getting tested: we only test with the Go version specified in

As long as Go 1.12 and 1.13 produce the same output, that doesn't matter. It worked here, so this is just FYI. However, I have seen cases where it didn't work and the pre-merge check with Go 1.13 complained.

This is true for all Kubernetes-CSI repos. I wonder whether we should:

  • extend our testing to cover building and testing with several Go releases or
  • bump up the version in go.mod.

The latter has the problem that it prevents downstream users from using an older Go even when that would still technically work. This is only an issue for repos that may get imported by others as a dependency (csi-test, csi-lib-utils).

@msau42: any thoughts on this?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 25, 2020
@pohly
Contributor

pohly commented May 25, 2020

@timoreimann please rebase.

@timoreimann timoreimann force-pushed the cleanup-consistently branch from e4952d3 to 138b631 Compare May 25, 2020 09:22
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 25, 2020
@timoreimann
Contributor Author

@pohly rebased. I also verified that Go 1.13 does not differ with regard to go mod tidying / vendoring, even though I think we had already figured that out.

@timoreimann timoreimann force-pushed the cleanup-consistently branch from d7256f6 to 25e66ce Compare July 29, 2020 20:55
@timoreimann timoreimann force-pushed the cleanup-consistently branch 4 times, most recently from 36e1977 to 2e872cb Compare July 29, 2020 23:43
@timoreimann timoreimann force-pushed the cleanup-consistently branch from 2e872cb to 357a804 Compare July 30, 2020 00:59
@timoreimann
Contributor Author

timoreimann commented Jul 30, 2020

@pohly I figured out why the tests were failing: two AfterEach() blocks that clean up after the snapshot creation and deletion tests were missing, so leftover snapshots affected other tests. I suppose it worked for me locally because of a different execution order.

I pushed a fixing commit and rebased from master. From my point of view, the PR is now good to move on.

Contributor

@pohly pohly left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 4, 2020
@pohly
Contributor

pohly commented Aug 4, 2020

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, timoreimann

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 4, 2020
@k8s-ci-robot k8s-ci-robot merged commit 09bd3cf into kubernetes-csi:master Aug 4, 2020
@timoreimann timoreimann deleted the cleanup-consistently branch August 11, 2020 07:48
timoreimann added a commit to timoreimann/csi-test that referenced this pull request Aug 31, 2020
[1] accidentally swapped the cleanup order, a deviation from the previous
behavior that not all CSI drivers may be able to handle.
This change restores the original order.

[1]: kubernetes-csi#261
timoreimann added a commit to timoreimann/csi-test that referenced this pull request Sep 29, 2020
This addresses a regression in [1] causing plugins to return an in-use
error (FAILED_PRECONDITION) when a sourcing resource (i.e., a snapshot
or a volume) is deleted before the sourced volume is.

[1]: kubernetes-csi#261
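These commit messages don't spell out the exact fix. One ordering that avoids the in-use error is to delete resources in reverse creation order: a volume restored from a snapshot is necessarily created after that snapshot, which in turn is created after its source volume. The sketch below shows that LIFO idea with a hypothetical registry type; it is not necessarily how csi-test restored the order.

```go
package main

import "fmt"

// resource is a hypothetical record of something a test created.
type resource struct {
	Kind, ID string
}

// registry remembers resources in creation order.
type registry struct {
	created []resource
}

func (r *registry) add(kind, id string) {
	r.created = append(r.created, resource{Kind: kind, ID: id})
}

// cleanupOrder returns resources newest first, so anything sourced from
// another resource is deleted before the resource it depends on.
func (r *registry) cleanupOrder() []resource {
	out := make([]resource, 0, len(r.created))
	for i := len(r.created) - 1; i >= 0; i-- {
		out = append(out, r.created[i])
	}
	return out
}

func main() {
	var r registry
	r.add("volume", "source-vol")
	r.add("snapshot", "snap-1")    // taken from source-vol
	r.add("volume", "restored-vol") // sourced from snap-1
	for _, res := range r.cleanupOrder() {
		fmt.Println("delete", res.Kind, res.ID)
	}
	// delete volume restored-vol
	// delete snapshot snap-1
	// delete volume source-vol
}
```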
timoreimann added a commit to timoreimann/csi-test that referenced this pull request Jan 20, 2021
This addresses a regression in [1] causing plugins to return an in-use
error (FAILED_PRECONDITION) when a sourcing resource (i.e., a snapshot
or a volume) is deleted before the sourced volume is.

[1]: kubernetes-csi#261
TerryHowe added a commit to TerryHowe/csi-test that referenced this pull request Dec 6, 2024
734c2b9 Merge pull request kubernetes-csi#265 from Rakshith-R/consider-main-branch
f95c855 Merge pull request kubernetes-csi#262 from huww98/golang-toolchain
3c8d966 Treat main branch as equivalent to master branch
e31de52 Merge pull request kubernetes-csi#261 from huww98/golang
fd153a9 Bump golang to 1.23.1
a8b3d05 pull-test.sh: fix "git subtree pull" errors
6b05f0f use new GOTOOLCHAIN env to manage go version
227577e Merge pull request kubernetes-csi#258 from gnufied/enable-race-detection
e1ceee2 Always enable race detection while running tests
988496a Merge pull request kubernetes-csi#257 from jakobmoellerdev/csi-prow-sidecar-e2e-path
028f8c6 chore: bump to Go 1.22.5
69bd71e chore: add CSI_PROW_SIDECAR_E2E_PATH
f40f0cc Merge pull request kubernetes-csi#256 from solumath/master
cfa9210 Instruction update
379a1bb Merge pull request kubernetes-csi#255 from humblec/sidecar-md
a5667bb fix typo in sidecar release process
4967685 Merge pull request kubernetes-csi#254 from bells17/add-github-actions
d9bd160 Update skip list in codespell GitHub Action
adb3af9 Merge pull request kubernetes-csi#252 from bells17/update-go-version
f5aebfc Add GitHub Actions workflows
b82ee38 Merge pull request kubernetes-csi#253 from bells17/fix-typo
c317456 Fix typo
0a78505 Bump to Go 1.22.3
edd89ad Merge pull request kubernetes-csi#251 from jsafrane/add-logcheck
043fd09 Add test-logcheck target
d7535ae Merge pull request kubernetes-csi#250 from jsafrane/go-1.22
b52e7ad Update go to 1.22.2
14fdb6f Merge pull request kubernetes-csi#247 from msau42/prow
dc4d0ae Merge pull request kubernetes-csi#249 from jsafrane/use-go-version
e681b17 Use .go-version to get Kubernetes go version
9b4352e Update release playbook
c7bb972 Fix release notes script to use fixed tags
463a0e9 Add script to update specific go modules
b54c1ba Merge pull request kubernetes-csi#246 from xing-yang/go_1.21
5436c81 Change go version to 1.21.5
267b40e Merge pull request kubernetes-csi#244 from carlory/sig-storage
b42e5a2 nominate self (carlory) as kubernetes-csi reviewer
a17f536 Merge pull request kubernetes-csi#210 from sunnylovestiramisu/sidecar
011033d Use set -x instead of die
5deaf66 Add wrapper script for sidecar release
f8c8cc4 Merge pull request kubernetes-csi#237 from msau42/prow
b36b5bf Merge pull request kubernetes-csi#240 from dannawang0221/upgrade-go-version
adfddcc Merge pull request kubernetes-csi#243 from pohly/git-subtree-pull-fix
c465088 pull-test.sh: avoid "git subtree pull" error
7b175a1 Update csi-test version to v5.2.0
987c90c Update go version to 1.21 to match k/k
2c625d4 Add script to generate patch release notes
f9d5b9c Merge pull request kubernetes-csi#236 from mowangdk/feature/bump_csi-driver-host-path_version
b01fd53 Bump csi-driver-host-path version up to v1.12.0
984feec Merge pull request kubernetes-csi#234 from siddhikhapare/csi-tools
1f7e605 fixed broken links of testgrid dashboard
de2fba8 Merge pull request kubernetes-csi#233 from andyzhangx/andyzhangx-patch-1
cee895e remove windows 20H2 build since it's EOL long time ago
670bb0e Merge pull request kubernetes-csi#229 from marosset/fix-codespell-errors
35d5e78 Merge pull request kubernetes-csi#219 from yashsingh74/update-registry
63473cc Merge pull request kubernetes-csi#231 from coulof/bump-go-version-1.20.5
29a5c76 Merge pull request kubernetes-csi#228 from mowangdk/chore/adopt_kubernetes_recommand_labels
8dd2821 Update cloudbuild image with go 1.20.5
1df23db Merge pull request kubernetes-csi#230 from msau42/prow
1f92b7e Add ginkgo timeout to e2e tests to help catch any stuck tests
2b8b80e fixing some codespell errors
c10b678 Merge pull request kubernetes-csi#227 from coulof/check-sidecar-supported-versions
72984ec chore: adopt kubernetes recommand label
b055535 Header
bd0a10b typo
c39d73c Add comments
f6491af Script to verify EOL sidecar version
4133d1d Merge pull request kubernetes-csi#226 from msau42/cloudbuild
8d519d2 Pin buildkit to v0.10.6 to workaround v0.11 bug with docker manifest
6e04a03 Merge pull request kubernetes-csi#224 from msau42/cloudbuild
26fdfff Update cloudbuild image
6613c39 Merge pull request kubernetes-csi#223 from sunnylovestiramisu/update
0e7ae99 Update k8s image repo url
77e47cc Merge pull request kubernetes-csi#222 from xinydev/fix-dep-version
155854b Fix dep version mismatch
8f83905 Merge pull request kubernetes-csi#221 from sunnylovestiramisu/go-update
1d3f94d Update go version to 1.20 to match k/k v1.27
e322ce5 Merge pull request kubernetes-csi#220 from andyzhangx/fix-golint-error
b74a512 test: fix golint error
901bcb5 Update registry k8s.gcr.io -> registry.k8s.io

git-subtree-dir: release-tools
git-subtree-split: 734c2b950c4b31f64b63052c64ffa5929d1c9b97
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Successfully merging this pull request may close these issues.

Failed tests leak CSI resources
4 participants