
ONTAP ephemeral snapshot remains in backend when doing PVC->PVC CSI clone #901

Closed

akalenyu opened this issue Apr 18, 2024 · 6 comments
Describe the bug
A CSI clone works by creating an ephemeral snapshot behind the scenes.
This snapshot is not cleaned up, which can eventually exhaust the FSx for ONTAP limit of 1023 snapshots per volume:
https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/snapshots-ontap.html

Environment
Provide accurate information about the environment to help us reproduce the issue.

  • Trident version: v24.02.0
  • Trident installation flags used: [e.g. -d -n trident --use-custom-yaml]
  • Container runtime: [e.g. Docker 19.03.1-CE]
  • Kubernetes version: [e.g. 1.15.1]
  • Kubernetes orchestrator: OpenShift 4.15
  • Kubernetes enabled feature gates: [e.g. CSINodeInfo]
  • OS: [e.g. RHEL 7.6, Ubuntu 16.04]
  • NetApp backend types: ONTAP
  • Other:

To Reproduce

  • Create PVC
  • Clone the above PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: simple-pvc
spec:
  storageClassName: trident-csi-fsx
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clone
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: trident-csi-fsx
  resources:
    requests:
      storage: 1Gi
  dataSource:
    kind: PersistentVolumeClaim
    name: simple-pvc

Expected behavior
The ephemeral snapshot is removed once the clone completes.

Additional context
Snapshot remains:

FsxIdxxxxxxx::> vol snapshot show
                                                                 ---Blocks---
Vserver  Volume   Snapshot                                  Size Total% Used%
-------- -------- ------------------------------------- -------- ------ -----
svm1     svm1_root
                  hourly...
         trident_pvc_3df42396_be95_4f94_a7f9_f97bb39c5659
                  20240418T141852Z                         176KB     0%   36%
...
akalenyu added the bug label Apr 18, 2024
@akalenyu (Author)

/cc @aglitke @clintonk
This becomes an issue with OpenShift Virt, where we rely on the concept of golden images: most VMs are created from a single golden OS image per flavor. It could also be a problem when mass-cloning VM disks from a preset VM.
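
For context, these golden images are typically kept up to date by CDI DataImportCron objects, which periodically reimport the OS image and garbage-collect older versions. A rough, illustrative sketch (name, namespace, image URL, and schedule are placeholders, not taken from this issue):

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataImportCron
metadata:
  name: fedora-image-cron                   # illustrative name
  namespace: openshift-virtualization-os-images
spec:
  schedule: "0 */12 * * *"                  # poll the registry twice a day
  managedDataSource: fedora
  garbageCollect: Outdated                  # older imports are garbage collected
  importsToKeep: 3
  template:
    spec:
      source:
        registry:
          url: docker://quay.io/containerdisks/fedora:latest
      storage:
        resources:
          requests:
            storage: 5Gi

Every VM cloned from the current import shares a single source volume, which is what makes the per-volume snapshot limit relevant here.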

akalenyu added a commit to akalenyu/containerized-data-importer that referenced this issue Apr 18, 2024
There's a bug in the trident CSI driver that causes a snapshot
to be left over following each CSI clone:
NetApp/trident#901
This is becoming a problem rapidly once one hits the snapshot limit per volume (golden image):
https://kb.netapp.com/onprem/ontap/os/Maximum_number_of_snapshots_supported_by_ONTAP

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
@wonderland

This is partly related to how FSxN (or ONTAP in general) works. By default, the source volume and its clone keep a connection between them, which makes the clone faster to create and more space efficient. So while both PVCs are independent objects in Kubernetes, they are not necessarily independent on the storage side. However, Trident gives you control over this.

I see two main options to deal with this:

Create a "golden snapshot"

With this approach you do not rely on the golden image alone - it is still a read-write volume and could change without you noticing. Instead, you also create a "golden snapshot" from the golden image PVC. That freezes the current state, making it immutable, so every clone is taken from exactly the same state. You also end up with exactly one snapshot, and clones are taken from that snapshot rather than from the mutable PVC (so Trident does not create an additional snapshot in this case). You can add more snapshots later to "version" the golden image: make the necessary changes to the PVC, then snapshot again. Each clone can then use either the old snapshot (the previous golden image state) or the new one (the modified state). This is therefore my preferred approach.

You create a golden snap like this:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot 
metadata: 
  name: golden-snap1
spec: 
  volumeSnapshotClassName: ontap-snaps 
  source: 
    persistentVolumeClaimName: simple-pvc
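
The example above assumes a VolumeSnapshotClass named ontap-snaps already exists. A minimal sketch for Trident (the driver name csi.trident.netapp.io is standard; the class name and deletionPolicy are just illustrative):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ontap-snaps
driver: csi.trident.netapp.io
deletionPolicy: Delete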

Then create a clone from that golden snapshot like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clone
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: sc-nas-svm1
  resources:
    requests:
      storage: 1Gi
  dataSource:
    kind: VolumeSnapshot
    name: golden-snap1
    apiGroup: snapshot.storage.k8s.io

Instruct Trident to "split" the clone

Trident keeps the relationship between source and clone intact by default, which results in the extra snapshot you noted. However, you can change this behavior either with an annotation on the clone PVC or by setting the corresponding option in your Trident backend configuration. When splitOnClone is set to "true", Trident splits the clone from its source volume, fully decoupling the two and leaving no extra snapshot behind. As an example, setting this annotation on a PVC cloned from your golden PVC would look like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test3
  annotations:
    trident.netapp.io/splitOnClone: "true"
spec:
  storageClassName: sc-nas-svm1
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  dataSource:
    kind: PersistentVolumeClaim
    name: simple-pvc
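
The backend-level alternative mentioned above sets the same option in the backend's defaults. A hedged sketch using the TridentBackendConfig CRD for an ontap-nas backend (backend name, LIF, SVM, and credentials Secret are placeholders):

apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-fsx-nas
  namespace: trident
spec:
  version: 1
  storageDriverName: ontap-nas
  backendName: fsx-nas
  managementLIF: 198.51.100.10          # placeholder FSxN management endpoint
  svm: svm1
  credentials:
    name: fsx-secret                    # placeholder Secret holding SVM credentials
  defaults:
    splitOnClone: "true"                # split every clone from its parent at creation

With the backend default in place, no per-PVC annotation is needed.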

akalenyu added a commit to akalenyu/containerized-data-importer that referenced this issue Apr 30, 2024
kubevirt-bot pushed a commit to kubevirt/containerized-data-importer that referenced this issue Apr 30, 2024
akalenyu added a commit to akalenyu/containerized-data-importer that referenced this issue May 1, 2024
akalenyu added a commit to akalenyu/containerized-data-importer that referenced this issue May 2, 2024
akalenyu added a commit to akalenyu/containerized-data-importer that referenced this issue May 2, 2024
kubevirt-bot pushed a commit to kubevirt/containerized-data-importer that referenced this issue May 2, 2024
kubevirt-bot pushed a commit to kubevirt/containerized-data-importer that referenced this issue May 2, 2024
@aglitke commented May 3, 2024

@wonderland Thanks for providing these details. The PR that @akalenyu referenced here changes CDI behavior for Trident-serviced storage classes to do just as you suggest. We agree that an immutable golden image is desirable on its own merits. Without your Trident-specific annotation on the cloned PVC, will we still encounter problems if the golden image snapshot is deleted? The golden images are updated periodically by an automated process, and we garbage-collect older versions to better manage our storage usage. In this case there could still be VMs that were cloned from a golden image snapshot that is a garbage collection candidate.

We would like to avoid using vendor-specific annotations on the PVCs we create on behalf of the user.
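
The CDI-side override mentioned here is applied through the StorageProfile that CDI creates for each StorageClass, rather than through a vendor annotation. A hedged sketch of setting the same values explicitly (the profile name must match the StorageClass; snapshot is what #3209 makes the default for Trident-provisioned classes):

apiVersion: cdi.kubevirt.io/v1beta1
kind: StorageProfile
metadata:
  name: trident-csi-fsx                 # matches the StorageClass name
spec:
  cloneStrategy: snapshot               # instead of csi-clone
  dataImportCronSourceFormat: snapshot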

kubevirt-bot added a commit to kubevirt/containerized-data-importer that referenced this issue May 3, 2024
sjpeeris added bug and removed bug labels Jul 17, 2024
@clintonk (Contributor) commented Aug 8, 2024

This issue is fixed in cc6b280 and will be in Trident 24.10.

clintonk closed this as completed Aug 8, 2024
@akalenyu (Author) commented Aug 8, 2024

This issue is fixed in cc6b280 and will be in Trident 24.10.

Cool, do the recommendations about golden snapshots & cloning via an ephemeral k8s volumesnapshot still stand?
Or would you advise we follow up on (and possibly revert) some of kubevirt/containerized-data-importer#3209?

/cc @aglitke

@clintonk (Contributor) commented Aug 8, 2024

Cool, do the recommendations about golden snapshots & cloning via an ephemeral k8s volumesnapshot still stand?

Trident only creates an automatic snapshot during cloning because ONTAP needs that. If you create the snapshot, then you can manage its lifecycle as needed. That still seems preferable.
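
For the KubeVirt/CDI flow discussed above, that lifecycle-managed golden snapshot can then be the clone source end to end. A hedged sketch of a DataVolume cloning from the golden VolumeSnapshot created earlier (the namespace and size are illustrative):

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: vm-disk-from-golden
spec:
  source:
    snapshot:
      name: golden-snap1                # the golden snapshot from the earlier example
      namespace: default                # namespace where that VolumeSnapshot lives
  storage:
    storageClassName: trident-csi-fsx
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: 1Gi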
