
ONTAP ephemeral snapshot remains in backend when doing PVC->PVC CSI clone #901

Closed

akalenyu opened this issue Apr 18, 2024 · 6 comments
Describe the bug
A CSI clone works by creating an ephemeral snapshot behind the scenes.
This snapshot is not cleaned up, which can eventually exhaust the FSx for ONTAP limit of 1023 snapshots per volume:
https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/snapshots-ontap.html

Environment
Provide accurate information about the environment to help us reproduce the issue.

  • Trident version: v24.02.0
  • Trident installation flags used: [e.g. -d -n trident --use-custom-yaml]
  • Container runtime: [e.g. Docker 19.03.1-CE]
  • Kubernetes version: [e.g. 1.15.1]
  • Kubernetes orchestrator: OpenShift 4.15
  • Kubernetes enabled feature gates: [e.g. CSINodeInfo]
  • OS: [e.g. RHEL 7.6, Ubuntu 16.04]
  • NetApp backend types: ONTAP
  • Other:

To Reproduce

  • Create PVC
  • Clone the above PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: simple-pvc
spec:
  storageClassName: trident-csi-fsx
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clone
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: trident-csi-fsx
  resources:
    requests:
      storage: 1Gi
  dataSource:
    kind: PersistentVolumeClaim
    name: simple-pvc

Expected behavior
The ephemeral snapshot is removed once the clone completes.

Additional context
Snapshot remains:

FsxIdxxxxxxx::> vol snapshot show
                                                                 ---Blocks---
Vserver  Volume   Snapshot                                  Size Total% Used%
-------- -------- ------------------------------------- -------- ------ -----
svm1     svm1_root
                  hourly...
         trident_pvc_3df42396_be95_4f94_a7f9_f97bb39c5659
                  20240418T141852Z                         176KB     0%   36%
...
akalenyu added the bug label Apr 18, 2024
@akalenyu (Author)

/cc @aglitke @clintonk
This becomes an issue with OpenShift Virt, where we rely on the concept of golden images: most VMs are created from a single golden OS image per flavor. It could also be a problem when mass-cloning VM disks from a preset VM.
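
For context, these golden images are typically kept up to date by CDI DataImportCron objects, which periodically reimport the OS image and garbage-collect older versions. A rough, illustrative sketch (name, namespace, image URL, and schedule are placeholders, not taken from this issue):

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataImportCron
metadata:
  name: fedora-image-cron                   # illustrative name
  namespace: openshift-virtualization-os-images
spec:
  schedule: "0 */12 * * *"                  # poll the registry twice a day
  managedDataSource: fedora
  garbageCollect: Outdated                  # older imports are garbage collected
  importsToKeep: 3
  template:
    spec:
      source:
        registry:
          url: docker://quay.io/containerdisks/fedora:latest
      storage:
        resources:
          requests:
            storage: 5Gi

Every VM cloned from the current import shares a single source volume, which is what makes the per-volume snapshot limit relevant here.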

akalenyu added a commit to akalenyu/containerized-data-importer that referenced this issue Apr 18, 2024
There's a bug in the trident CSI driver that causes a snapshot
to be left over following each CSI clone:
NetApp/trident#901
This is becoming a problem rapidly once one hits the snapshot limit per volume (golden image):
https://kb.netapp.com/onprem/ontap/os/Maximum_number_of_snapshots_supported_by_ONTAP

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
@wonderland

This is partly related to how FSxN (or ONTAP in general) works. By default, the source volume and its clone keep a connection between them, which makes the clone faster to create and more space efficient. So while both PVCs are independent objects in Kubernetes, they are not necessarily independent on the storage side. However, Trident gives you control over this.

I see two main options to deal with this:

Create a "golden snapshot"

With this approach you do not rely on the golden image alone - it is still a read-write volume and could change without you noticing. Instead, you also create a "golden snapshot" from the golden image PVC. That freezes the current state, making it immutable, so every clone is taken from exactly the same state. You also end up with exactly one snapshot, and clones are taken from that snapshot rather than from the mutable PVC (so Trident does not create an additional snapshot in this case). You can add more snapshots later to "version" the golden image: make the necessary changes to the PVC, then snapshot again. Each clone can then use either the old snapshot (the previous golden image state) or the new one (the modified state). This is therefore my preferred approach.

You create a golden snap like this:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot 
metadata: 
  name: golden-snap1
spec: 
  volumeSnapshotClassName: ontap-snaps 
  source: 
    persistentVolumeClaimName: simple-pvc
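
The example above assumes a VolumeSnapshotClass named ontap-snaps already exists. A minimal sketch for Trident (the driver name csi.trident.netapp.io is standard; the class name and deletionPolicy are just illustrative):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ontap-snaps
driver: csi.trident.netapp.io
deletionPolicy: Delete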

Then create a clone from that golden snapshot like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clone
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: sc-nas-svm1
  resources:
    requests:
      storage: 1Gi
  dataSource:
    kind: VolumeSnapshot
    name: golden-snap1
    apiGroup: snapshot.storage.k8s.io

Instruct Trident to "split" the clone

Trident keeps the relationship between source and clone intact by default, which results in the extra snapshot you noted. However, you can change this behavior either with an annotation on the clone PVC or by setting the corresponding option in your Trident backend configuration. When splitOnClone is set to "true", Trident splits the clone from its source volume, fully decoupling the two and leaving no extra snapshot behind. As an example, setting this annotation on a PVC cloned from your golden PVC would look like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test3
  annotations:
    trident.netapp.io/splitOnClone: "true"
spec:
  storageClassName: sc-nas-svm1
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  dataSource:
    kind: PersistentVolumeClaim
    name: simple-pvc
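
The backend-level alternative mentioned above sets the same option in the backend's defaults. A hedged sketch using the TridentBackendConfig CRD for an ontap-nas backend (backend name, LIF, SVM, and credentials Secret are placeholders):

apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-fsx-nas
  namespace: trident
spec:
  version: 1
  storageDriverName: ontap-nas
  backendName: fsx-nas
  managementLIF: 198.51.100.10          # placeholder FSxN management endpoint
  svm: svm1
  credentials:
    name: fsx-secret                    # placeholder Secret holding SVM credentials
  defaults:
    splitOnClone: "true"                # split every clone from its parent at creation

With the backend default in place, no per-PVC annotation is needed.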

akalenyu added a commit to akalenyu/containerized-data-importer that referenced this issue Apr 30, 2024
kubevirt-bot pushed a commit to kubevirt/containerized-data-importer that referenced this issue Apr 30, 2024
akalenyu added a commit to akalenyu/containerized-data-importer that referenced this issue May 1, 2024
akalenyu added a commit to akalenyu/containerized-data-importer that referenced this issue May 2, 2024
akalenyu added a commit to akalenyu/containerized-data-importer that referenced this issue May 2, 2024
kubevirt-bot pushed a commit to kubevirt/containerized-data-importer that referenced this issue May 2, 2024
kubevirt-bot pushed a commit to kubevirt/containerized-data-importer that referenced this issue May 2, 2024
@aglitke commented May 3, 2024

@wonderland Thanks for providing these details. The PR that @akalenyu referenced here changes CDI behavior for Trident-serviced storage classes to do just as you suggest. We agree that an immutable golden image is desirable on its own merits. Without your Trident-specific annotation on the cloned PVC, will we still encounter problems if the golden image snapshot is deleted? The golden images are updated periodically by an automated process, and we garbage-collect older versions to better manage our storage usage. In this case there could still be VMs that were cloned from a golden image snapshot that is a garbage collection candidate.

We would like to avoid using vendor-specific annotations on the PVCs we create on behalf of the user.
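
The CDI-side override mentioned here is applied through the StorageProfile that CDI creates for each StorageClass, rather than through a vendor annotation. A hedged sketch of setting the same values explicitly (the profile name must match the StorageClass; snapshot is what #3209 makes the default for Trident-provisioned classes):

apiVersion: cdi.kubevirt.io/v1beta1
kind: StorageProfile
metadata:
  name: trident-csi-fsx                 # matches the StorageClass name
spec:
  cloneStrategy: snapshot               # instead of csi-clone
  dataImportCronSourceFormat: snapshot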

kubevirt-bot added a commit to kubevirt/containerized-data-importer that referenced this issue May 3, 2024
sjpeeris added bug and removed bug labels Jul 17, 2024
@clintonk (Contributor) commented Aug 8, 2024

This issue is fixed in cc6b280 and will be in Trident 24.10.

clintonk closed this as completed Aug 8, 2024
@akalenyu (Author) commented Aug 8, 2024

This issue is fixed in cc6b280 and will be in Trident 24.10.

Cool, do the recommendations about golden snapshots & cloning via an ephemeral k8s volumesnapshot still stand?
Or would you advise we follow up on (and possibly revert) some of kubevirt/containerized-data-importer#3209?

/cc @aglitke

@clintonk (Contributor) commented Aug 8, 2024

Cool, do the recommendations about golden snapshots & cloning via an ephemeral k8s volumesnapshot still stand?

Trident only creates an automatic snapshot during cloning because ONTAP needs that. If you create the snapshot, then you can manage its lifecycle as needed. That still seems preferable.
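
For the KubeVirt/CDI flow discussed above, that lifecycle-managed golden snapshot can then be the clone source end to end. A hedged sketch of a DataVolume cloning from the golden VolumeSnapshot created earlier (the namespace and size are illustrative):

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: vm-disk-from-golden
spec:
  source:
    snapshot:
      name: golden-snap1                # the golden snapshot from the earlier example
      namespace: default                # namespace where that VolumeSnapshot lives
  storage:
    storageClassName: trident-csi-fsx
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: 1Gi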
