CSI driver will not work in default configuration with topology enabled in provisioner #2970

Open
gnufied opened this issue Jul 24, 2024 · 10 comments

Comments

@gnufied
Contributor

gnufied commented Jul 24, 2024

After kubernetes-csi/external-provisioner#1167 merged, the topology feature is enabled by default in csi-provisioner.

Because the vSphere CSI driver reports the topology capability by default (pkg/csi/service/identity.go:65), even when the cluster has no topology, all volume provisioning operations will fail.

cc @divyenpatel @xing-yang @jingxu97

@gnufied
Contributor Author

gnufied commented Jul 24, 2024

Maybe a solution here is to not report the topology capability in clusters where no topology information is configured or available. This would allow the driver to work out of the box with the current version of csi-provisioner. Alternatively, I have considered emitting topology information even in single-zone clusters, but that would require quite a few changes and is also a manual process, so clusters would break on upgrade.

My personal preference would be a CLI flag, which can be specified while starting the driver.
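
A minimal sketch of that idea, assuming a hypothetical `--report-topology` flag that does not exist in the vSphere CSI driver today. The `PluginCapability` type and its constants are local stand-ins named after the CSI spec's plugin capability values, so the example builds with the standard library only; the real driver would use the generated CSI types instead.

```go
package main

import (
	"flag"
	"fmt"
)

// PluginCapability is a local stand-in for the CSI plugin capability enum.
// It is an assumption made for this sketch, not the driver's real type.
type PluginCapability string

const (
	ControllerService              PluginCapability = "CONTROLLER_SERVICE"
	VolumeAccessibilityConstraints PluginCapability = "VOLUME_ACCESSIBILITY_CONSTRAINTS"
)

// pluginCapabilities returns the capabilities the driver would advertise.
// The topology capability is included only when topology is enabled.
func pluginCapabilities(topologyEnabled bool) []PluginCapability {
	caps := []PluginCapability{ControllerService}
	if topologyEnabled {
		caps = append(caps, VolumeAccessibilityConstraints)
	}
	return caps
}

func main() {
	// Hypothetical flag: defaults to false so that clusters without
	// topology information keep working with a topology-aware provisioner.
	reportTopology := flag.Bool("report-topology", false,
		"advertise VOLUME_ACCESSIBILITY_CONSTRAINTS to the CSI sidecars")
	flag.Parse()

	fmt.Println(pluginCapabilities(*reportTopology))
}
```

With the flag left at its default, the sidecar would see no topology capability and should skip topology handling for this driver.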

@gnufied
Contributor Author

gnufied commented Jul 24, 2024

Another issue: disabling the topology feature in csi-provisioner is apparently not enough. With the latest version of csi-provisioner, the vSphere CSI driver is unable to delete in-tree vSphere PVs: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_vmware-vsphere-csi-driver-operator/241/pull-ci-openshift-vmware-vsphere-csi-driver-operator-master-e2e-vsphere/1816155462064672768

@jsafrane
Contributor

jsafrane commented Jul 26, 2024

The reason is that the CSI driver is not idempotent. When processing a migrated in-tree volume, it receives the first DeleteVolume request and succeeds. But then the provisioner sends the same DeleteVolume request again (kubernetes-csi/external-provisioner#1235), and the CSI driver returns failure instead of success.

Sure, the provisioner should not always call DeleteVolume twice, and we're going to fix that. Still, it's a bug in the driver that it's not idempotent. The external-provisioner can send DeleteVolume multiple times; CSI allows this, and it will happen, e.g., during container updates or node drains.
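
To illustrate what idempotent deletion means here, the following is a rough sketch rather than the driver's actual code. The `backend` type, `errNotFound`, and `deleteVolume` are hypothetical names used only for this example; the point is that a "volume already gone" result from the storage layer is reported as success.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical error that a storage backend might
// return when asked to remove a volume that does not exist.
var errNotFound = errors.New("volume not found")

// backend is a toy in-memory stand-in for the storage system.
type backend struct {
	volumes map[string]bool
}

// remove deletes the volume, or returns errNotFound if it is absent.
func (b *backend) remove(id string) error {
	if !b.volumes[id] {
		return errNotFound
	}
	delete(b.volumes, id)
	return nil
}

// deleteVolume treats an already-deleted volume as success, so a
// repeated request for the same volume ID does not fail.
func deleteVolume(b *backend, id string) error {
	err := b.remove(id)
	if errors.Is(err, errNotFound) {
		return nil
	}
	return err
}

func main() {
	b := &backend{volumes: map[string]bool{"vol-1": true}}
	fmt.Println(deleteVolume(b, "vol-1")) // first call removes the volume
	fmt.Println(deleteVolume(b, "vol-1")) // repeated call also succeeds
}
```

Any other backend error is still returned to the caller, so real failures remain visible while retries stay harmless.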

Moved to #2981

@sathieu

sathieu commented Sep 26, 2024

We're hitting this.

> disabling the topology feature in csi-provisioner

How do we do that?

We are currently stuck at provisioner v4.0.1 to work around this.

@sathieu

sathieu commented Nov 26, 2024

Hello,

Asking again: how do we disable the topology feature in csi-provisioner to work around this bug?

@gnufied @jsafrane @divyenpatel @chethanv28 ... ?

@divyenpatel
Member

I think @nikhilbarge verified creating a volume on a non-topology setup using the latest CSI provisioner by setting "--feature-gates=Topology=false".

#3120 (comment)

@nikhilbarge can you confirm?

@gnufied
Contributor Author

gnufied commented Nov 26, 2024

@shalini-b
Collaborator

> We're hitting this.
>
> > disabling the topology feature in csi-provisioner
>
> How to?
>
> We are currently stuck at provisioner v4.0.1 to workaround this.

We will make the necessary changes in the latest driver to make it compatible with the 5.x provisioner. Which version of the vSphere CSI driver are you using? We have not bumped the provisioner version in any of the current releases, so you should not be facing this problem.

@nikhilbarge
Contributor

> I think @nikhilbarge verified creating volume on non-topology setup using latest CSI provisioner by setting - "--feature-gates=Topology=false"
>
> #3120 (comment)
>
> @nikhilbarge can you confirm?

Yes, the Topology feature gate needs to be set explicitly to false starting with provisioner v5.0.0, but if you are using v4.0.1 you should not be facing a problem.
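
For anyone generating the sidecar arguments programmatically (for example, from an operator), here is a small sketch of adding that setting. The helper name `withTopologyDisabled` is made up for this example, and the sample argument list is only an assumption about what a deployment might pass; the only value taken from this thread is `--feature-gates=Topology=false`.

```go
package main

import (
	"fmt"
	"strings"
)

const featureGatesPrefix = "--feature-gates="

// withTopologyDisabled returns a copy of args in which the feature-gates
// argument sets Topology=false. Other gates already present are kept, and
// the argument is appended if no feature-gates argument exists.
func withTopologyDisabled(args []string) []string {
	out := make([]string, 0, len(args)+1)
	found := false
	for _, arg := range args {
		if !strings.HasPrefix(arg, featureGatesPrefix) {
			out = append(out, arg)
			continue
		}
		found = true
		gates := []string{}
		for _, g := range strings.Split(strings.TrimPrefix(arg, featureGatesPrefix), ",") {
			// Drop empty entries and any previous Topology setting.
			if g == "" || strings.HasPrefix(g, "Topology=") {
				continue
			}
			gates = append(gates, g)
		}
		gates = append(gates, "Topology=false")
		out = append(out, featureGatesPrefix+strings.Join(gates, ","))
	}
	if !found {
		out = append(out, featureGatesPrefix+"Topology=false")
	}
	return out
}

func main() {
	// Example csi-provisioner container arguments (assumed for illustration).
	args := []string{"--csi-address=$(ADDRESS)", "--feature-gates=Topology=true"}
	fmt.Println(withTopologyDisabled(args))
}
```

The helper rewrites an existing Topology setting instead of appending a second, conflicting one, which keeps the resulting argument list unambiguous.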

@sathieu

sathieu commented Nov 27, 2024

We are using:

  • registry.k8s.io/csi-vsphere/driver:v3.3.1
  • registry.k8s.io/csi-vsphere/syncer:v3.3.1
  • registry.k8s.io/sig-storage/csi-resizer:v1.12.0
  • registry.k8s.io/sig-storage/csi-attacher:v4.7.0
  • registry.k8s.io/sig-storage/livenessprobe:v2.14.0
  • registry.k8s.io/sig-storage/csi-provisioner:v4.0.1 (latest is registry.k8s.io/sig-storage/csi-provisioner:v5.1.0)
  • registry.k8s.io/sig-storage/csi-snapshotter:v8.1.0
  • registry.k8s.io/sig-storage/snapshot-validation-webhook:v8.1.0
  • registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.12.0

registry.k8s.io/sig-storage/csi-provisioner:v4.0.1 has a few CVEs including:

(those are probably not exploitable)

We will try passing "--feature-gates=Topology=false" to provisioner. Thanks!


6 participants