
[CSI] csi-driver failing to provision volumes when node ID is longer than 128 characters #89

Closed
vpnachev opened this issue May 5, 2020 · 12 comments
Labels
kind/bug Bug platform/gcp Google cloud platform/infrastructure priority/2 Priority (lower number equals higher priority) status/external-action Issue has external dependency topology/shoot Affects Shoot clusters

Comments

@vpnachev
Member

vpnachev commented May 5, 2020

What happened:

A PersistentVolumeClaim could not be provisioned because the node ID was too long.

k -n kube-system logs -p csi-driver-node-d4tnl -c csi-node-driver-registrar
I0505 13:45:10.412325       1 main.go:110] Version: v1.3.0-0-g6e9fff3e
I0505 13:45:10.412405       1 main.go:120] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0505 13:45:10.412427       1 connection.go:151] Connecting to unix:///csi/csi.sock
I0505 13:45:10.412865       1 main.go:127] Calling CSI driver to discover driver name
I0505 13:45:10.412881       1 connection.go:180] GRPC call: /csi.v1.Identity/GetPluginInfo
I0505 13:45:10.412886       1 connection.go:181] GRPC request: {}
I0505 13:45:10.414462       1 connection.go:183] GRPC response: {"name":"pd.csi.storage.gke.io","vendor_version":"v0.7.0-gke.0"}
I0505 13:45:10.414820       1 connection.go:184] GRPC error: <nil>
I0505 13:45:10.414827       1 main.go:137] CSI driver name: "pd.csi.storage.gke.io"
I0505 13:45:10.414915       1 node_register.go:51] Starting Registration Server at: /registration/pd.csi.storage.gke.io-reg.sock
I0505 13:45:10.415103       1 node_register.go:60] Registration Server started at: /registration/pd.csi.storage.gke.io-reg.sock
I0505 13:45:10.870866       1 main.go:77] Received GetInfo call: &InfoRequest{}
I0505 13:45:11.870960       1 main.go:77] Received GetInfo call: &InfoRequest{}
I0505 13:45:13.585060       1 main.go:87] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: error updating CSINode object with CSI driver node info: error updating CSINode: timed out waiting for the condition; caused by: CSINode.storage.k8s.io "shoot--12345678--123-56789ab-cpu-worker-z1-7c4f48599f-q6vbk" is invalid: spec.drivers[0].nodeID: Invalid value: "projects/012-34-56789abcdefghij-klmnopq/zones/us-central1-a/instances/shoot--12345678--123-56789ab-cpu-worker-z1-7c4f48599f-q6vbk": must be 128 characters or less,}
E0505 13:45:13.585115       1 main.go:89] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: error updating CSINode object with CSI driver node info: error projects/012-34-56789abcdefghij-klmnopq/zones/us-central1-a/instances/shoot--12345678--123-56789ab-cpu-worker-z1-7c4f48599f-q6vbk": must be 128 characters or less, restarting registration container.

What you expected to happen:
The CSI driver should work for all machines in the cluster. I am not sure, but perhaps further restrictions on name lengths have to be applied.

How to reproduce it (as minimally and precisely as possible):
Create a project, a shoot and a worker pool with long names. The GCP project name should also be long.
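The node ID in the registrar log has the form `projects/<project>/zones/<zone>/instances/<instance>`, so its length grows with the GCP project ID, the zone, and the machine name (which embeds the shoot, project and worker pool names). The following is a minimal Go sketch, assuming only the format observed in the log, that measures how close such an ID gets to the 128-character limit. `buildNodeID` and `instanceNameBudget` are hypothetical helpers for illustration, not part of the GCP PD CSI driver.

```go
package main

import "fmt"

// maxNodeIDLen is the CSINode nodeID limit reported in the error message.
const maxNodeIDLen = 128

// buildNodeID assembles a node ID in the format seen in the registrar log:
// projects/<project>/zones/<zone>/instances/<instance>.
// Hypothetical helper: the real driver computes the node ID itself.
func buildNodeID(project, zone, instance string) string {
	return fmt.Sprintf("projects/%s/zones/%s/instances/%s", project, zone, instance)
}

// instanceNameBudget returns how many characters remain for the instance
// name once the project and zone parts are fixed (negative means none fit).
func instanceNameBudget(project, zone string) int {
	return maxNodeIDLen - len(buildNodeID(project, zone, ""))
}

func main() {
	// Values taken from the (anonymized) log in this issue.
	project := "012-34-56789abcdefghij-klmnopq"
	zone := "us-central1-a"
	instance := "shoot--12345678--123-56789ab-cpu-worker-z1-7c4f48599f-q6vbk"

	id := buildNodeID(project, zone, instance)
	fmt.Printf("node ID length: %d (limit %d)\n", len(id), maxNodeIDLen)
	fmt.Printf("room for instance name: %d, actual: %d\n",
		instanceNameBudget(project, zone), len(instance))
}
```

With the values from the log, the node ID is 129 characters against a limit of 128: the instance name has room for 58 characters but uses 59, so a single character pushes the ID over the limit.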

Anything else we need to know?:
gardener/machine-controller-manager#461

Environment:

  • Gardener version: v1.3.1
  • Kubernetes version (use kubectl version): v1.18.2
  • Cloud provider or hardware configuration: GCP
  • Others:
@rfranzke
Member

rfranzke commented May 5, 2020

Is this an issue specific to GCP?

@vpnachev
Member Author

vpnachev commented May 5, 2020

I've seen it on GCP.
But the csi-node-driver-registrar is cloud agnostic, so I have to check how long the node IDs are on other providers.

@rfranzke
Member

rfranzke commented May 5, 2020

I suggest moving this issue to g/gep-gcp, as g/g has nothing to do with it. Even the node names are controlled not by g/g but by the providers.

@ialidzhikov ialidzhikov transferred this issue from gardener/gardener May 5, 2020
@ialidzhikov
Member

/kind/bug
/platform/gcp

@ghost ghost added kind/bug Bug platform/gcp Google cloud platform/infrastructure labels May 5, 2020
@vlerenc vlerenc added the priority/critical Needs to be resolved soon, because it impacts users negatively label Jun 24, 2020
@vlerenc
Member

vlerenc commented Jun 24, 2020

@gardener/gardener-extension-provider-gcp-maintainers @gardener/gardener-maintainers Is there someone willing to take this up?

@vpnachev
Member Author

After some investigation, it turns out that Kubernetes does not allow the creation of CSINode objects with len(nodeID) > 128.
The GCP CSI driver is responsible for providing the node ID, so I have opened an issue on their side: kubernetes-sigs/gcp-compute-persistent-disk-csi-driver#581.

/status external-action
/topology shoot

@gardener-robot gardener-robot added status/external-action Issue has external dependency topology/shoot Affects Shoot clusters labels Aug 14, 2020
@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Oct 14, 2020
@gardener-robot gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Dec 13, 2020
@vpnachev
Member Author

vpnachev commented Feb 4, 2021

Here is the PR kubernetes/kubernetes#98753, which aims to reduce the likelihood of this problem occurring.

@prashanth26


This is great news :) I guess once this PR is merged, we can verify whether this issue still occurs in the new K8s versions and, if not, close this issue.

@vpnachev
Copy link
Member Author

vpnachev commented Feb 8, 2021

I think we should ask for this change to be cherry-picked; otherwise, shoot clusters running k8s 1.18, 1.19 or 1.20 will still be affected by this ID length limitation.

@gardener-robot gardener-robot added priority/2 Priority (lower number equals higher priority) and removed priority/critical Needs to be resolved soon, because it impacts users negatively labels Mar 8, 2021
@ialidzhikov ialidzhikov removed the lifecycle/rotten Nobody worked on this for 12 months (final aging stage) label Aug 30, 2021
@ialidzhikov
Member

The answer in kubernetes-sigs/gcp-compute-persistent-disk-csi-driver#581 (comment) is pretty descriptive:

Kubernetes 1.22 increases the limit to 192 characters and Kubernetes 1.23 will allow 256 characters:
kubernetes/kubernetes#98753
kubernetes/kubernetes#101256

Closing the issue here as there are no changes required on the driver side.

I guess there is nothing to do on our side, and the issue should be resolved in Kubernetes v1.22 and v1.23.
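Using the limits quoted from the driver maintainers (128 characters before 1.22, 192 in 1.22, 256 from 1.23), a minimal Go sketch can tell whether a node ID of a given length would be accepted by a cluster of a given minor version. `maxNodeIDLen` and `fits` are hypothetical helpers; the thresholds come from the quoted comment, and the sketch ignores patch releases and any cherry-picks.

```go
package main

import "fmt"

// maxNodeIDLen returns the CSINode nodeID limit for Kubernetes v1.<minor>,
// using the thresholds quoted in this issue. Hypothetical helper.
func maxNodeIDLen(minor int) int {
	switch {
	case minor >= 23:
		return 256
	case minor == 22:
		return 192
	default:
		return 128
	}
}

// fits reports whether a node ID of length idLen stays within the limit.
func fits(idLen, minor int) bool {
	return idLen <= maxNodeIDLen(minor)
}

func main() {
	const idLen = 129 // length of the node ID in the registrar log of this issue
	for _, minor := range []int{18, 19, 20, 21, 22, 23} {
		fmt.Printf("v1.%d: limit %d, fits=%v\n", minor, maxNodeIDLen(minor), fits(idLen, minor))
	}
}
```

For the 129-character ID from the log, only v1.22 and later would accept the CSINode object, which is why older clusters stay affected unless the change is cherry-picked.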

@vpnachev
Member Author

vpnachev commented Sep 7, 2021

Yep, there is nothing we can do about this issue, so I think we can close it now, wdyt?

@ialidzhikov
Member

Yep, makes sense. Meanwhile, the Kubernetes v1.22 support PR was also merged. Thank you for your efforts on this issue. :)

/close
