PersistentVolumeClaim stuck in Pending state despite disk having been created in vCenter. #476
Comments
@Aestel Can you please give the output for
kubectl version is 1.9.6. Worth noting that I'm executing these commands from the master node under my own user, using SSL client authentication with only the system:masters group.
It may or may not be relevant, but when creating Persistent Volumes statically, using vsphereVolume, we had an issue with the disk not being detached from the host when a pod got deleted. This occurred when the volumePath in the Persistent Volume did not include the .vmdk extension. The kube-controller-manager logs showed that it hadn't tried to detach the volume because the volume had already been detached, suggesting the IsDiskAttached function of the cloud provider was incorrectly returning false. Adding the .vmdk extension to the volumePath did give the correct behaviour, with the pod being able to move between the two nodes.
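For illustration only (this is not the manifest from the thread; the datastore and file names are placeholders), a statically defined PV for the in-tree vSphere plugin carries the full path including the .vmdk extension:

```yaml
# Hypothetical static PV. The volumePath deliberately ends in .vmdk,
# which is what made attach/detach work correctly in the case above.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-static-pv
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  vsphereVolume:
    volumePath: "[datastore1] kubevols/example-disk.vmdk"
    fsType: ext4
```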
@Aestel Can you please share the logs? Also, can you share the output of the following commands,
which should look like
which should look like
@Aestel Which Kubernetes version were you facing this issue on?
@abrarshivani The PVC is stuck in the Pending state and no PV is being created; however, an underlying volume appears in the datastore. There is nothing related in the kube-controller-manager logs, and these are the only vsphere.go-related entries that appear in the kubelet logs:
This is the result of kubectl version:
And this is the result of kubectl get nodes (with names):
EDIT: I am running Red Hat Enterprise Linux Server release 7.4 (Maipo) on vSphere 6.0.
@divyenpatel How can I get this tagged as customer?
@abrarshivani I got some logs by setting kube-controller-manager verbosity to 9. Please let me know how to provide them to you. Thanks.
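For anyone wanting to reproduce that, the verbosity is controlled by the controller-manager's -v flag; a minimal sketch assuming a kubeadm-style static pod manifest (the file path is an assumption):

```yaml
# Excerpt of /etc/kubernetes/manifests/kube-controller-manager.yaml (path assumed);
# only the verbosity flag is added, every other flag stays as it is.
spec:
  containers:
    - name: kube-controller-manager
      command:
        - kube-controller-manager
        - --v=9
        # ...existing flags unchanged...
```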
@Aestel In my case it turns out that this was a permissions issue. The account used on VMware did not have System.Read, System.View, and System.Anonymous on the vCenter object. I figured it out by trying datastore.disk.create with govc with debug enabled. @abrarshivani The error messages are very obtuse/nonexistent with regard to this issue, which makes diagnosis very difficult. Perhaps the documentation or error handling should be improved to help future users.
@pgagnon Thanks for the pointer. I suspect it could be something similar in my case. Unfortunately, the vCenter is being managed by a third-party company and I don't have direct access to confirm whether the permissions are set correctly or to make any changes.
@Aestel I am in the same boat, with VMware resources being managed by another department. You can nevertheless confirm the issue with the govc command-line utility with the debug flag on, using the datastore.disk.create command. It will save detailed logs of the calls to the vCenter API. In my case I saw NoPermission returned by the VMware API when the utility was trying to poll the status of the create-disk task, which led to the utility never returning.
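A rough sketch of that check (the govc flags and environment variables here are from memory and should be treated as assumptions; the datastore and disk path are placeholders):

```sh
# Point govc at the same vCenter and credentials the cloud provider uses,
# and enable debug tracing so the API calls are recorded under ~/.govmomi/debug.
export GOVC_URL='https://vcenter.example.com/sdk'
export GOVC_USERNAME='k8s-service-account'
export GOVC_PASSWORD='...'
export GOVC_INSECURE=1
export GOVC_DEBUG=true

# Try to create a small test disk; with missing permissions this tends to hang
# while polling the task, and the trace shows NoPermission from the API.
govc datastore.disk.create -ds datastore1 -size 1G kubevols/govc-test.vmdk

# Inspect and clean up afterwards.
govc datastore.ls -ds datastore1 kubevols
govc datastore.rm -ds datastore1 kubevols/govc-test.vmdk
```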
@pgagnon Found some time to test it - the govc datastore.disk.create command hangs without producing any output. If I cancel the command, I can see with datastore.ls that the disk has been created.
Trying to remove the disk using govc datastore.rm also hangs. Cancelling the command and running datastore.ls shows the disk has been removed.
@abrarshivani Apologies, I have misplaced the logs and I cannot recreate the issue as I do not have a testing vCenter available, but perhaps @Aestel could provide some?
@Aestel This looks exactly like the issue I was having. At this point it would be helpful if you could post the content of ~/.govmomi/debug. Otherwise, ask your vCenter operator to double-check that they have granted Read-Only permission at the vCenter level.
@abrarshivani I agree that the permissions are documented properly; however, what could be improved is a better description of what happens when they are not configured as described. It is not uncommon in enterprise environments for VMware resources to be administered by a different team than the one administering k8s, and it is difficult for k8s admins to diagnose permission issues such as the one I was experiencing, since the logs are not clear about what is happening. This is, however, perhaps something which should be handled in govmomi.
@abrarshivani I think I found the logs. I'll get them to you via Slack.
Thanks, @pgagnon.
One of our customers hit the same issue as reported above; they had wrong permissions in their cluster. IMO, the provisioner (or vSphere itself?) should report an error in some reasonable time instead of being blocked forever. It's trivial to add a timeout to the vSphere provisioning code here: The question is, what would be the right timeout? One minute is IMO on the edge of when users could get impatient. Is one minute enough for vSphere to reliably provision a volume?
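A minimal sketch of the timeout idea (this is generic Go, not the actual provisioner code; createVolume below is a stand-in for the blocking vSphere call):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// createVolume is a stand-in for the blocking vSphere provisioning call.
func createVolume(ctx context.Context) (string, error) {
	select {
	case <-time.After(5 * time.Minute): // simulate a call that never completes in time
		return "[datastore1] kubevols/example.vmdk", nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

func main() {
	// Bound the call so a permissions problem surfaces as an error
	// instead of leaving the PVC Pending forever.
	ctx, cancel := context.WithTimeout(context.Background(), 1*time.Minute)
	defer cancel()

	path, err := createVolume(ctx)
	if errors.Is(err, context.DeadlineExceeded) {
		fmt.Println("provisioning timed out; check vCenter permissions")
		return
	}
	if err != nil {
		fmt.Println("provisioning failed:", err)
		return
	}
	fmt.Println("provisioned:", path)
}
```

Whatever value is chosen, surfacing the timeout as an error or event would at least make a permission problem like the one above visible instead of silent.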
I've set up the vSphere Cloud Provider in an existing Kubernetes cluster running on vSphere 6.5.
I'm now trying to set up a dynamically provisioned persistent volume claim following the examples.
However, the persistent volume claim remains in status Pending.
I can see within vCenter that the 2GB virtual disk has been created, but I have been unable to find any indication of where it is stuck. The persistent volume claim shows no events.
I've checked the log files of all running pods and none of them show any related errors.
I've checked journalctl and again cannot see any relevant errors.
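For anyone reproducing this, the checks above amount roughly to the following (a sketch; the claim name and controller-manager pod name are placeholders and vary by install):

```sh
# Events on the claim itself (empty in this case).
kubectl describe pvc <claim-name>

# The controller-manager is the component that talks to vCenter for
# dynamic provisioning; check its logs on the master.
kubectl -n kube-system logs kube-controller-manager-<master-node-name>

# Kubelet logs on the nodes (unit name assumed to be "kubelet").
journalctl -u kubelet --no-pager | grep -i vsphere
```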
My StorageClass yaml is:
My PersistentVolumeClaim yaml is:
Kubernetes master and nodes all at version: v1.9.6
Kubernetes API set to version v1.8.6
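The StorageClass and claim manifests themselves aren't included above; for context, a typical pair for the in-tree vSphere provisioner looks roughly like this (illustrative only, with placeholder names, not the files from this report):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsphere-thin
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: thin
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  storageClassName: vsphere-thin
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
```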