CephFS multiple attachments to same host results in "an operation with the given Volume ID already exists" #3511

JohnKiller · 2022-11-09T16:04:20Z

Describe the bug

When the container orchestrator launches multiple instances on the same host that all need the same CephFS volume, and they start too close together, the node plugin fails with "an operation with the given Volume ID already exists". When the orchestrator tries again after some time, the mount succeeds.

Since CephFS supports multiple writers, it shouldn't matter that the node receives several mount requests for the same volume concurrently.

Environment details

Image/version of Ceph CSI driver : 3.7.2
Helm chart version : Nomad orchestrator
Kernel version : 5.10.0-18-amd64 SMP Debian 5.10.140-1 (2022-09-02)
Mounter used for mounting PVC (for CephFS it's fuse or kernel; for RBD it's krbd or rbd-nbd) : fuse
Kubernetes cluster version :
Ceph cluster version : 17.2.4

Steps to reproduce

Steps to reproduce the behavior:

Launch multiple instances that need to mount the same CephFS volume on the same host. If they start too close together, some will fail to mount the volume.

Actual results

Some allocations won't mount the volume.

Expected behavior

All allocations should mount the volume.

Logs

I1109 15:45:31.395759       1 utils.go:195] ID: 8 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I1109 15:45:31.395947       1 utils.go:206] ID: 8 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"target_path":"/local/csi/per-alloc/87054419-bbad-a1f1-3b80-fef216d6ab63/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:31.396104       1 nodeserver.go:487] ID: 8 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 stat failed: stat /local/csi/per-alloc/87054419-bbad-a1f1-3b80-fef216d6ab63/lagrondaia/rw-file-system-multi-node-multi-writer: no such file or directory
I1109 15:45:31.396144       1 nodeserver.go:491] ID: 8 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 targetPath: /local/csi/per-alloc/87054419-bbad-a1f1-3b80-fef216d6ab63/lagrondaia/rw-file-system-multi-node-multi-writer has already been deleted
I1109 15:45:31.396205       1 utils.go:212] ID: 8 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:45:31.396939       1 utils.go:195] ID: 9 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnstageVolume
I1109 15:45:31.397069       1 utils.go:206] ID: 9 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
I1109 15:45:31.397244       1 utils.go:212] ID: 9 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:45:37.762508       1 utils.go:195] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeStageVolume
I1109 15:45:37.762839       1 utils.go:206] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"7955282e-39e6-4db4-87d0-153b0ce2f37c","fsName":"cephfs","mounter":"fuse","subvolumeName":"csi-vol-80def62f-30ec-11ed-901a-0242ac110003","subvolumePath":"/volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024"},"volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
I1109 15:45:37.763027       1 utils.go:195] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeStageVolume
I1109 15:45:37.763187       1 utils.go:206] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"7955282e-39e6-4db4-87d0-153b0ce2f37c","fsName":"cephfs","mounter":"fuse","subvolumeName":"csi-vol-80def62f-30ec-11ed-901a-0242ac110003","subvolumePath":"/volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024"},"volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.763238       1 nodeserver.go:136] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.763271       1 utils.go:210] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
I1109 15:45:37.769488       1 utils.go:195] ID: 12 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I1109 15:45:37.769605       1 utils.go:206] ID: 12 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"target_path":"/local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.769653       1 nodeserver.go:487] ID: 12 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 stat failed: stat /local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer: no such file or directory
I1109 15:45:37.769691       1 nodeserver.go:491] ID: 12 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 targetPath: /local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer has already been deleted
I1109 15:45:37.769711       1 utils.go:212] ID: 12 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:45:37.770267       1 utils.go:195] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnstageVolume
I1109 15:45:37.772198       1 utils.go:206] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.772242       1 nodeserver.go:540] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.772266       1 utils.go:210] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
I1109 15:45:37.777967       1 utils.go:195] ID: 14 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I1109 15:45:37.778070       1 utils.go:206] ID: 14 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"target_path":"/local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.778149       1 nodeserver.go:487] ID: 14 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 stat failed: stat /local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer: no such file or directory
I1109 15:45:37.778194       1 nodeserver.go:491] ID: 14 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 targetPath: /local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer has already been deleted
I1109 15:45:37.778225       1 utils.go:212] ID: 14 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:45:37.778899       1 utils.go:195] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnstageVolume
I1109 15:45:37.778985       1 utils.go:206] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.779016       1 nodeserver.go:540] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.779035       1 utils.go:210] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
I1109 15:45:37.802858       1 omap.go:88] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 got omap values: (pool="cephfs_metadata", namespace="csi", name="csi.volume.80def62f-30ec-11ed-901a-0242ac110003"): map[csi.imagename:csi-vol-80def62f-30ec-11ed-901a-0242ac110003 csi.volname:lagrondaia]
I1109 15:45:37.822515       1 nodeserver.go:247] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 cephfs: mounting volume 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 with Ceph FUSE driver
I1109 15:45:37.883819       1 cephcmds.go:105] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 command succeeded: ceph-fuse [/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer -m 192.168.110.101,192.168.110.102,192.168.110.103 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** -r /volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024 -o nonempty --client_mds_namespace=cephfs]
I1109 15:45:37.883934       1 nodeserver.go:206] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 cephfs: successfully mounted volume 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 to /local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer
I1109 15:45:37.884122       1 utils.go:212] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:45:37.885413       1 utils.go:195] ID: 16 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodePublishVolume
I1109 15:45:37.885743       1 utils.go:206] ID: 16 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","target_path":"/local/csi/per-alloc/1f178f99-0da5-3516-6670-d3796a85e77d/lagrondaia/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"7955282e-39e6-4db4-87d0-153b0ce2f37c","fsName":"cephfs","mounter":"fuse","subvolumeName":"csi-vol-80def62f-30ec-11ed-901a-0242ac110003","subvolumePath":"/volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024"},"volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
I1109 15:45:37.889954       1 cephcmds.go:105] ID: 16 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 command succeeded: mount [-o bind,_netdev /local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer /local/csi/per-alloc/1f178f99-0da5-3516-6670-d3796a85e77d/lagrondaia/rw-file-system-multi-node-multi-writer]
I1109 15:45:37.890020       1 nodeserver.go:467] ID: 16 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 cephfs: successfully bind-mounted volume 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 to /local/csi/per-alloc/1f178f99-0da5-3516-6670-d3796a85e77d/lagrondaia/rw-file-system-multi-node-multi-writer
I1109 15:45:37.890080       1 utils.go:212] ID: 16 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:45:39.780737       1 utils.go:195] ID: 17 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I1109 15:45:39.780908       1 utils.go:206] ID: 17 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"target_path":"/local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:39.780944       1 nodeserver.go:487] ID: 17 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 stat failed: stat /local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer: no such file or directory
I1109 15:45:39.780957       1 nodeserver.go:491] ID: 17 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 targetPath: /local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer has already been deleted
I1109 15:45:39.780975       1 utils.go:212] ID: 17 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:46:00.241259       1 utils.go:195] ID: 18 GRPC call: /csi.v1.Identity/Probe
I1109 15:46:00.241374       1 utils.go:206] ID: 18 GRPC request: {}
I1109 15:46:00.241408       1 utils.go:212] ID: 18 GRPC response: {}
I1109 15:46:00.244573       1 utils.go:195] ID: 19 GRPC call: /csi.v1.Identity/Probe
I1109 15:46:00.244633       1 utils.go:206] ID: 19 GRPC request: {}
I1109 15:46:00.244657       1 utils.go:212] ID: 19 GRPC response: {}
I1109 15:46:00.245332       1 utils.go:195] ID: 20 GRPC call: /csi.v1.Node/NodeGetCapabilities
I1109 15:46:00.245434       1 utils.go:206] ID: 20 GRPC request: {}
I1109 15:46:00.245561       1 utils.go:212] ID: 20 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I1109 15:46:07.810929       1 utils.go:195] ID: 21 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodePublishVolume
I1109 15:46:07.811167       1 utils.go:206] ID: 21 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","target_path":"/local/csi/per-alloc/366ef5a4-05e9-6c1a-6bfa-eb7f97569eb0/lagrondaia/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"7955282e-39e6-4db4-87d0-153b0ce2f37c","fsName":"cephfs","mounter":"fuse","subvolumeName":"csi-vol-80def62f-30ec-11ed-901a-0242ac110003","subvolumePath":"/volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024"},"volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
I1109 15:46:07.816906       1 cephcmds.go:105] ID: 21 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 command succeeded: mount [-o bind,_netdev /local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer /local/csi/per-alloc/366ef5a4-05e9-6c1a-6bfa-eb7f97569eb0/lagrondaia/rw-file-system-multi-node-multi-writer]
I1109 15:46:07.816970       1 nodeserver.go:467] ID: 21 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 cephfs: successfully bind-mounted volume 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 to /local/csi/per-alloc/366ef5a4-05e9-6c1a-6bfa-eb7f97569eb0/lagrondaia/rw-file-system-multi-node-multi-writer
I1109 15:46:07.817006       1 utils.go:212] ID: 21 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}

Madhu-1 · 2022-11-10T08:46:51Z

[🎩︎]mrajanna@fedora kubernetes-csi-addons $]cat log | grep -i "NodeStageVolume"
I1109 15:45:37.762508       1 utils.go:195] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeStageVolume
I1109 15:45:37.763027       1 utils.go:195] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeStageVolume
[🎩︎]mrajanna@fedora kubernetes-csi-addons $]cat log | grep -i "ID: 10"
I1109 15:45:37.762508       1 utils.go:195] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeStageVolume
I1109 15:45:37.762839       1 utils.go:206] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"7955282e-39e6-4db4-87d0-153b0ce2f37c","fsName":"cephfs","mounter":"fuse","subvolumeName":"csi-vol-80def62f-30ec-11ed-901a-0242ac110003","subvolumePath":"/volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024"},"volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
I1109 15:45:37.802858       1 omap.go:88] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 got omap values: (pool="cephfs_metadata", namespace="csi", name="csi.volume.80def62f-30ec-11ed-901a-0242ac110003"): map[csi.imagename:csi-vol-80def62f-30ec-11ed-901a-0242ac110003 csi.volname:lagrondaia]
I1109 15:45:37.822515       1 nodeserver.go:247] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 cephfs: mounting volume 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 with Ceph FUSE driver
I1109 15:45:37.883819       1 cephcmds.go:105] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 command succeeded: ceph-fuse [/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer -m 192.168.110.101,192.168.110.102,192.168.110.103 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** -r /volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024 -o nonempty --client_mds_namespace=cephfs]
I1109 15:45:37.883934       1 nodeserver.go:206] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 cephfs: successfully mounted volume 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 to /local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer
I1109 15:45:37.884122       1 utils.go:212] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
[🎩︎]mrajanna@fedora kubernetes-csi-addons $]cat log | grep -i "ID: 11"
I1109 15:45:37.763027       1 utils.go:195] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeStageVolume
I1109 15:45:37.763187       1 utils.go:206] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"7955282e-39e6-4db4-87d0-153b0ce2f37c","fsName":"cephfs","mounter":"fuse","subvolumeName":"csi-vol-80def62f-30ec-11ed-901a-0242ac110003","subvolumePath":"/volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024"},"volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.763238       1 nodeserver.go:136] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.763271       1 utils.go:210] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists

You can see the CO is sending two NodeStageVolume requests for the same volume at almost the same time (the timestamps differ by about half a millisecond). As per the CSI specification, NodeStageVolume should make sure the volume is mounted on the given node only once; if it is already mounted, we should return success: https://github.com/container-storage-interface/spec/blob/master/spec.md#nodestagevolume. To avoid the consequences of mounting the same volume twice, the CSI driver holds a lock per volume. You will see the "operation already exists" error until the ongoing first request completes.
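
To illustrate the per-volume lock described above, here is a minimal Python sketch. It is not the driver's actual code; `VolumeLocks`, `AbortedError`, `node_stage_volume`, and the `mount_fn` callback are hypothetical names made up for this example. The idea is a non-blocking "try acquire" keyed on the volume ID: a second request for the same volume is rejected right away instead of waiting, and a request that arrives after the mount already exists returns success without mounting again.

```python
import threading


class AbortedError(Exception):
    """Stands in for a gRPC status with code = Aborted (hypothetical name)."""


class VolumeLocks:
    """Tracks volume IDs that currently have an operation in flight."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._busy = set()

    def try_acquire(self, volume_id):
        # Non-blocking: report failure instead of waiting for the holder.
        with self._mutex:
            if volume_id in self._busy:
                return False
            self._busy.add(volume_id)
            return True

    def release(self, volume_id):
        with self._mutex:
            self._busy.discard(volume_id)


def node_stage_volume(locks, mounted, volume_id, staging_path, mount_fn):
    """Stage a volume once; reject overlapping requests for the same ID."""
    if not locks.try_acquire(volume_id):
        raise AbortedError(
            f"an operation with the given Volume ID {volume_id} already exists"
        )
    try:
        if staging_path in mounted:
            return  # already staged: succeed without mounting again
        mount_fn(staging_path)  # assumed to raise on failure
        mounted.add(staging_path)
    finally:
        locks.release(volume_id)
```

With this shape, the caller that loses the race sees the Aborted error only while the first request is still running; a retry after that completes hits the "already staged" branch and succeeds.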

[🎩︎]mrajanna@fedora kubernetes-csi-addons $]cat log | grep -i "operation"
E1109 15:45:37.763238       1 nodeserver.go:136] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.763271       1 utils.go:210] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.772242       1 nodeserver.go:540] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.772266       1 utils.go:210] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.779016       1 nodeserver.go:540] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.779035       1 utils.go:210] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
[🎩︎]mrajanna@fedora kubernetes-csi-addons $]
[🎩︎]mrajanna@fedora kubernetes-csi-addons $]
[🎩︎]mrajanna@fedora kubernetes-csi-addons $]cat log | grep -i "ID: 13"
I1109 15:45:37.770267       1 utils.go:195] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnstageVolume
I1109 15:45:37.772198       1 utils.go:206] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.772242       1 nodeserver.go:540] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.772266       1 utils.go:210] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
[🎩︎]mrajanna@fedora kubernetes-csi-addons $]cat log | grep -i "ID: 15"
I1109 15:45:37.778899       1 utils.go:195] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnstageVolume
I1109 15:45:37.778985       1 utils.go:206] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.779016       1 nodeserver.go:540] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.779035       1 utils.go:210] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists

The same explanation applies here, but for the NodeUnstageVolume requests.
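
The grep-based reading of the log above can also be automated. The following is a rough Python sketch, assuming the klog line layout seen in this issue (`ID: <n> ... GRPC call|response|error: ...`); the helper names `summarize` and `rejected_while_busy` are made up for illustration. It pairs each request that failed with `code = Aborted` with the requests that were still in flight when it started. Timestamps are compared as strings, which only works within a single day.

```python
import re

# Matches the "GRPC call", "GRPC response" and "GRPC error" lines;
# "GRPC request" lines are deliberately skipped.
LINE_RE = re.compile(
    r"^[IEW]\d{4} (?P<ts>\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+\] "
    r"ID: (?P<id>\d+) .*?GRPC (?P<kind>call|response|error): ?(?P<rest>.*)$"
)


def summarize(lines):
    """Map each request ID to its method, start/end time and Aborted flag."""
    calls = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        rid = int(m["id"])
        if m["kind"] == "call":
            calls[rid] = {
                "method": m["rest"].strip(),
                "start": m["ts"],
                "end": None,
                "aborted": False,
            }
        elif rid in calls:
            calls[rid]["end"] = m["ts"]
            calls[rid]["aborted"] = (
                m["kind"] == "error" and "code = Aborted" in m["rest"]
            )
    return calls


def rejected_while_busy(calls):
    """Return (rejected_id, in_flight_id) pairs, sorted by ID."""
    pairs = []
    for rid, call in calls.items():
        if not call["aborted"]:
            continue
        for other_id, other in calls.items():
            if other_id == rid or other["end"] is None:
                continue
            # The other request was running when the rejected one arrived.
            if other["start"] <= call["start"] <= other["end"]:
                pairs.append((rid, other_id))
    return sorted(pairs)
```

Feeding it the log file line by line, for example `rejected_while_busy(summarize(open("log")))`, lists which request held the volume while each rejected one arrived.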

JohnKiller · 2022-11-10T10:12:28Z

Thanks for your response. So it's a problem with Nomad that should have waited for the first allocation to complete before trying to mount it again? I'll open an issue there

Madhu-1 · 2022-11-10T10:43:14Z

> Thanks for your response. So it's a problem with Nomad that should have waited for the first allocation to complete before trying to mount it again? I'll open an issue there

IMO It's more of an enhancement, not a problem.

JohnKiller · 2022-11-10T10:51:51Z

Eh it causes downtime during node draining where multiple tasks are being rescheduled at once
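
For what it's worth, a caller-side workaround would be to retry the stage call with backoff when it comes back as Aborted, which as far as I understand is what the CSI spec suggests callers do for that error code. Below is a rough Python sketch of that idea; it is not Nomad's code, and `AbortedError`, `stage_with_retry`, and its parameters are hypothetical names chosen for the example.

```python
import time


class AbortedError(Exception):
    """Stands in for a gRPC status with code = Aborted (hypothetical name)."""


def stage_with_retry(stage, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call stage(); on AbortedError wait and retry with doubling delays.

    Re-raises the last AbortedError once all attempts are used up.
    Any other exception propagates immediately.
    """
    for attempt in range(attempts):
        try:
            return stage()
        except AbortedError:
            if attempt == attempts - 1:
                raise
            # Delays grow as base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

Because a repeated stage request for an already mounted volume is supposed to succeed, retrying like this should be safe: the second allocation simply picks up the staging mount created by the first.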

JohnKiller closed this as completed Nov 10, 2022

JohnKiller mentioned this issue Nov 10, 2022

CSI: multi-node-multi-writer fails with an operation with the given Volume ID already exists hashicorp/nomad#15197

Open

nixpanic added the dependency/nomad label Mar 21, 2024

matthdsm added a commit to nextflow-io/nf-nomad that referenced this issue Aug 28, 2024

Allow 1 restart per task

2651820

This should fix a concurrency issue with the CSI driver ceph/ceph-csi#3511 hashicorp/nomad#15197

matthdsm mentioned this issue Aug 28, 2024

Allow 1 restart per task nextflow-io/nf-nomad#82

Merged
