CephFS multiple attachments to same host results in "an operation with the given Volume ID already exists" #3511

Closed
JohnKiller opened this issue Nov 9, 2022 · 4 comments

Comments

@JohnKiller

Describe the bug

When the container orchestrator launches multiple instances on the same host that all need the same CephFS volume, and they are started too close together, the node plugin fails with "an operation with the given Volume ID already exists". When the orchestrator retries after some time, the mount succeeds.

Since CephFS supports multi-writer access, it shouldn't matter that several mounts of the same volume are in progress on the node at the same time.

Environment details

  • Image/version of Ceph CSI driver : 3.7.2
  • Helm chart version : Nomad orchestrator
  • Kernel version : 5.10.0-18-amd64 SMP Debian 5.10.140-1 (2022-09-02)
  • Mounter used for mounting PVC (for CephFS it's fuse or kernel; for RBD it's krbd or rbd-nbd) : fuse
  • Kubernetes cluster version :
  • Ceph cluster version : 17.2.4

Steps to reproduce

Steps to reproduce the behavior:

Launch multiple instances that need to mount the same CephFS volume on the same host. If they start too close together, some will fail to mount.

Actual results

Some allocations won't mount the volume.

Expected behavior

All allocations should mount the volume.

Logs

I1109 15:45:31.395759       1 utils.go:195] ID: 8 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I1109 15:45:31.395947       1 utils.go:206] ID: 8 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"target_path":"/local/csi/per-alloc/87054419-bbad-a1f1-3b80-fef216d6ab63/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:31.396104       1 nodeserver.go:487] ID: 8 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 stat failed: stat /local/csi/per-alloc/87054419-bbad-a1f1-3b80-fef216d6ab63/lagrondaia/rw-file-system-multi-node-multi-writer: no such file or directory
I1109 15:45:31.396144       1 nodeserver.go:491] ID: 8 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 targetPath: /local/csi/per-alloc/87054419-bbad-a1f1-3b80-fef216d6ab63/lagrondaia/rw-file-system-multi-node-multi-writer has already been deleted
I1109 15:45:31.396205       1 utils.go:212] ID: 8 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:45:31.396939       1 utils.go:195] ID: 9 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnstageVolume
I1109 15:45:31.397069       1 utils.go:206] ID: 9 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
I1109 15:45:31.397244       1 utils.go:212] ID: 9 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:45:37.762508       1 utils.go:195] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeStageVolume
I1109 15:45:37.762839       1 utils.go:206] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"7955282e-39e6-4db4-87d0-153b0ce2f37c","fsName":"cephfs","mounter":"fuse","subvolumeName":"csi-vol-80def62f-30ec-11ed-901a-0242ac110003","subvolumePath":"/volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024"},"volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
I1109 15:45:37.763027       1 utils.go:195] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeStageVolume
I1109 15:45:37.763187       1 utils.go:206] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"7955282e-39e6-4db4-87d0-153b0ce2f37c","fsName":"cephfs","mounter":"fuse","subvolumeName":"csi-vol-80def62f-30ec-11ed-901a-0242ac110003","subvolumePath":"/volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024"},"volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.763238       1 nodeserver.go:136] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.763271       1 utils.go:210] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
I1109 15:45:37.769488       1 utils.go:195] ID: 12 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I1109 15:45:37.769605       1 utils.go:206] ID: 12 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"target_path":"/local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.769653       1 nodeserver.go:487] ID: 12 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 stat failed: stat /local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer: no such file or directory
I1109 15:45:37.769691       1 nodeserver.go:491] ID: 12 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 targetPath: /local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer has already been deleted
I1109 15:45:37.769711       1 utils.go:212] ID: 12 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:45:37.770267       1 utils.go:195] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnstageVolume
I1109 15:45:37.772198       1 utils.go:206] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.772242       1 nodeserver.go:540] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.772266       1 utils.go:210] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
I1109 15:45:37.777967       1 utils.go:195] ID: 14 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I1109 15:45:37.778070       1 utils.go:206] ID: 14 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"target_path":"/local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.778149       1 nodeserver.go:487] ID: 14 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 stat failed: stat /local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer: no such file or directory
I1109 15:45:37.778194       1 nodeserver.go:491] ID: 14 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 targetPath: /local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer has already been deleted
I1109 15:45:37.778225       1 utils.go:212] ID: 14 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:45:37.778899       1 utils.go:195] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnstageVolume
I1109 15:45:37.778985       1 utils.go:206] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.779016       1 nodeserver.go:540] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.779035       1 utils.go:210] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
I1109 15:45:37.802858       1 omap.go:88] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 got omap values: (pool="cephfs_metadata", namespace="csi", name="csi.volume.80def62f-30ec-11ed-901a-0242ac110003"): map[csi.imagename:csi-vol-80def62f-30ec-11ed-901a-0242ac110003 csi.volname:lagrondaia]
I1109 15:45:37.822515       1 nodeserver.go:247] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 cephfs: mounting volume 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 with Ceph FUSE driver
I1109 15:45:37.883819       1 cephcmds.go:105] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 command succeeded: ceph-fuse [/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer -m 192.168.110.101,192.168.110.102,192.168.110.103 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** -r /volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024 -o nonempty --client_mds_namespace=cephfs]
I1109 15:45:37.883934       1 nodeserver.go:206] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 cephfs: successfully mounted volume 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 to /local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer
I1109 15:45:37.884122       1 utils.go:212] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:45:37.885413       1 utils.go:195] ID: 16 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodePublishVolume
I1109 15:45:37.885743       1 utils.go:206] ID: 16 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","target_path":"/local/csi/per-alloc/1f178f99-0da5-3516-6670-d3796a85e77d/lagrondaia/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"7955282e-39e6-4db4-87d0-153b0ce2f37c","fsName":"cephfs","mounter":"fuse","subvolumeName":"csi-vol-80def62f-30ec-11ed-901a-0242ac110003","subvolumePath":"/volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024"},"volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
I1109 15:45:37.889954       1 cephcmds.go:105] ID: 16 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 command succeeded: mount [-o bind,_netdev /local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer /local/csi/per-alloc/1f178f99-0da5-3516-6670-d3796a85e77d/lagrondaia/rw-file-system-multi-node-multi-writer]
I1109 15:45:37.890020       1 nodeserver.go:467] ID: 16 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 cephfs: successfully bind-mounted volume 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 to /local/csi/per-alloc/1f178f99-0da5-3516-6670-d3796a85e77d/lagrondaia/rw-file-system-multi-node-multi-writer
I1109 15:45:37.890080       1 utils.go:212] ID: 16 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:45:39.780737       1 utils.go:195] ID: 17 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I1109 15:45:39.780908       1 utils.go:206] ID: 17 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"target_path":"/local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:39.780944       1 nodeserver.go:487] ID: 17 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 stat failed: stat /local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer: no such file or directory
I1109 15:45:39.780957       1 nodeserver.go:491] ID: 17 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 targetPath: /local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer has already been deleted
I1109 15:45:39.780975       1 utils.go:212] ID: 17 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:46:00.241259       1 utils.go:195] ID: 18 GRPC call: /csi.v1.Identity/Probe
I1109 15:46:00.241374       1 utils.go:206] ID: 18 GRPC request: {}
I1109 15:46:00.241408       1 utils.go:212] ID: 18 GRPC response: {}
I1109 15:46:00.244573       1 utils.go:195] ID: 19 GRPC call: /csi.v1.Identity/Probe
I1109 15:46:00.244633       1 utils.go:206] ID: 19 GRPC request: {}
I1109 15:46:00.244657       1 utils.go:212] ID: 19 GRPC response: {}
I1109 15:46:00.245332       1 utils.go:195] ID: 20 GRPC call: /csi.v1.Node/NodeGetCapabilities
I1109 15:46:00.245434       1 utils.go:206] ID: 20 GRPC request: {}
I1109 15:46:00.245561       1 utils.go:212] ID: 20 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I1109 15:46:07.810929       1 utils.go:195] ID: 21 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodePublishVolume
I1109 15:46:07.811167       1 utils.go:206] ID: 21 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","target_path":"/local/csi/per-alloc/366ef5a4-05e9-6c1a-6bfa-eb7f97569eb0/lagrondaia/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"7955282e-39e6-4db4-87d0-153b0ce2f37c","fsName":"cephfs","mounter":"fuse","subvolumeName":"csi-vol-80def62f-30ec-11ed-901a-0242ac110003","subvolumePath":"/volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024"},"volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
I1109 15:46:07.816906       1 cephcmds.go:105] ID: 21 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 command succeeded: mount [-o bind,_netdev /local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer /local/csi/per-alloc/366ef5a4-05e9-6c1a-6bfa-eb7f97569eb0/lagrondaia/rw-file-system-multi-node-multi-writer]
I1109 15:46:07.816970       1 nodeserver.go:467] ID: 21 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 cephfs: successfully bind-mounted volume 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 to /local/csi/per-alloc/366ef5a4-05e9-6c1a-6bfa-eb7f97569eb0/lagrondaia/rw-file-system-multi-node-multi-writer
I1109 15:46:07.817006       1 utils.go:212] ID: 21 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}

@Madhu-1
Collaborator

Madhu-1 commented Nov 10, 2022

[🎩︎]mrajanna@fedora kubernetes-csi-addons $]cat log | grep -i "NodeStageVolume"
I1109 15:45:37.762508       1 utils.go:195] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeStageVolume
I1109 15:45:37.763027       1 utils.go:195] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeStageVolume
[🎩︎]mrajanna@fedora kubernetes-csi-addons $]cat log | grep -i "ID: 10"
I1109 15:45:37.762508       1 utils.go:195] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeStageVolume
I1109 15:45:37.762839       1 utils.go:206] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"7955282e-39e6-4db4-87d0-153b0ce2f37c","fsName":"cephfs","mounter":"fuse","subvolumeName":"csi-vol-80def62f-30ec-11ed-901a-0242ac110003","subvolumePath":"/volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024"},"volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
I1109 15:45:37.802858       1 omap.go:88] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 got omap values: (pool="cephfs_metadata", namespace="csi", name="csi.volume.80def62f-30ec-11ed-901a-0242ac110003"): map[csi.imagename:csi-vol-80def62f-30ec-11ed-901a-0242ac110003 csi.volname:lagrondaia]
I1109 15:45:37.822515       1 nodeserver.go:247] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 cephfs: mounting volume 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 with Ceph FUSE driver
I1109 15:45:37.883819       1 cephcmds.go:105] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 command succeeded: ceph-fuse [/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer -m 192.168.110.101,192.168.110.102,192.168.110.103 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** -r /volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024 -o nonempty --client_mds_namespace=cephfs]
I1109 15:45:37.883934       1 nodeserver.go:206] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 cephfs: successfully mounted volume 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 to /local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer
I1109 15:45:37.884122       1 utils.go:212] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
[🎩︎]mrajanna@fedora kubernetes-csi-addons $]cat log | grep -i "ID: 11"
I1109 15:45:37.763027       1 utils.go:195] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeStageVolume
I1109 15:45:37.763187       1 utils.go:206] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"7955282e-39e6-4db4-87d0-153b0ce2f37c","fsName":"cephfs","mounter":"fuse","subvolumeName":"csi-vol-80def62f-30ec-11ed-901a-0242ac110003","subvolumePath":"/volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024"},"volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.763238       1 nodeserver.go:136] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.763271       1 utils.go:210] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists

You can see that the CO is sending two NodeStageVolume requests for the same volume at almost the same time (the timestamps differ only at the millisecond level). As per the CSI specification, NodeStageVolume should make sure the volume is mounted on the given node only once; if it is already mounted, we should return success: https://github.com/container-storage-interface/spec/blob/master/spec.md#nodestagevolume. To avoid the consequences of mounting the same volume twice, the CSI driver holds a lock per volume. You will see the "operation already exists" error until the ongoing first request completes.
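
For readers unfamiliar with this pattern, here is a minimal sketch of a per-volume try-lock that fails fast instead of blocking. It is an illustration only, not the driver's actual code: `volumeLocks`, `tryAcquire`, `release`, `stageVolume` and `errAborted` are hypothetical names, and `errAborted` merely stands in for the gRPC Aborted status seen in the logs.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errAborted is a placeholder for a gRPC status with code Aborted (assumption).
var errAborted = errors.New("aborted")

// volumeLocks is a hypothetical set of volume IDs with an operation in flight.
type volumeLocks struct {
	mu    sync.Mutex
	inUse map[string]struct{}
}

func newVolumeLocks() *volumeLocks {
	return &volumeLocks{inUse: make(map[string]struct{})}
}

// tryAcquire marks id as busy and returns true, or returns false
// immediately if another operation already holds it. It never blocks
// for the duration of the other operation.
func (l *volumeLocks) tryAcquire(id string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, held := l.inUse[id]; held {
		return false
	}
	l.inUse[id] = struct{}{}
	return true
}

// release marks id as free again.
func (l *volumeLocks) release(id string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.inUse, id)
}

// stageVolume has the rough shape of a stage handler: reject the call
// while another operation on the same volume ID is running, otherwise
// run the mount callback while holding the lock.
func stageVolume(l *volumeLocks, id string, mount func() error) error {
	if !l.tryAcquire(id) {
		return fmt.Errorf("%w: an operation with the given Volume ID %s already exists", errAborted, id)
	}
	defer l.release(id)
	return mount()
}
```

With this shape, a second `stageVolume` call for the same ID that arrives while the first `mount` is still running gets the error straight away, and the same call succeeds once the first one has returned. Calls for a different volume ID are not affected.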

[🎩︎]mrajanna@fedora kubernetes-csi-addons $]cat log | grep -i "operation"
E1109 15:45:37.763238       1 nodeserver.go:136] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.763271       1 utils.go:210] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.772242       1 nodeserver.go:540] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.772266       1 utils.go:210] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.779016       1 nodeserver.go:540] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.779035       1 utils.go:210] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
[🎩︎]mrajanna@fedora kubernetes-csi-addons $]cat log | grep -i "ID: 13"
I1109 15:45:37.770267       1 utils.go:195] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnstageVolume
I1109 15:45:37.772198       1 utils.go:206] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.772242       1 nodeserver.go:540] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.772266       1 utils.go:210] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
[🎩︎]mrajanna@fedora kubernetes-csi-addons $]cat log | grep -i "ID: 15"
I1109 15:45:37.778899       1 utils.go:195] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnstageVolume
I1109 15:45:37.778985       1 utils.go:206] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.779016       1 nodeserver.go:540] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.779035       1 utils.go:210] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists

Same explanation as above, but here it's a NodeUnstageVolume request.

@JohnKiller
Author

Thanks for your response. So it's a problem with Nomad, which should have waited for the first allocation to complete before trying to mount the volume again? I'll open an issue there.
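
On the caller side, one way to cope with this error is to treat it as retryable and back off. The following is a rough sketch under stated assumptions, not Nomad's implementation: `retryOnAborted` and `errAborted` are hypothetical names, and `errAborted` stands in for a gRPC error with code Aborted.

```go
package main

import (
	"errors"
	"time"
)

// errAborted is a placeholder for a gRPC status with code Aborted (assumption).
var errAborted = errors.New("aborted: operation already exists")

// retryOnAborted calls op until it succeeds, fails with an error other
// than errAborted, or the attempts are used up. The delay doubles after
// each aborted attempt. At least one attempt is always made.
func retryOnAborted(attempts int, delay time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		err = op()
		if err == nil || !errors.Is(err, errAborted) {
			return err
		}
		// Do not sleep after the final attempt.
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return err
}
```

Only the "already exists" case is retried here; any other error is returned to the caller immediately so that real mount failures are not hidden.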

@Madhu-1
Collaborator

Madhu-1 commented Nov 10, 2022

> Thanks for your response. So it's a problem with Nomad that should have waited for the first allocation to complete before trying to mount it again? I'll open an issue there

IMO it's more of an enhancement than a problem.

@JohnKiller
Copy link
Author

Eh, it causes downtime during node draining, when multiple tasks are rescheduled at once.

matthdsm added a commit to nextflow-io/nf-nomad that referenced this issue Aug 28, 2024
This should fix a concurrency issue with the CSI driver
ceph/ceph-csi#3511
hashicorp/nomad#15197
abhi18av added a commit to nextflow-io/nf-nomad that referenced this issue Aug 28, 2024
* Allow 1 restart per task

This should fix a concurrency issue with the CSI driver
ceph/ceph-csi#3511
hashicorp/nomad#15197

* expose the reschedule and restart config vars

* remove unused import

---------

Co-authored-by: Jorge <jorge@edn.es>
Co-authored-by: Abhinav Sharma <abhi18av@outlook.com>