CephFS multiple attachments to the same host result in "an operation with the given Volume ID already exists" #3511
Comments
You can see the CO is sending two NodeStageVolume requests for the same volume at the same time (the only difference is the millisecond timestamps). Per the CSI specification, NodeStageVolume must make sure the volume is mounted on the given node only once; if it is already mounted, we should return success (https://github.com/container-storage-interface/spec/blob/master/spec.md#nodestagevolume). To avoid the consequences of mounting the same volume twice, the CSI driver holds a per-volume lock, so you will see the "operation already exists" error until the ongoing first request completes.
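For context, here is a minimal Go sketch of that per-volume locking behaviour. The type and function names are illustrative, not the exact ceph-csi code: a second request for the same volume ID is rejected with a gRPC Aborted error carrying this message until the first request releases the lock.

```go
// Illustrative per-volume operation lock, sketching the idea described above.
package csilocks

import (
	"sync"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// VolumeLocks records volume IDs that have an operation in flight.
type VolumeLocks struct {
	mu   sync.Mutex
	busy map[string]struct{}
}

func NewVolumeLocks() *VolumeLocks {
	return &VolumeLocks{busy: make(map[string]struct{})}
}

// TryAcquire returns false when an operation for volumeID is already running.
func (vl *VolumeLocks) TryAcquire(volumeID string) bool {
	vl.mu.Lock()
	defer vl.mu.Unlock()
	if _, exists := vl.busy[volumeID]; exists {
		return false
	}
	vl.busy[volumeID] = struct{}{}
	return true
}

// Release clears the in-flight marker for volumeID.
func (vl *VolumeLocks) Release(volumeID string) {
	vl.mu.Lock()
	defer vl.mu.Unlock()
	delete(vl.busy, volumeID)
}

// nodeStage sketches where the error reported in this issue comes from.
func nodeStage(vl *VolumeLocks, volumeID string) error {
	if !vl.TryAcquire(volumeID) {
		return status.Errorf(codes.Aborted,
			"an operation with the given Volume ID %s already exists", volumeID)
	}
	defer vl.Release(volumeID)
	// ... mount the volume at the staging path, returning success if it is
	// already mounted ...
	return nil
}
```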
Same explanation as above, but here it's a NodeUnstageVolume request.
Thanks for your response. So it's a problem with Nomad, which should have waited for the first allocation to complete before trying to mount the volume again? I'll open an issue there.
IMO it's more of an enhancement than a problem.
Eh, it causes downtime during node draining, where multiple tasks are being rescheduled at once.
Referenced in a later commit:
* Allow 1 restart per task. This should fix a concurrency issue with the CSI driver (ceph/ceph-csi#3511, hashicorp/nomad#15197)
* expose the reschedule and restart config vars
* remove unused import
Co-authored-by: Jorge <jorge@edn.es>
Co-authored-by: Abhinav Sharma <abhi18av@outlook.com>
Describe the bug
When the container orchestrator launches multiple instances on the same host that all need the same CephFS volume, and they are started too close together, the node plugin errors with
an operation with the given Volume ID already exists
When the orchestrator tries again after some time, it then works. Since CephFS supports multiple writers, it shouldn't matter that the node is mounting the same volume multiple times concurrently.
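As a hedged sketch of what "tries again after some time" looks like from the caller's side, the loop below retries NodeStageVolume while the plugin keeps answering gRPC Aborted. stageWithRetry and the backoff values are assumptions for illustration, not Nomad's actual retry logic.

```go
// Caller-side sketch: treat the Aborted "operation already exists" response
// as transient and retry with backoff until the in-flight operation finishes.
package staging

import (
	"context"
	"time"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// stageWithRetry keeps calling NodeStageVolume while the plugin reports that
// another operation for the same volume ID is still in flight.
func stageWithRetry(ctx context.Context, c csi.NodeClient, req *csi.NodeStageVolumeRequest) error {
	backoff := 500 * time.Millisecond
	for {
		_, err := c.NodeStageVolume(ctx, req)
		if status.Code(err) != codes.Aborted {
			return err // success (nil) or a non-retryable error
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
		}
		if backoff < 8*time.Second {
			backoff *= 2
		}
	}
}
```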
Environment details
Mounter used (for CephFS: fuse or kernel; for RBD: krbd or rbd-nbd): fuse
Steps to reproduce
Steps to reproduce the behavior:
Launch multiple instances that need to mount the same CephFS volume on the same host. If they start too close together, some will fail to mount.
Actual results
Some allocations won't mount the volume.
Expected behavior
All allocations should mount the volume.
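For illustration, a short sketch of the idempotent staging the CSI spec asks for, which is why concurrent mounts of a multi-writer CephFS volume should eventually all succeed. ensureStaged, isMountPoint, and mountVolume are hypothetical names used only for this example.

```go
// Sketch of idempotent NodeStageVolume behaviour: if the volume is already
// staged at stagingTargetPath, return success instead of mounting it again.
package staging

import "fmt"

func ensureStaged(volumeID, stagingTargetPath string,
	isMountPoint func(string) (bool, error),
	mountVolume func(volumeID, path string) error) error {

	mounted, err := isMountPoint(stagingTargetPath)
	if err != nil {
		return fmt.Errorf("checking %s: %w", stagingTargetPath, err)
	}
	if mounted {
		// Already staged by an earlier request; NodeStageVolume must be idempotent.
		return nil
	}
	return mountVolume(volumeID, stagingTargetPath)
}
```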
Logs