Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(seaweedfs-csi-driver): delete unnecessary stage #151

Merged
merged 1 commit into from
Jan 12, 2024
Merged

chore(seaweedfs-csi-driver): delete unnecessary stage #151

merged 1 commit into from
Jan 12, 2024

Conversation

duanhongyi
Copy link
Contributor

NodeStageVolume is an additional step for the block devices, it's used to partition and format the disk and mount the disk on a node global directory.The mount of weed uses fuse, similar to NFS, not a block device, so you don't need to use NodeStageVolume,

At the same time, this change is also to prepare for future use of requiresRepublish, which means there is a mechanism to restore the pod's mount when nodeserver error, without the need to restart the pod.

Ceph nfs did not use NodeStageVolume(Non block devices):
https://github.com/ceph/ceph-csi/blob/devel/internal/nfs/nodeserver/nodeserver.go#L142

Ceph fs uses NodeStageVolume(block device):
https://github.com/ceph/ceph-csi/blob/devel/internal/cephfs/nodeserver.go#L158

@chrislusf
Copy link
Contributor

Thanks and good to learn!

@chrislusf chrislusf merged commit 44283c0 into seaweedfs:master Jan 12, 2024
1 of 4 checks passed
@kvaster
Copy link
Contributor

kvaster commented Jan 16, 2024

And what about caches ? i.e. if some volume will be mounted in several pods on the same node ? Before this PR only one process will be used for this, but now each mount will use it's own process for this.

@kvaster
Copy link
Contributor

kvaster commented Jan 16, 2024

Also requiresRepublish is not a mechanism for restoring the pod's mount.
See: https://kubernetes-csi.github.io/docs/csi-driver-object.html
Especially: CSI drivers should only atomically update the contents of the volume. Mount point change will not be seen by a running container.

The problem is that we really need to propagate new mount point into the running container and currently there is no any mechanism for this.

@kvaster
Copy link
Contributor

kvaster commented Jan 16, 2024

@duanhongyi, and it's definitely wrong assuming that NodeStageVolume is used just for block devices.

NodeStageVolume – Mounts a persistent volume to a global path on a worker node.
NodePublishVolume – Mounts a persistent volume from the global path on a worker node to a pod.

From here: https://arslan.io/2018/06/21/how-to-write-a-container-storage-interface-csi-plugin/

NodeStageVolume: This method is called by the CO to temporarily mount the volume to a staging path. Usually this staging path is a global directory on the node. In Kubernetes, after it's mounted to the global directory, you mount it into the pod directory (via NodePublishVolume). The reason that mounting is a two step operation is because Kubernetes allows you to use a single volume by multiple pods. This is allowed when the storage system supports it (say NFS) or if all pods run on the same node. One thing to note is that you also need to format the volume if it's not formatted already. Keep that in mind.

NodePublishVolume: This method is called to mount the volume from staging to target path. Usually what you do here is a bind mount. A bind mount allows you to mount a path to a different path (instead of mounting a device to a path). In Kubernetes, this allows us for example to use the mounted volume from the staging path (i.e global directory) to the target path (pod directory). Here, formatting is not needed because we already did it in NodeStageVolume.

As a result NodeStageVolume is a wonderfull mechanism for ReadWriteMany volumes of any type. It allows to share mount many times on the same node.

@chrislusf
Copy link
Contributor

@kvaster , seems @duanhongyi is in a different time zone. Let's revert this PR?

@chrislusf
Copy link
Contributor

#153 is after this PR.

To revert this PR, #153 also need to be reverted.

@kvaster
Copy link
Contributor

kvaster commented Jan 16, 2024

@chrislusf , my proposition is to wait for @duanhongyi answer. My personal position is to respect opinion and work of others. And it will be wonderfull if we all agree about the way to go.

@kvaster
Copy link
Contributor

kvaster commented Jan 18, 2024

@duanhongyi, any comments ? If there are no any comments, my proposition is to revert this two PRs today.

@duanhongyi
Copy link
Contributor Author

Sorry, I've been quite busy lately and don't have time for discussions; The original intention of mentioning this PR is to simplify the mounting process of CSI. In my opinion, it is unnecessary to share a mounting point among multiple nodes in one node. Instead, it increases the complexity of the code. Perhaps as you said, cache sharing is possible, but I think it is not necessary compared to the increase in complexity it brings.

@kvaster @chrislusf

@kvaster
Copy link
Contributor

kvaster commented Jan 18, 2024

@duanhongyi , thanks for your answer! This is all about resources, not only cache. For use it saves lot's of resources (including network) when mount process is shared across many small pods on the node. I don't think that this brings too much complexity. But it brings flexibility and options without any other downside (except a little bit more complex code). Also it will not solve any of the problems mentioned - i.e remounting volume in case of upgrade or process crash.

@duanhongyi
Copy link
Contributor Author

@duanhongyi , thanks for your answer! This is all about resources, not only cache. For use it saves lot's of resources (including network) when mount process is shared across many small pods on the node. I don't think that this brings too much complexity. But it brings flexibility and options without any other downside (except a little bit more complex code). Also it will not solve any of the problems mentioned - i.e remounting volume in case of upgrade or process crash.

Okay, you convinced me. The last two submissions can be rolled back.

@chrislusf
Copy link
Contributor

ok. Reverted last two PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants