Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CSI driver] Implementation of staged mount for vhost block and FS volumes in VM mode #1982

Merged

Conversation

aikuchin
Copy link
Contributor

@aikuchin aikuchin commented Sep 10, 2024

Previously endpoints were started on NodePublish phase, this breaks hotplug of multiple disks to the same VM (causes recreation of endpoints when helpler pod chandges, and ends up in deadlock attemptiong to create new endpoint before the old wan was stopped).

This commit splits volume mount between NodeStage and NodePublish phases. So endpoint is started in NodeStage, and NodePublish just creates mount to requested pod. With this split change of hotplug pod doesn't cause endpoint stop (because it doesn't trigger NodeUnpublish) but intead results in remounts, avoiding deadlock and reconnects.

What is unclear:

  • should stage mount be implemented for other types (FS, nbd?)
    Decided that only vhost endpoints (NBS and NFS) should use new scheme
  • how to get podId in NodeStage phase (from volume attributes?)
    We don't use podId at all, use instead instanceId from volume attributes, if it is not provided then use old scheme
  • how to make tranfer from old scheme seamless
    Volumes that have instanceId attrubute will use new scheme, others will continue using old scheme
  • should we use separate stage directory or just use s.socketsDir
    Grouped volumes in instanceId directories

Copy link
Contributor

Hi! Thank you for contributing!
The tests on this PR will run after a maintainer adds an ok-to-test label to this PR manually. Thank you for your patience!

Direct comparison can fail for wrapped errors, use errors.Is(err, target) instead.

Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
This way it is more obvious that they are unused and makes the linter happy.

Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
This is an impossible code path now but someday this can change and
otherwise function is ready to have nil mnt, so fix one more condition.

Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
@aikuchin aikuchin force-pushed the users/antonkuchin/csi-stage-volumes branch 2 times, most recently from 2271f27 to 6381d5d Compare September 23, 2024 05:09
@aikuchin aikuchin changed the title [RFC] [CSI driver] PoC of staged mount for vhost block volumes in VM mode [CSI driver] PoC of staged mount for vhost block volumes in VM mode Sep 23, 2024
@aikuchin aikuchin marked this pull request as ready for review September 23, 2024 05:22
@qkrorlqr qkrorlqr added ok-to-test Label to approve test launch for external members large-tests Launch large tests for PR labels Sep 23, 2024
@github-actions github-actions bot removed the ok-to-test Label to approve test launch for external members label Sep 23, 2024
@aikuchin aikuchin changed the title [CSI driver] PoC of staged mount for vhost block volumes in VM mode [CSI driver] Imlementation of staged mount for vhost block volumes in VM mode Sep 23, 2024
@aikuchin aikuchin changed the title [CSI driver] Imlementation of staged mount for vhost block volumes in VM mode [CSI driver] Implementation of staged mount for vhost block volumes in VM mode Sep 23, 2024
Copy link
Contributor

Note

This is an automated comment that will be appended during run.

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 6381d5d.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
6168 6168 0 0 0 0

@aikuchin aikuchin changed the title [CSI driver] Implementation of staged mount for vhost block volumes in VM mode [CSI driver] Implementation of staged mount for vhost block and FS volumes in VM mode Sep 23, 2024
Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
Some errors are expected or don't matter, so ignore them explicitly.

Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
This will be required later to add staged mount for vhost endpoints

Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
Volume context and capabilities should be reused in Stage and Publish
Move most of other constants from arguments to variables.

Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
This will be used later when part of NodePublish logic for vhost
endpoints will be moved to NodeStage.

Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
Before we assumed that pod ID is always present and can be used as a
part of mount path, but if endpoint is started in NodePublish podId is
not known. For such cases return staging directory name where endpoint
will be started.

Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
…anceId attribute

Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
Also rename all socketDir variables to endpointDir for conststency.

Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
In most callsites we already know endpointDir, so just accept it as an
argument instead of rebuilding every time.

Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
This keeps apart codepaths that require podId separate from ones that rely on
instanceID.
It also allows to remove temporary hack from getInstanceOrPodId and rename
it back to getPodId.

Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
This way instanceId variable can be used only if it is not empty and
error handling happens only if any of stage functions was called.

Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
We will need these parameters later to properly stop endpoints during
NodeUnstage. This also allows to skip unnecessary calls in unstage of
old volumes.

Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
We had all staged disk on the node to be in shared staging directory,
but now we have all necessary information in stage record to get
instanceId in NodeUnstage so we can group disks in instance directory.

Old path looked like:
/...NBS.../sockets/${POD_ID}/${VOLUME_ID}/[nbs,nfs].sock

Intermediate path looked like:
/...NBS.../sockets/staging/${VOLUME_ID}/[nbs,nfs].sock

New path will look like:
/...NBS.../sockets/${INSTANCE_ID}/${VOLUME_ID}/[nbs,nfs].sock

Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
@aikuchin aikuchin force-pushed the users/antonkuchin/csi-stage-volumes branch from aa98a8f to fc02929 Compare September 27, 2024 09:16
According to spec target directory must be removed after NodeUnstage
but now mounter mock doesn't delete it during mount point cleanup.
If directory deletion is enabled then other test breaks because it
relies on previous tests leaving temoprary directories.

This will be fixed separately form this PR.

Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
We use the same code for all types of volumes in NodeUnpublish because
we have no good way to get their type.
Add comment to explain why StopEndpoint calls in NodeUnpublishVolume has no
effect for staged disks.

Signed-off-by: Anton Kuchin <antonkuchin@nebius.com>
@aikuchin aikuchin force-pushed the users/antonkuchin/csi-stage-volumes branch from fc02929 to e7c8ad8 Compare September 27, 2024 09:26
@tpashkin tpashkin added blockstore Add this label to run only cloud/blockstore build and tests on PR ok-to-test Label to approve test launch for external members labels Sep 27, 2024
@github-actions github-actions bot removed the ok-to-test Label to approve test launch for external members label Sep 27, 2024
Copy link
Contributor

Note

This is an automated comment that will be appended during run.

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit e7c8ad8.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
3463 3463 0 0 0 0

@qkrorlqr qkrorlqr merged commit f4ac3a9 into ydb-platform:main Sep 30, 2024
24 of 28 checks passed
@aikuchin aikuchin deleted the users/antonkuchin/csi-stage-volumes branch September 30, 2024 11:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blockstore Add this label to run only cloud/blockstore build and tests on PR large-tests Launch large tests for PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants