CSI: node unmount from the client before unpublish RPC #11892
Conversation
Force-pushed from c28c1bd to 28ed20d
I have an inline question, but I sense this is a knowledge gap. Therefore 👍🏻
client/allocrunner/csi_hook.go (outdated)

```go
if err != nil {
	mErr = multierror.Append(mErr, err)
}
```
Apologies if this was covered yesterday, but for my education, if we fail to unmount the volume, can attempting to unpublish the volume succeed or will it always fail? I am curious how the fall-through when an error is received works.
Good catch. There are two cases:

- We may be returning an error that's `structs.ErrCSIClientRPCIgnorable`. For example, this CSI RPC is supposed to be idempotent, so if it's been node-unpublished already we still want to make the `CSIVolume.Unpublish` RPC back to the server so that the server can do a controller-unpublish and release the claim.
- If it's not ignorable (ex. the node plugin's alloc is gone), we should probably not send the `CSIVolume.Unpublish`, because there's nothing we can do from the server that will improve things! Instead we should be retrying until it works with exponential backoff, and logging that so that the operator can intervene manually. That will cause alloc shutdown to hang, which I think is exactly what we want here.

That being said, I think we should push the "ignorable" errors down into `UnmountVolume` so that we only ever return non-ignorable errors. Then this code can just have a happy path and a path where errors need to be retried, blocking this method from returning.
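To illustrate the two paths, a minimal sketch of the retry behavior described above. All names here are hypothetical stand-ins rather than the hook's actual code, and `errIgnorable` plays the role of `structs.ErrCSIClientRPCIgnorable`:

```go
package csisketch

import (
	"context"
	"errors"
	"log"
	"time"
)

// errIgnorable stands in for structs.ErrCSIClientRPCIgnorable: the node
// plugin reports the volume is already unpublished, which is fine because
// the CSI node RPCs are idempotent.
var errIgnorable = errors.New("ignorable CSI RPC error")

// unmountWithRetry loops until unmount succeeds or returns an ignorable
// error, backing off exponentially between attempts. Non-ignorable
// failures keep it blocked, which in turn holds up alloc shutdown until
// an operator or the scheduler fixes the node plugin.
func unmountWithRetry(ctx context.Context, unmount func() error) error {
	backoff := time.Second
	const maxBackoff = time.Minute
	for {
		err := unmount()
		if err == nil || errors.Is(err, errIgnorable) {
			// Either way the node is clean: it's now safe to make the
			// CSIVolume.Unpublish RPC back to the server.
			return nil
		}
		log.Printf("unmount failed, retrying in %s: %v", backoff, err)
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
		}
		if backoff *= 2; backoff > maxBackoff {
			backoff = maxBackoff
		}
	}
}
```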
@jrasell I've addressed this in 04f5b7a:

The CSI `NodeUnpublishVolume`/`NodeUnstageVolume` RPCs can return ignorable errors in the case where the volume has already been unmounted from the node. Handle all other errors by retrying until we get success so as to give operators the opportunity to reschedule a failed node plugin (ex. in the case where they accidentally drained a node without `-ignore-system`). Fan-out the work for each volume into its own goroutine so that we can release a subset of volumes if only one is stuck.
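For illustration, a sketch of the per-volume fan-out this describes, using `hashicorp/go-multierror` as in the snippet reviewed above; the function and parameter names are hypothetical:

```go
package csisketch

import (
	"context"
	"sync"

	multierror "github.com/hashicorp/go-multierror"
)

// unpublishAll runs each volume's unmount in its own goroutine, so one
// stuck volume can't prevent the others from being released.
func unpublishAll(ctx context.Context, volIDs []string,
	unpublish func(ctx context.Context, id string) error) error {

	var (
		wg   sync.WaitGroup
		mu   sync.Mutex  // guards mErr across goroutines
		mErr *multierror.Error
	)
	for _, id := range volIDs {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			if err := unpublish(ctx, id); err != nil {
				mu.Lock()
				mErr = multierror.Append(mErr, err)
				mu.Unlock()
			}
		}(id)
	}
	wg.Wait()
	return mErr.ErrorOrNil() // nil if every volume was released
}
```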
When an allocation stops, the `csi_hook` makes an unpublish RPC to the servers to unpublish via the CSI RPCs: first to the node plugins and then the controller plugins. The controller RPCs must happen after the node RPCs so that the node has had a chance to unmount the volume before the controller tries to detach the associated device.

But the client has local access to the node plugins and can independently determine if it's safe to send unpublish RPC to those plugins. This will allow the server to treat the node plugin as abandoned if a client is disconnected and `stop_on_client_disconnect` is set. This will let the server try to send unpublish RPCs to the controller plugins, under the assumption that the client will be trying to unmount the volume on its end first.
Force-pushed from 28ed20d to 04f5b7a
Force-pushed from 04f5b7a to 3b7dedb
this looks awesome!
When an allocation stops, the `csi_hook` makes an unpublish RPC to the servers to unpublish via the CSI RPCs: first to the node plugins and then the controller plugins. The controller RPCs must happen after the node RPCs so that the node has had a chance to unmount the volume before the controller tries to detach the associated device.

But the client has local access to the node plugins and can independently determine if it's safe to send unpublish RPC to those plugins. This will allow the server to treat the node plugin as abandoned if a client is disconnected and `stop_on_client_disconnect` is set. This will let the server try to send unpublish RPCs to the controller plugins, under the assumption that the client will be trying to unmount the volume on its end first.

Note that the CSI `NodeUnpublishVolume`/`NodeUnstageVolume` RPCs can return ignorable errors in the case where the volume has already been unmounted from the node. Handle all other errors by retrying until we get success so as to give operators the opportunity to reschedule a failed node plugin (ex. in the case where they accidentally drained a node without `-ignore-system`). Fan-out the work for each volume into its own goroutine so that we can release a subset of volumes if only one is stuck.
In PR #11892 we updated the `csi_hook` to unmount the volume locally via the CSI node RPCs before releasing the claim from the server. The timer for this hook was initialized with the retry time, forcing us to wait 1s before making the first unmount RPC calls. Use the new helper for timers to ensure we clean up the timer nicely.
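A standard-library sketch of what that timer fix amounts to (the actual change uses the new timer helper mentioned above; the names here are illustrative): arm the timer with zero so the first attempt fires immediately, then reset it to the retry interval on each failure:

```go
package csisketch

import (
	"context"
	"time"
)

// retryLoop fires the first attempt immediately rather than waiting out
// the retry interval, and stops the timer on exit so it's cleaned up.
func retryLoop(ctx context.Context, interval time.Duration, attempt func() bool) {
	timer := time.NewTimer(0) // zero duration: first tick is immediate
	defer timer.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-timer.C:
			if attempt() {
				return
			}
			timer.Reset(interval) // later attempts wait the retry time
		}
	}
}
```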
When we unmount a volume we need to be able to recover from cases where the plugin has been shut down before the allocation that needs it, so in #11892 we blocked shutting down the alloc runner hook. But this blocks client shutdown if we're in the middle of unmounting. The client won't be able to communicate with the plugin or send the unpublish RPC anyway, so we should cancel the context and assume that we'll resume the unmounting process when the client restarts.

For `-dev` mode we don't send the graceful `Shutdown()` method and instead destroy all the allocations. In this case, we'll never be able to communicate with the plugin but also never close the context we need to prevent the hook from blocking. To fix this, move the retries into their own goroutine that doesn't block the main `Postrun`.
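A sketch of the resulting shape, with hypothetical types and fields rather than the hook's actual ones:

```go
package csisketch

import "context"

type csiHook struct {
	shutdownCtx context.Context // canceled when the client shuts down
	unmountAll  func(ctx context.Context) error
	logError    func(err error)
}

// Postrun no longer waits for the retries: they run in their own
// goroutine tied to the client's shutdown context. Canceling that
// context abandons the attempt, and the unmount work resumes when the
// client restarts.
func (h *csiHook) Postrun() error {
	go func() {
		if err := h.unmountAll(h.shutdownCtx); err != nil {
			h.logError(err) // surfaced for the operator, not for Postrun
		}
	}()
	return nil // never blocks client shutdown or -dev teardown
}
```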
Partial fix for #10927 #10052 #10833
Requires #11890 #11932
When an allocation stops, the `csi_hook` makes an unpublish RPC to the servers to unpublish via the CSI RPCs: first to the node plugins and then the controller plugins. The controller RPCs must happen after the node RPCs so that the node has had a chance to unmount the volume before the controller tries to detach the associated device.

But the client has local access to the node plugins and can independently determine if it's safe to send unpublish RPC to those plugins. This will allow the server to treat the node plugin as abandoned if a client is disconnected and `stop_on_client_disconnect` is set. This will let the server try to send unpublish RPCs to the controller plugins, under the assumption that the client will be trying to unmount the volume on its end first.
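Condensed, the ordering this description lays out looks roughly like the sketch below; the names are hypothetical and the real code paths differ:

```go
package csisketch

import "context"

type volumeReleaser struct {
	nodeUnmount     func(ctx context.Context, volID string) error // local node plugin RPCs
	serverUnpublish func(ctx context.Context, volID string) error // CSIVolume.Unpublish to the servers
}

// release unmounts through the locally reachable node plugin first and
// only then asks the server to unpublish, so the controller never tries
// to detach a device that's still mounted on the node.
func (r *volumeReleaser) release(ctx context.Context, volID string) error {
	// Step 1: node-side NodeUnpublishVolume/NodeUnstageVolume.
	if err := r.nodeUnmount(ctx, volID); err != nil {
		return err // retried by the caller; don't release the claim yet
	}
	// Step 2: server-side unpublish, which drives controller-unpublish
	// (device detach) and releases the claim.
	return r.serverUnpublish(ctx, volID)
}
```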