Backport of CSI: volume watcher shutdown fixes into release/1.2.x #12698

hc-github-team-nomad-core · 2022-04-19T23:32:35Z

Backport

This PR is auto-generated from #12439 to be assessed for backporting due to the inclusion of the label backport/1.2.x.

WARNING: automatic cherry-pick of commits failed. Commits will require human attention.

The below text is copied from the body of the original PR.

The volume watcher design was based on deploymentwatcher and drainer,
but has an important difference: we don't want to maintain a goroutine
for the lifetime of the volume. So we stop the volumewatcher goroutine
for a volume when that volume has no more claims to free. But the
shutdown races with updates on the parent goroutine, and it's possible
to drop updates. Fortunately these updates are picked up on the next
core GC job, but we're most likely to hit this race when we're
replacing an allocation and that's the time we least want to wait.

Wait until the volume has "settled" before stopping this goroutine, so
that the race between shutdown and the parent goroutine sending on
<-updateCh is pushed past the window in which we most care about
freeing claims quickly.
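
The "settle before stopping" idea above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `settlingWatcher`, its fields, `settleDemo`, and the 50ms window are all hypothetical names and values chosen for the example. The watcher goroutine exits only after no update has arrived for a full quiescent window, and the sender guards its send so it cannot block on a watcher that has already stopped.

```go
package main

import (
	"fmt"
	"time"
)

// settlingWatcher is a hypothetical stand-in for a per-volume watcher;
// it is not Nomad's actual type.
type settlingWatcher struct {
	updateCh  chan int      // each update carries the number of claims left
	quiescent time.Duration // idle time required before the goroutine stops
	processed []int         // updates handled, recorded for the demo
}

// watch handles updates and returns only once a full quiescent window
// has passed without a new update arriving.
func (w *settlingWatcher) watch(done chan<- struct{}) {
	defer close(done)
	timer := time.NewTimer(w.quiescent)
	defer timer.Stop()
	for {
		select {
		case claims := <-w.updateCh:
			w.processed = append(w.processed, claims)
			// Restart the quiescent window: stop the timer, drain a
			// pending tick if there is one, then reset.
			if !timer.Stop() {
				select {
				case <-timer.C:
				default:
				}
			}
			timer.Reset(w.quiescent)
		case <-timer.C:
			return // settled: nothing arrived during the window
		}
	}
}

// settleDemo plays the parent goroutine: it sends three updates, then
// waits for the watcher to settle and stop.
func settleDemo() []int {
	w := &settlingWatcher{updateCh: make(chan int), quiescent: 50 * time.Millisecond}
	done := make(chan struct{})
	go w.watch(done)
	for _, claims := range []int{2, 1, 0} {
		select {
		case w.updateCh <- claims:
		case <-done:
			// The watcher already stopped; do not block on the send.
			return w.processed
		}
	}
	<-done
	return w.processed
}

func main() {
	fmt.Println(settleDemo())
}
```

The race is not eliminated in this sketch, only moved: an update that arrives after the window expires still finds no watcher, which matches the description above of deferring the race rather than removing it.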

Fixes a resource leak when volumewatchers are no longer needed. The
volume is nil and can't ever be started again, so the volume's
watcher should be removed from the top-level Watcher.

De-flakes the GC job test: the test throws an error because the
claimed node doesn't exist and is unreachable. This flaked rather than
failing consistently because we didn't correctly wait for the first
pass through the volumewatcher.

Make the GC job wait for the volumewatcher to reach the quiescent
timeout window state before running the GC eval under test, so that
we're sure the GC job's work isn't being picked up by processing one
of the earlier claims. Update the claims used so that we're sure the
GC pass won't hit a node unpublish error.

Adds trace logging to unpublish operations.
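
The resource-leak fix described above amounts to dropping a per-volume watcher from the top-level registry once it can never be restarted. A rough sketch of that bookkeeping, assuming a map keyed by volume ID, might look like this. The `registry` type and its `track`, `forget`, and `size` methods are hypothetical and do not reflect Nomad's actual Watcher API.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical top-level holder of per-volume watchers.
type registry struct {
	mu       sync.Mutex
	watchers map[string]chan struct{} // volume ID -> stop channel
}

func newRegistry() *registry {
	return &registry{watchers: make(map[string]chan struct{})}
}

// track registers a watcher for the volume if none exists yet.
func (r *registry) track(volID string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.watchers[volID]; !ok {
		r.watchers[volID] = make(chan struct{})
	}
}

// forget signals the watcher to stop and removes its entry, so a
// watcher that can never run again does not linger in the map.
// Calling it twice for the same volume is a no-op the second time.
func (r *registry) forget(volID string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if stop, ok := r.watchers[volID]; ok {
		close(stop)
		delete(r.watchers, volID)
	}
}

// size reports how many watchers are currently registered.
func (r *registry) size() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.watchers)
}

func main() {
	r := newRegistry()
	r.track("vol-a")
	r.track("vol-b")
	r.forget("vol-a")
	fmt.Println(r.size())
}
```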

lgfa29 · 2022-04-20T00:21:59Z

Manually backported

github-actions · 2022-10-16T02:47:25Z

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

tgross added 3 commits April 1, 2022 20:56

backport of commit ec0bb64

59f613d

backport of commit 9662260

ab4ab3e

backport of commit af40504

06f630e

hc-github-team-nomad-core force-pushed the backport/b-csi-core-job-gc/surely-coherent-llama branch from 6ca2638 to 06f630e Compare April 19, 2022 23:32

hc-github-team-nomad-core requested a review from tgross April 19, 2022 23:32

lgfa29 closed this Apr 20, 2022

github-actions bot locked as resolved and limited conversation to collaborators Oct 16, 2022
