Call finalizeWorkspaceContent if the workspace Pod in Terminating #11337

jenting · 2022-07-13T07:35:48Z

Description

When the node goes NotReady, the workspace Pod goes into a Terminating state.
In this case, the workspace Pod status.containerStatuses.state is still Running.

We should try to back up the workspace content if the Pod is Terminating and the underlying node is not ready or even gone.

This is the node taint if the node turns into a NotReady state.

This is the current workspace Pod spec.toleration

Therefore, for each case:

node.kubernetes.io/disk-pressure and node.kubernetes.io/memory-pressure: the workspace Pod tolerates these taints indefinitely. -> We do not handle this case because tolerationSeconds is not configured.
node.kubernetes.io/network-unavailable: the workspace Pod toleration duration is 30 seconds. -> We handle this case.
node.kubernetes.io/not-ready and node.kubernetes.io/unreachable: the workspace Pod toleration duration is 300 seconds. -> We handle this case.

Once (current time - the node's taint.timeAdded) exceeds the workspace Pod's toleration time, ws-manager starts backing up the content.

https://www.loom.com/share/8e0e870e6bed40809d4ac8ac1159b1e2

Related Issue(s)

Fixes #11336

How to test

Create 2 nodes: 1 control plane node and 1 worker node (using the workspace-preview).
Launch a workspace; the workspace Pod should be scheduled on the worker node 🙏.
SSH into the worker node and disable the k3s-agent: systemctl disable k3s-agent.
Wait for the node to enter the NotReady state: kubectl get node -w.
Wait for the workspace Pod to enter the Terminating state: kubectl get pod -l component=workspace -w.
Check that the workspace Pod content is backed up successfully.

Note: after the node is removed, Kubernetes removes the terminating Pod after a while (about 1 minute).

Release Notes

Try to back up content when the node goes into the NotReady state

Documentation

None

Werft options:

/werft with-preview

components/ws-manager/pkg/manager/monitor.go

sagor999 · 2022-07-13T18:36:34Z

/hold
to prevent auto merge

…erminating state w/o backing up

When the node turns into a NotReady state, after a moment, the workspace pod goes into the terminating state, but the containerStatus.state is still running.

We check the pod toleration matches against the node taint, with effect NoExecute and the toleration seconds expired to make sure that the container's graceful shutdown is finished before taking the content backup. Otherwise, it might create an unstable backup.

#11336

Signed-off-by: JenTing Hsiao <hsiaoairplane@gmail.com>

jenting · 2022-07-14T14:19:10Z

/werft run -a with-preview=true

👍 started the job as gitpod-build-jenting-10531.14
(with .werft/ from main)

jenting · 2022-07-14T14:25:35Z

/werft run with-clean-slate-deployment

👍 started the job as gitpod-build-jenting-10531.15
(with .werft/ from main)

sagor999

🚀

sagor999 · 2022-07-14T18:02:25Z

/unhold

roboquat added do-not-merge/work-in-progress release-note size/S labels Jul 13, 2022

jenting marked this pull request as ready for review July 13, 2022 08:59

jenting requested review from a team July 13, 2022 08:59

roboquat removed the do-not-merge/work-in-progress label Jul 13, 2022

github-actions bot added team: webapp and team: workspace labels Jul 13, 2022

sagor999 reviewed Jul 13, 2022

components/ws-manager/pkg/manager/monitor.go (outdated)

roboquat added do-not-merge/hold size/M and removed size/S labels Jul 13, 2022

jenting marked this pull request as draft July 14, 2022 09:40

roboquat added the do-not-merge/work-in-progress label Jul 14, 2022

jenting added 3 commits July 14, 2022 13:17

[ws-manager] fix unit test compiler error

92b60a6

Signed-off-by: JenTing Hsiao <hsiaoairplane@gmail.com>

[installer] add node get/list permission to ws-manager

b25ed48

Signed-off-by: JenTing Hsiao <hsiaoairplane@gmail.com>

jenting force-pushed the jenting/10531 branch from 75805a4 to b25ed48 Compare July 14, 2022 13:18

jenting removed the request for review from a team July 14, 2022 13:34

jenting marked this pull request as ready for review July 14, 2022 14:22

roboquat removed the do-not-merge/work-in-progress label Jul 14, 2022

jenting requested a review from sagor999 July 14, 2022 14:29

jenting removed the team: webapp label Jul 14, 2022

sagor999 approved these changes Jul 14, 2022

roboquat removed the do-not-merge/hold label Jul 14, 2022

roboquat merged commit 95ec04a into main Jul 14, 2022

roboquat deleted the jenting/10531 branch July 14, 2022 18:03

jenting mentioned this pull request Jul 15, 2022

Add missing permission to watch node object #11407

Merged

1 task

roboquat added deployed: workspace and deployed labels Jul 20, 2022
